Information Visualization in Data Mining. S.T. Balke Department of Chemical Engineering and Applied Chemistry University of Toronto

1 Information Visualization in Data Mining S.T. Balke Department of Chemical Engineering and Applied Chemistry University of Toronto

2 Motivation
Data visualization:
- relies primarily on human cognition for value discovery;
- permits direct incorporation of human ingenuity and analytic capabilities into data mining;
- can deal very effectively with very large quantities of data;
- combines powerfully with machine-based discovery techniques.

3 Uses
- Explorative analysis: data cleaning; provide hypotheses
- Confirmative analysis: confirm or reject hypotheses
- Presentation: communicate your work

4

5 Calculated Properties of the Anscombe Data Sets
- mean of the x values = 9.0
- mean of the y values = 7.5
- equation of the least-squares regression line: y = 3 + 0.5x
- sum of squared deviations of x about its mean = 110.0

6 Calculated Properties of the Anscombe Data Sets
- regression sum of squares (variance accounted for by x) = 27.5
- residual sum of squares (about the regression line) = 13.75
- correlation coefficient = 0.82
- coefficient of determination = 0.67
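The summary values on slides 5 and 6 can be checked numerically. The sketch below is a minimal illustration, assuming the published values of the first data set of Anscombe's quartet (1973); the data values and the helper name `summarize` are assumptions for illustration and do not come from the slides.

```python
# Sketch: recompute the summary statistics quoted for the Anscombe data.
# Assumption: x and y are data set I of Anscombe's quartet as published in
# 1973; the slide transcription itself does not list the values.
from math import sqrt

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def summarize(x, y):
    """Return least-squares summary statistics for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)  # squared deviations of x
    ss_y = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares of y
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / ss_x
    intercept = mean_y - slope * mean_x
    ss_reg = slope * s_xy   # variation accounted for by the regression on x
    ss_res = ss_y - ss_reg  # variation left about the regression line
    r = s_xy / sqrt(ss_x * ss_y)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "slope": slope,
        "intercept": intercept,
        "ss_x": ss_x,
        "ss_reg": ss_reg,
        "ss_res": ss_res,
        "r": r,
        "r2": r * r,
    }


stats = summarize(x, y)
for name, value in stats.items():
    print(f"{name:>10s} = {value:.2f}")
```

Anscombe constructed the other three data sets to give nearly identical summaries, so only a plot tells them apart.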

7 The Anscombe Data

8 Marey, 1885

9 Snow's Cholera Map, 1855

10

11 Graphical Excellence
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
Graphical displays should:
- show the data
- induce the viewer to think about the substance, not the methodology
- avoid distorting what the data says
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail (broad overview to fine structure)
- serve a reasonably clear purpose: description, exploration, tabulation, or decoration
- be closely integrated with the statistical and verbal descriptions of the data set.

12 Graphical Excellence
- Gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
- Is nearly always multivariate.
- Requires telling the truth about the data.
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

13 Lie Factor=14.8 (E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

14 Lie Factor
$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$
Require: 0.95 < Lie Factor < 1.05
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
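The lie factor is easy to compute once "size of effect" is pinned down. The sketch below assumes Tufte's usual reading of effect size as the relative change between two values, (second - first) / first. The function names and the sample measurements are illustrative assumptions in the spirit of Tufte's fuel-economy example (line lengths in inches, fuel economy in miles per gallon), not values taken from the slides.

```python
# Sketch: lie factor as the ratio of two relative changes.
# Assumption: "size of effect" means (second - first) / first.


def effect_size(first, second):
    """Relative change from the first value to the second."""
    if first == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(
        data_first, data_second
    )


def within_tolerance(factor, low=0.95, high=1.05):
    """Check the requirement 0.95 < Lie Factor < 1.05."""
    return low < factor < high


# Illustrative measurements (assumed): a line drawn 0.6 in long for 18 mpg
# and 5.3 in long for 27.5 mpg.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(f"lie factor = {factor:.1f}, acceptable = {within_tolerance(factor)}")
```

A graphic drawn to scale gives a factor of exactly 1, which passes the tolerance check.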

15 Using Area for One-Dimensional Data
Lie Factor = 2.8
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
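Why area exaggerates one-dimensional data can be shown with a little arithmetic. The sketch below assumes a hypothetical graphic in which a value that grows by some ratio is drawn as a shape whose width and height are both scaled by that ratio, with effect size taken as relative change. It is not a reconstruction of the graphic behind the 2.8 quoted on the slide.

```python
# Sketch: lie factor produced by scaling both dimensions of a shape
# to represent a single (one-dimensional) quantity.
# Assumption: the viewer reads the shape's area as the encoded value.


def area_lie_factor(data_ratio):
    """Lie factor when data grows by `data_ratio` and the drawn area
    therefore grows by `data_ratio ** 2`."""
    if data_ratio == 1:
        raise ValueError("no change in the data, so the effect size is zero")
    data_effect = data_ratio - 1        # relative change in the data
    area_effect = data_ratio ** 2 - 1   # relative change in the drawn area
    return area_effect / data_effect


for ratio in (1.5, 2.0, 3.0):
    print(f"data x{ratio}: lie factor = {area_lie_factor(ratio)}")
```

Under these assumptions the factor simplifies to `data_ratio + 1`, so any change encoded this way falls outside the 0.95 to 1.05 band.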

16 More guidelines:
- The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
- No legends: use labels on the graph.
- Graphics must not quote data out of context.
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

17 Data-Ink Ratio
$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}$$
Data-ink ratio = proportion of a graphic's ink devoted to the non-redundant display of data-information.
Data-ink ratio = 1.0 - (proportion of a graphic that can be erased without loss of data-information)
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
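The two forms of the data-ink ratio give the same number, as the short sketch below illustrates. Measuring ink as a count of inked pixels is an assumption made here for convenience; the function names and sample counts are hypothetical.

```python
# Sketch: the data-ink ratio computed two equivalent ways.
# Assumption: "ink" is measured as a count of inked pixels.


def data_ink_ratio(data_ink, total_ink):
    """Share of the graphic's ink that displays non-redundant data."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data ink must lie between 0 and total ink")
    return data_ink / total_ink


def data_ink_ratio_from_erasable(erasable_fraction):
    """Same ratio, from the share of ink that could be erased
    without losing data-information."""
    if not 0 <= erasable_fraction <= 1:
        raise ValueError("erasable fraction must lie between 0 and 1")
    return 1.0 - erasable_fraction


# Hypothetical chart: 4800 inked pixels, of which 1200 encode data.
direct = data_ink_ratio(1200, 4800)
via_erasable = data_ink_ratio_from_erasable(3600 / 4800)
print(direct, via_erasable)
```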

18 Maximize Data Density
$$\text{data density of a graphic} = \frac{\text{number of entries in the data matrix}}{\text{area of data graphic}}$$
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
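As a worked instance of the data density formula, the sketch below divides the number of entries in a data matrix by the printed area of the graphic. The function name, the choice of square centimetres, and the sample dimensions are hypothetical.

```python
# Sketch: data density = entries in the data matrix / area of the graphic.
# Assumption: the graphic is rectangular and its area is in square centimetres.


def data_density(n_rows, n_cols, width_cm, height_cm):
    """Number of data-matrix entries shown per square centimetre."""
    area = width_cm * height_cm
    if area <= 0:
        raise ValueError("the graphic must have a positive area")
    return (n_rows * n_cols) / area


# Hypothetical scatterplot: 11 observations of 2 variables in a 10 cm x 8 cm plot.
density = data_density(11, 2, 10.0, 8.0)
print(f"{density:.3f} entries per square centimetre")
```

Shrinking the same plot to 5 cm x 4 cm would quadruple the density without adding any data, which is one way to raise it.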

19 Beware Chartjunk
NO: "Isn't it remarkable that the computer can be programmed to draw like that."
YES: "My, what interesting data!"
(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

20 How to Say Nothing with Information Visualization
- Never include a color legend.
- Avoid annotation.
- Never mention error characteristics of the visualization method.
- When in doubt, smooth.
- Don't say how long it required to plot.
- Never compare your results with other data visualization techniques.
- Never cite references for the data.
- Claim generality but show results from a single data set.
- Use viewing angle to hide blemishes in 3D objects.

21 An Overview of Information Visualization Methods

22 Methods of Interest
- Scatterplot Matrices
- Parallel Coordinates
- Pixel-Oriented Methods
- Icon-Based Methods
- Dimensional Stacking
- Treemap

23 Some websites of interest: Public_Domain_Software/ Visualization/

Tufte s Design Principles

Tufte s Design Principles Tufte s Design Principles CS 7450 - Information Visualization January 27, 2004 John Stasko HW 2 - Minivan Data Vis What people did Classes of solutions Data aggregation, transformation Tasks - particular

More information

visualizing q uantitative quantitative information information

visualizing q uantitative quantitative information information visualizing quantitative information visualizing quantitative information martin krzywinski outline best practices of graphical data design data-to-ink ratio cartjunk circos the visual display of quantitative

More information

Envisioning Information. Tufte s Design Principles. James Eagan

Envisioning Information. Tufte s Design Principles. James Eagan Envisioning Information Tufte s Design Principles James Eagan Adapted from slides by John Stasko Graphical excellence is the well-designed presentation of interesting data a matter of substance, of statistics,

More information

DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TORONTO CSC318S THE DESIGN OF INTERACTIVE COMPUTATIONAL MEDIA. Lecture March 1998

DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TORONTO CSC318S THE DESIGN OF INTERACTIVE COMPUTATIONAL MEDIA. Lecture March 1998 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TORONTO CSC318S THE DESIGN OF INTERACTIVE COMPUTATIONAL MEDIA Lecture 19 30 March 1998 PRINCIPLES OF DATA DISPLAY AND VISUALIZATION 19.1 Nature, purpose of

More information

Information Visualization. SWE 432, Fall 2016 Design and Implementation of Software for the Web

Information Visualization. SWE 432, Fall 2016 Design and Implementation of Software for the Web Information Visualization SWE 432, Fall 2016 Design and Implementation of Software for the Web Today What types of information visualization are there? Which one should you choose? What does usability

More information

Chapter 3. Determining Effective Data Display with Charts

Chapter 3. Determining Effective Data Display with Charts Chapter 3 Determining Effective Data Display with Charts Chapter Introduction Creating effective charts that show quantitative information clearly, precisely, and efficiently Basics of creating and modifying

More information

Why Should We Care? More importantly, it is easy to lie or deceive people with bad plots

Why Should We Care? More importantly, it is easy to lie or deceive people with bad plots Plots & Graphs Why Should We Care? Everyone uses plots and/or graphs But most people ignore or are unaware of simple principles Default plotting tools (or default settings) are not always the best More

More information

Information Visualization Course

Information Visualization Course Information Visualization Course Informatica Umanistica Università di Pisa Lecture 2 Design Principles 3 rd March 2011 Emanuele Ruffaldi PERCRO - Scuola Superiore S.Anna Overview Graphical Integrity Design

More information

Introduction to Data Visualization

Introduction to Data Visualization Introduction to Data Visualization Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

CPSC 481: Beyond Simple Screen Design

CPSC 481: Beyond Simple Screen Design CPSC 481: Beyond Simple Screen Design Creating visual representations Bertin s Visual variables: creating visual representations Tufte s guidelines: assessing visual representations Representations Good

More information

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix Paula Ahonen-Rainio Maa-123.3530 Visual Analysis in GIS 11.11.2015 Topics today YOUR REPORTS OF A-2 Thematic maps with charts

More information

EDWARD TUFTE. The Visual Display of Quantitative Information. Envisioning Information. Edward Tufte

EDWARD TUFTE. The Visual Display of Quantitative Information. Envisioning Information. Edward Tufte EDWARD TUFTE The Leonardo da Vinci of data. -The New York Times The Visual Display of Quantitative Information Envisioning Information Edward Tufte Edward Tufte Background Info Visual Display of Quantitative

More information

Cartographic Principles: Map design

Cartographic Principles: Map design MSc GIS: GIS Algorithms and Data Structures Cartographic Principles: Map design Martin Dodge (m.dodge@ucl.ac.uk) With Changes by Dan Ryan http://www.casa.ucl.ac.uk/martin/msc_gis/ some (scientific) rules

More information

Excel Tips and FAQs - MS 2010

Excel Tips and FAQs - MS 2010 BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my

More information

STK 573 Metode Grafik untuk Analisis dan Penyajian Data

STK 573 Metode Grafik untuk Analisis dan Penyajian Data STK 573 Metode Grafik untuk Analisis dan Penyajian Data Pertemuan 3 4 Penyajian Grafik dari Informasi Tim Dosen: Prof. Dr. Khairil Anwar Notodiputro Dr. Ir. Aji Hamim Wigena Dr. Agus M Soleh Outline Introduction

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

Raw Data. Statistics 1/8/2016. Relative Frequency Distribution. Frequency Distributions for Qualitative Data

Raw Data. Statistics 1/8/2016. Relative Frequency Distribution. Frequency Distributions for Qualitative Data Statistics Raw Data Raw data is random and unranked data. Organizing Data Frequency distributions list all the categories and the numbers of elements that belong to each category Frequency Distributions

More information

University of Florida CISE department Gator Engineering. Visualization

University of Florida CISE department Gator Engineering. Visualization Visualization Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida What is visualization? Visualization is the process of converting data (information) in to

More information

WCSD/NUES Educational Technology Animation

WCSD/NUES Educational Technology Animation Animation verification for you and your peer reviewer Animation (Frames, Flash, or Edge) Animation: Understand and control the timeline Explain the makeup of the objects (fill and line) Import graphics

More information

Problems With Using Microsoft Excel for Statistics

Problems With Using Microsoft Excel for Statistics Problems With Using Microsoft Excel for Statistics Jonathan D. Cryer (Jon-Cryer@uiowa.edu) Department of Statistics and Actuarial Science University of Iowa, Iowa City, Iowa Joint Statistical Meetings

More information

Engineering Graphics. Presentation graphics Levels of hardware visualization. Visual presentation of data. Sketching Drawing Drafting Solid modeling

Engineering Graphics. Presentation graphics Levels of hardware visualization. Visual presentation of data. Sketching Drawing Drafting Solid modeling Presentation graphics Levels of hardware visualization Sketching Drawing Drafting Solid modeling Visual presentation of data Presentation Graphics Always use landscape, not portrait layout Better fit to

More information

JMP 10 Student Edition Quick Guide

JMP 10 Student Edition Quick Guide JMP 10 Student Edition Quick Guide Instructions presume an open data table, default preference settings and appropriately typed, user-specified variables of interest. RMC = Click Right Mouse Button Graphing

More information

CIS 536/636 Introduction to Computer Graphics. Kansas State University. CIS 536/636 Introduction to Computer Graphics

CIS 536/636 Introduction to Computer Graphics. Kansas State University. CIS 536/636 Introduction to Computer Graphics Visualization, Part 1 of 3: Data (Quantities & Evidence) William H. Hsu Department of Computing and Information Sciences, KSU KSOL course pages: http://bit.ly/hgvxlh / http://bit.ly/evizre Public mirror

More information

An introduction to plotting data

An introduction to plotting data An introduction to plotting data Eric D. Black California Institute of Technology February 25, 2014 1 Introduction Plotting data is one of the essential skills every scientist must have. We use it on a

More information

JMP Book Descriptions

JMP Book Descriptions JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked

More information

STA 490H1S Initial Examination of Data

STA 490H1S Initial Examination of Data Initial Examination of Data Alison L. Department of Statistics University of Toronto Winter 2011 Course mantra It s OK not to know. Expressing ignorance is encouraged. It s not OK to not have a willingness

More information

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression analysis. Analysis of Variance: one way classification,

More information

Visual Encoding Design

Visual Encoding Design CSE 442 - Data Visualization Visual Encoding Design Jeffrey Heer University of Washington Review: Expressiveness & Effectiveness / APT Choosing Visual Encodings Assume k visual encodings and n data attributes.

More information

CS Information Visualization Sep. 19, 2016 John Stasko

CS Information Visualization Sep. 19, 2016 John Stasko Multivariate Visual Representations 2 CS 7450 - Information Visualization Sep. 19, 2016 John Stasko Learning Objectives Explain the concept of dense pixel/small glyph visualization techniques Describe

More information

Making Science Graphs and Interpreting Data

Making Science Graphs and Interpreting Data Making Science Graphs and Interpreting Data Eye Opener: 5 mins What do you see? What do you think? Look up terms you don t know What do Graphs Tell You? A graph is a way of expressing a relationship between

More information

MIS2502: Data Analytics Principles of Data Visualization. Alvin Zuyin Zheng

MIS2502: Data Analytics Principles of Data Visualization. Alvin Zuyin Zheng MIS2502: Data Analytics Principles of Data Visualization Alvin Zuyin Zheng zheng@temple.edu http://community.mis.temple.edu/zuyinzheng/ Data visualization can: provide clear understanding of patterns in

More information

Outline. Design Principles and Graph Types. CS 795/895 Information Visualization Fall Dr. Michele C. Weigle

Outline. Design Principles and Graph Types. CS 795/895 Information Visualization Fall Dr. Michele C. Weigle CS 795/895 Information Visualization Fall 2012 Design Principles and Graph Types Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs795-f12/ Outline! Tables! Graph Basics! Design Principles! Graph

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Design of Experiments

Design of Experiments Seite 1 von 1 Design of Experiments Module Overview In this module, you learn how to create design matrices, screen factors, and perform regression analysis and Monte Carlo simulation using Mathcad. Objectives

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Activity Graphical Analysis with Excel and Logger Pro

Activity Graphical Analysis with Excel and Logger Pro Activity Graphical Analysis with Excel and Logger Pro Purpose Vernier s Logger Pro is a graphical analysis software that will allow you to collect, graph and manipulate data. Microsoft s Excel is a spreadsheet

More information

VEGETATION DESCRIPTION AND ANALYSIS

VEGETATION DESCRIPTION AND ANALYSIS VEGETATION DESCRIPTION AND ANALYSIS LABORATORY 5 AND 6 ORDINATIONS USING PC-ORD AND INSTRUCTIONS FOR LAB AND WRITTEN REPORT Introduction LABORATORY 5 (OCT 4, 2017) PC-ORD 1 BRAY & CURTIS ORDINATION AND

More information

Advanced Multivariate Continuous Displays and Diagnostics

Advanced Multivariate Continuous Displays and Diagnostics Advanced Multivariate Continuous Displays and Diagnostics 37 This activity explores advanced multivariate plotting. With the advanced diagnostic functions, output graphs are more readable and useful for

More information

CS Information Visualization Sep. 2, 2015 John Stasko

CS Information Visualization Sep. 2, 2015 John Stasko Multivariate Visual Representations 2 CS 7450 - Information Visualization Sep. 2, 2015 John Stasko Recap We examined a number of techniques for projecting >2 variables (modest number of dimensions) down

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Design World. Graphical Integrity. largely from Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 1983.

Design World. Graphical Integrity. largely from Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 1983. Design World Graphical Integrity largely from Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 1983. Graphical integrity Graphics can be a powerful communication tool Lies

More information

Visual Encoding Design

Visual Encoding Design CSE 442 - Data Visualization Visual Encoding Design Jeffrey Heer University of Washington Last Time: Data & Image Models The Big Picture task questions, goals assumptions data physical data type conceptual

More information

Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling

Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling Session #176, Beatini Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling NCTM

More information

Polymath 6. Overview

Polymath 6. Overview Polymath 6 Overview Main Polymath Menu LEQ: Linear Equations Solver. Enter (in matrix form) and solve a new system of simultaneous linear equations. NLE: Nonlinear Equations Solver. Enter and solve a new

More information

Reporting Services Tips for the Stephen Few Fan

Reporting Services Tips for the Stephen Few Fan Reporting Services Tips for the Stephen Few Fan Meagan Longoria August 4, 2012 Presentation Information A copy of the presentation as well as links to sources and other helpful articles is posted on Google+.

More information

Error Analysis, Statistics and Graphing

Error Analysis, Statistics and Graphing Error Analysis, Statistics and Graphing This semester, most of labs we require us to calculate a numerical answer based on the data we obtain. A hard question to answer in most cases is how good is your

More information

ACTIVITY TWO CONSTANT VELOCITY IN TWO DIRECTIONS

ACTIVITY TWO CONSTANT VELOCITY IN TWO DIRECTIONS 1 ACTIVITY TWO CONSTANT VELOCITY IN TWO DIRECTIONS Purpose The overall goal of this activity is for students to analyze the motion of an object moving with constant velocity along a diagonal line. In this

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Scientific Visualization

Scientific Visualization Scientific Visualization Topics Motivation Color InfoVis vs. SciVis VisTrails Core Techniques Advanced Techniques 1 Check Assumptions: Why Visualize? Problem: How do you apprehend 100k tuples? when your

More information

Visual Computing. Lecture 2 Visualization, Data, and Process

Visual Computing. Lecture 2 Visualization, Data, and Process Visual Computing Lecture 2 Visualization, Data, and Process Pipeline 1 High Level Visualization Process 1. 2. 3. 4. 5. Data Modeling Data Selection Data to Visual Mappings Scene Parameter Settings (View

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

Visualizing and Exploring Data

Visualizing and Exploring Data Visualizing and Exploring Data Sargur University at Buffalo The State University of New York Visual Methods for finding structures in data Power of human eye/brain to detect structures Product of eons

More information

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created

More information

InfoVis: a semiotic perspective

InfoVis: a semiotic perspective InfoVis: a semiotic perspective p based on Semiology of Graphics by J. Bertin Infovis is composed of Representation a mapping from raw data to a visible representation Presentation organizing this visible

More information

Exploring and Understanding Data Using R.

Exploring and Understanding Data Using R. Exploring and Understanding Data Using R. Loading the data into an R data frame: variable

More information

1. Data Analysis Yields Numbers & Visualizations. 2. Why Visualize Data? 3. What do Visualizations do? 4. Research on Visualizations

1. Data Analysis Yields Numbers & Visualizations. 2. Why Visualize Data? 3. What do Visualizations do? 4. Research on Visualizations Data Analysis & Business Intelligence Made Easy with Excel Power Tools Excel Data Analysis Basics = E-DAB Notes for Video: E-DAB-05- Visualizations: Table, Charts, Conditional Formatting & Dashboards Outcomes

More information

Chapter 1. Using the Cluster Analysis. Background Information

Chapter 1. Using the Cluster Analysis. Background Information Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,

More information

Data Analysis Guidelines

Data Analysis Guidelines Data Analysis Guidelines DESCRIPTIVE STATISTICS Standard Deviation Standard deviation is a calculated value that describes the variation (or spread) of values in a data set. It is calculated using a formula

More information

The Visual Display Of Quantitative Information Edward R Tufte

The Visual Display Of Quantitative Information Edward R Tufte The Visual Display Of Quantitative Information Edward R Tufte We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer,

More information

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool.

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. Tina Memo No. 2014-004 Internal Report Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. P.D.Tar. Last updated 07 / 06 / 2014 ISBE, Medical School, University of Manchester, Stopford

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Information Visualization

Information Visualization Information Visualization Introduction Inspired from Petra Isenberg petra.isenberg@inria.fr Why INFORMATION VISUALIZATION It is estimated that 800 exabyte (800x 10^19) of digital information will be generated

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

TNM093 Tillämpad visualisering och virtuell verklighet. Jimmy Johansson C-Research, Linköping University

TNM093 Tillämpad visualisering och virtuell verklighet. Jimmy Johansson C-Research, Linköping University TNM093 Tillämpad visualisering och virtuell verklighet Jimmy Johansson C-Research, Linköping University Introduction to Visualization New Oxford Dictionary of English, 1999 visualize - verb [with obj.]

More information

Learning Objectives for Data Concept and Visualization

Learning Objectives for Data Concept and Visualization Learning Objectives for Data Concept and Visualization Assignment 1: Data Quality Concept and Impact of Data Quality Summarize concepts of data quality. Understand and describe the impact of data on actuarial

More information

The Science of Data Visualization

The Science of Data Visualization Welcome # T C 1 8 The Science of Data Visualization Larry Silverstein Strategic Sales Consultant Tableau Start Your (Visualization) Engines Agenda The science of data visualization Best practices for building

More information

Chapter 13 Multivariate Techniques. Chapter Table of Contents

Chapter 13 Multivariate Techniques. Chapter Table of Contents Chapter 13 Multivariate Techniques Chapter Table of Contents Introduction...279 Principal Components Analysis...280 Canonical Correlation...289 References...298 278 Chapter 13. Multivariate Techniques

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

Practical Bioinformatics

Practical Bioinformatics 4/25/2017 Mean def mean ( x ) : s = 0. 0 f o r i i n x : s += i return s / len ( x ) def mean ( x ) : return sum( x )/ f l o a t ( len ( x ) ) Standard Deviation σ x = N i (x i x) 2 N 1 Standard Deviation

More information

Midterm Exam Fundamentals of Computer Graphics (COMP 557) Thurs. Feb. 19, 2015 Professor Michael Langer

Midterm Exam Fundamentals of Computer Graphics (COMP 557) Thurs. Feb. 19, 2015 Professor Michael Langer Midterm Exam Fundamentals of Computer Graphics (COMP 557) Thurs. Feb. 19, 2015 Professor Michael Langer The exam consists of 10 questions. There are 2 points per question for a total of 20 points. You

More information
