CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang

Data Visualization
- Value of Visualization
- Data and Image Models
- Visualization Design
- Exploratory Data Analysis
Slides adapted from Jeffrey Heer, University of Washington

What is visualization?
- "Transformation of the symbolic into the geometric" [McCormick et al. 1987]
- "... finding the artificial memory that best supports our natural means of perception." [Bertin 1967]
- "The use of computer-generated, interactive, visual representations of data to amplify cognition." [Card, Mackinlay, & Shneiderman 1999]

Data

Visual Representation

Why visualization? Efficient use of attention
"What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it."
Herb Simon, as quoted by Hal Varian, Scientific American, September 1995

Why create visualizations?
- Answer questions, or discover them (e.g., what route did the Silk Road take between Europe and China?)
- Make decisions (e.g., stock market, monitoring systems in hospitals)
- See data in context (e.g., a map)
- Expand memory (e.g., multiplication)
- Find patterns (e.g., astronomy data, transactions)
- Present an argument or tell a story (e.g., growth of Walmart: http://projects.flowingdata.com/walmart/)
- Inspire (e.g., textbook medicine, genome, DNA)

The Value of Visualization
- Record information
  - Blueprints, photographs, seismographs, ...
- Analyze data to support reasoning
  - Develop and assess hypotheses
  - Discover errors in data
  - Expand memory
  - Find patterns
- Communicate information to others
  - Share and persuade
  - Collaborate and revise

Record information: Leonardo da Vinci
- Map of Imola, created for Cesare Borgia (top)
- Proportions of man (left)

Support reasoning: Which animal has the most powerful brain?

The most powerful brain?

Communicate information: from the New York Times, 1981

Visualization Reference Model

Visualization Generation Process

Topics
- Properties of data
- Properties of images
- Mapping data to images

Data models vs. conceptual models
- Data models are low-level descriptions of the data (mathematical abstractions)
  - Math: sets with operations on them
  - Example: integers with + and × operators
- Conceptual models are mental constructions
  - Include semantics and support reasoning
- Examples (data vs. conceptual)
  - (1D floats) vs. temperature
  - (3D vector of floats) vs. space

Taxonomy of data types
- 1D (sets and sequences)
- Temporal
- 2D (maps) -- spatial
- 3D (shapes)
- nD (relational)
- Trees (hierarchies)
- Networks (graphs)
- Combinations: e.g., spatial + temporal, spatial + relational

Types of variables
- Physical types
  - Characterized by storage format
  - Characterized by machine operations
  - Example: bool, short, int32, float, double, string, ...
- Abstract types
  - Provide descriptions of the data
  - May be characterized by methods
  - May be organized into a hierarchy (e.g., an ontology)
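The distinction between physical and abstract types can be sketched in a few lines of Python. This is only an illustration: the `Celsius` class and its method are hypothetical names introduced here, and the sizes shown are the standard sizes reported by Python's `struct` module, not a claim about any particular machine.

```python
import struct

# Physical view: a type is a storage format. "<" requests standard sizes
# (little-endian, no padding), so the results do not depend on the platform.
SIZES = {
    name: struct.calcsize("<" + code)
    for name, code in [("bool", "?"), ("short", "h"), ("int32", "i"),
                       ("float", "f"), ("double", "d")]
}

# The same four bytes mean different things under different physical types.
raw = struct.pack("<f", 1.0)           # 1.0 stored as a 32-bit float
as_int = struct.unpack("<i", raw)[0]   # the same bytes read as an int32


# Abstract view: a type describes what the value means and which
# methods apply to it. Celsius is a hypothetical example class.
class Celsius:
    def __init__(self, degrees: float) -> None:
        self.degrees = degrees

    def to_fahrenheit(self) -> float:
        return self.degrees * 9 / 5 + 32


boiling_f = Celsius(100.0).to_fahrenheit()
```

Here `SIZES` and `as_int` depend only on the storage format, while `to_fahrenheit` only makes sense once the float is understood as a temperature.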

Abstract types of variables
- Categorical (data that are counted)
  - Nominal
  - Ordinal
- Quantitative or numerical (data that are measured)
  - Interval
  - Ratio
Why is the type of variable important? The methods used to display, summarize, and analyze data depend on whether the variables are categorical or quantitative.

Categorical: Nominal
Nominal variables are variables that are named, i.e. classified into one or more qualitative categories that describe the characteristic of interest.
- No ordering of the different categories
- No measure of distance between values
- Categories can be listed in any order without affecting the relationship between them
Nominal variables are the simplest type of variable.

Categorical: Ordinal
Ordinal variables are variables that have an inherent order to the relationship among the different categories.
- An implied ordering of the categories (levels)
- Quantitative distance between levels is unknown
- Distances between the levels may not be the same
- Meaning of different levels may not be the same for different individuals

Quantitative/Numerical
- Interval variables have constant, equal distances between values, but the zero point is arbitrary.
- Ratio variables have equal intervals between values, the zero point is meaningful, and the numerical relationships between numbers are meaningful.
- Continuous vs. discrete

Nominal, Ordinal and Quantitative
- N - Nominal (labels)
  - Fruits: apples, oranges, ...
- O - Ordinal (ordered list)
  - Quality of meat: Grade A, AA, AAA
- Q - Interval (location of zero arbitrary)
  - Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
  - Cannot compare directly
  - Only differences (i.e. intervals) may be compared
- Q - Ratio (zero fixed)
  - Physical measurement: length, mass, temperature
  - Counts and amounts
  - Origin is meaningful

Level of Measurement
Higher-level variables can always be expressed at a lower level, but the reverse is not true: Q > O > N.
For example, Body Mass Index (BMI) is typically measured at the quantitative (Q) level, e.g. 23.4.
- BMI can be collapsed into lower-level ordinal categories such as:
  - 30 or above: Obese
  - 25-29.9: Overweight
  - Below 25: Normal or underweight
- or into nominal categories such as:
  - Overweight
  - Not overweight
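The collapse from a quantitative BMI value to ordinal and then nominal categories can be sketched as two small functions. The cutoffs of 25 and 30 come from the slide; the function names, the neutral "Below 25" label, the handling of a value of exactly 30, and the choice to count obese values as "Overweight" in the two-way split are assumptions made for this sketch.

```python
def bmi_to_ordinal(bmi: float) -> str:
    """Collapse a quantitative BMI into an ordered category (Q -> O)."""
    if bmi >= 30:        # assumption: exactly 30 falls in the top band
        return "Obese"
    if bmi >= 25:
        return "Overweight"
    return "Below 25"    # neutral label chosen for this sketch


def bmi_to_nominal(bmi: float) -> str:
    """Collapse BMI into an unordered two-way label (Q -> N)."""
    # Assumption: every value of 25 or more counts as overweight.
    return "Overweight" if bmi >= 25 else "Not overweight"


values = [23.4, 27.1, 31.8]   # illustrative inputs, not real measurements
ordinal = [bmi_to_ordinal(v) for v in values]
nominal = [bmi_to_nominal(v) for v in values]
```

The reverse direction is not possible: given only "Overweight", there is no way to recover 27.1, which is the sense in which Q > O > N.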

Operations on N, O, Q data types
- N - Nominal (labels)
  - Operations: =, ≠
- O - Ordinal (ordered list)
  - Operations: =, ≠, <, >
- Q - Interval (location of zero arbitrary)
  - Operations: =, ≠, <, >, -
  - Can measure distances or spans
- Q - Ratio (zero fixed)
  - Operations: =, ≠, <, >, -, %
  - Can measure ratios or proportions
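One way to make the permitted operations concrete is a lookup table that rejects operations a measurement level does not support. This is a minimal sketch, not a standard API: `ALLOWED` and `apply_op` are hypothetical names, and reading the last ratio operation as division (to form a ratio) is an assumption.

```python
import operator

# Operations that are meaningful at each level, keyed by the names of the
# corresponding functions in Python's operator module.
ALLOWED = {
    "N": {"eq", "ne"},
    "O": {"eq", "ne", "lt", "gt"},
    "Q-interval": {"eq", "ne", "lt", "gt", "sub"},
    "Q-ratio": {"eq", "ne", "lt", "gt", "sub", "truediv"},
}


def apply_op(level: str, op_name: str, a, b):
    """Apply op_name to a and b only if it is meaningful at this level."""
    if op_name not in ALLOWED[level]:
        raise TypeError(f"{op_name} is not meaningful for {level} data")
    return getattr(operator, op_name)(a, b)


span = apply_op("Q-interval", "sub", 2006, 1996)   # difference between years
ratio = apply_op("Q-ratio", "truediv", 10.0, 4.0)  # one length relative to another

try:
    apply_op("N", "lt", "apples", "oranges")       # ordering labels is rejected
    rejected = False
except TypeError:
    rejected = True
```

Each level inherits every operation of the level below it, which mirrors the Q > O > N ordering.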

From data models to N, O, Q data types
- Data model: 32.5, 54.0, -17.3, ... (floats)
- Conceptual model: temperature (°C)
- Data type:
  - Burned vs. not burned (N)
  - Hot, warm, cold (O)
  - Continuous range of values (Q)

Example: Sepal and petal lengths and widths for three species of iris [Fisher 1936].

Relational data model
- Represent data as a table (relation)
- Each row (tuple) represents a single record
  - Each record is a fixed-length tuple
- Each column (attribute) represents a single variable
  - Each attribute has a name and a data type
- A table's schema is the set of names and data types
- A database is a collection of tables (relations)
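As a rough sketch of these terms, a relation can be modelled in Python as a list of fixed-length named tuples. The attribute names follow the iris example; the `Iris` class name is hypothetical and the two rows are illustrative placeholders rather than a faithful copy of the dataset.

```python
from typing import NamedTuple, get_type_hints


class Iris(NamedTuple):
    """One record: a fixed-length tuple with named, typed attributes."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


# A table (relation) is a collection of such records.
table = [
    Iris(5.1, 3.5, 1.4, 0.2, "setosa"),
    Iris(7.0, 3.2, 4.7, 1.4, "versicolor"),
]

# The schema is the set of attribute names and data types.
schema = get_type_hints(Iris)

# A database is a collection of named tables.
database = {"iris": table}
```

Every row has the same length and the same attribute order, which is what lets a column be read as a single variable, e.g. `[row.species for row in table]`.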

Relational Algebra [Codd]
Data transformations (SQL):
- Projection (select)
- Selection (where)
- Sorting (order by)
- Aggregation (group by, sum, min, ...)
- Set operations (union, ...)
- Combine (inner join, outer join, ...)
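The transformations listed above can be imitated on small in-memory tables with plain Python. This is a sketch for intuition only, not how a database engine evaluates queries; the `sales` and `regions` tables and all their values are hypothetical.

```python
from collections import defaultdict

# Hypothetical relations: one dict per tuple.
sales = [
    {"region": "east", "item": "pen", "amount": 3},
    {"region": "west", "item": "pen", "amount": 5},
    {"region": "east", "item": "ink", "amount": 7},
]
regions = [
    {"region": "east", "manager": "Ann"},
    {"region": "west", "manager": "Bo"},
]

# Selection (WHERE): keep the rows that satisfy a predicate.
east = [r for r in sales if r["region"] == "east"]

# Projection (SELECT): keep only some columns. Duplicates are kept here,
# as in SQL without DISTINCT.
items = [{"item": r["item"]} for r in sales]

# Sorting (ORDER BY amount DESC).
by_amount = sorted(sales, key=lambda r: r["amount"], reverse=True)

# Aggregation (GROUP BY region, SUM(amount)).
totals = defaultdict(int)
for r in sales:
    totals[r["region"]] += r["amount"]

# Inner join on the shared "region" attribute.
joined = [{**s, **g} for s in sales for g in regions
          if s["region"] == g["region"]]
```

Each step takes tables in and gives a table (or a grouped summary) back, so the operations can be chained in the same way SQL clauses are.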

Statistical data model
- Variables or measurements
- Categories or factors or dimensions
- Observations or cases

Dimensions and Measures
- Dimensions: discrete variables describing data
  - Dates, categories of values (independent variables)
- Measures: data values that can be aggregated
  - Numbers to be analyzed (dependent variables)
  - Aggregate as sum, count, average, standard deviation
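A small sketch of how a measure is aggregated along a dimension, using only the standard library. The rows are made-up placeholders loosely shaped like census records, the `aggregate` helper is a hypothetical name, and the population standard deviation is used as one possible reading of "std. deviation".

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows: (year, sex, people). The counts are placeholders.
rows = [
    (1900, "M", 10),
    (1900, "F", 12),
    (2000, "M", 20),
    (2000, "F", 22),
]


def aggregate(rows, dim_index, measure_index):
    """Group rows by one dimension and summarize one measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim_index]].append(row[measure_index])
    return {
        key: {
            "sum": sum(vals),
            "count": len(vals),
            "mean": mean(vals),
            "std": pstdev(vals),   # population standard deviation
        }
        for key, vals in groups.items()
    }


by_year = aggregate(rows, dim_index=0, measure_index=2)
by_sex = aggregate(rows, dim_index=1, measure_index=2)
```

Swapping `dim_index` changes the grouping without touching the measure, which is the practical meaning of the dimension/measure split.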

Example: U.S. Census Data
- People: # of people in group
- Year: 1850-2000 (every decade)
- Age: 0-90+
- Sex: Male, Female
- Marital Status: Single, Married, Divorced, ...

Example: U.S. Census
Attributes: People, Year, Age, Sex, Marital Status
2348 data points

Census: N, O, Q (R/I)?
Variable          Type
People Count      Q-Ratio
Year              Q-Interval (O)
Age               Q-Ratio (O)
Sex (M/F)         N
Marital Status    N

Census: Measure or Dimension?
Variable          Role
People Count      Measure
Year              Dimension
Age               Dimension
Sex (M/F)         Dimension
Marital Status    Dimension