IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram
|
|
- Julia Ward
- 5 years ago
- Views:
Transcription
1 IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram
2 Exploring data Example: US Census People # of people in group Year # (every decade) Age # Sex (Gender) # Male, female Marital status # Single, Married, Divorced, 2348 data points Data and Image Models IAT 4355 Slide adapted from Jeff Heer
3 Census data: What type ( N, O, Q)? Example: US Census People Q- Ratio Year Q- interval Age Q - Ratio Sex (Gender) N Marital status N Data and Image Models IAT 4355 Slide adapted from Jeff Heer
4 Census data: What type ( N, O, Q)? Example: US Census People Count Measure (dependent variable) Year Dimension Age?? Sex (Gender) Dimension Marital status Dimension Data and Image Models IAT 4355 Slide adapted from Jeff Heer
5 Roll-up and Drill-Down Want to examine marital status in each decade Roll-up the data along the desired dimension Data and Image Models IAT 4355 Slide adapted from Jeff Heer
6 Roll-up and Drill-Down Need more detailed information? Drill-down into additional dimensions Data and Image Models IAT 4355 Slide adapted from Jeff Heer
7 Data and Image Models IAT 4355 Slide adapted from Jeff Heer
8 Distribution is important for understanding data Visualization helps us see relations or the trends of them - as visual patterns a lot of what we visualize are the descriptive statistics Example: mean income vs median income Need to ensure that the aggregate units of visualization are legit Rule: check your core units /variables. If hey are descriptive, look at the distribution Data and Image Models IAT
9 Example: job losses in US over time Data and Image Models IAT
10 Example: job losses in US over time Data and Image Models IAT
11 Data and Image Models IAT
12 Visualizing distribution We can t really tell much about this data set Even Min and Max are hard to see We can get a better idea of this data by looking at its distribution. Data Values X-axis labels Data and Image Models IAT
13 Data distribution Measures of dispersion characterise how spread out the distribution is, i.e., how variable the data are. Commonly used measures of dispersion include: 1. Range 2. Variance & Standard deviation 3. Coefficient of Variation (or relative standard deviation) 4. Inter-quartile range Data and Image Models IAT
14 Distribution and symmetry symmetric Median, mean and mode of symmetric, posi3vely and nega3vely skewed data positively skewed negatively skewed 14 January 15, 2014 Data Mining: Concepts and Techniques Adapted from Han, Kamber and Pei 2013
15 Properties of Normal Distribution Curve The normal (distribu3on) curve From μ σ to μ+σ: contains about 68% of the measurements (μ: mean, σ: standard devia3on) From μ 2σ to μ+2σ: contains about 95% of it From μ 3σ to μ+3σ: contains about 99.7% of it 15
16 Normal and Skewed Distributions 0.14 When data are skewed, the mean and SD can be misleading Skewness sk= 3(mean-median)/SD If sk> 1 then distribution is non-symetrical Negatively skewed Mean<Median Sk is negative Positively Skewed Mean>Median Sk is positive Data and Image Models IAT
17 Measuring the Dispersion of Data Quar3les, outliers and boxplots Quar%les: Q 1 (25 th percen3le), Q 3 (75 th percen3le) Inter- quar%le range: IQR = Q 3 Q 1 Five number summary: min, Q 1, median, Q 3, max Boxplot: ends of the box are the quar3les; median is marked; add whiskers, and plot outliers individually Outlier: usually, a value higher/lower than 1.5 x IQR Variance and standard devia%on (sample: s, popula,on: σ) Measure of how spread out the numbers are Adapted from Han, Kamber and Pei
18 Measures of variance Variance One measure of dispersion (deviation from the mean) of a data set. The larger the variance, the greater is the standard deviation. Standard Deviation the average deviation from the mean of a data set. Determines overall how spread out the data values are Variance and SD are critical in analysing your data distribution and determining how meaningful is the chosen average Data and Image Models IAT
19 Graphic Displays of Basic Statistical Descriptions Boxplot: graphic display of five- number summary Histogram: x- axis are values, y- axis repres. frequencies Sca?er plot: each pair of values is a pair of coordinates and plozed as points in the plane Quan%le plot: each value x i is paired with f i indica3ng that approximately 100 f i % of data are x i Quan%le- quan%le (q- q) plot: graphs the quan3les of one univariant distribu3on against the corresponding quan3les of another Adapted from Han, Kamber and Pei
20 Inter-quartile range The Median divides a distribution into two halves. The first and third quartiles (denoted Q 1 and Q 3 ) are defined as follows: 25% of the data lie below Q 1 (and 75% is above Q 1 ), 25% of the data lie above Q 3 (and 75% is below Q 3 ) The inter-quartile range (IQR) is the difference between the first and third quartiles, i.e. IQR = Q 3 - Q 1 Data and Image Models IAT
21 Outliers An outlier is an datum which does not appear to belong with the other data Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc. An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient. Data and Image Models IAT
22 Box-plots A box-plot is a visual description of the distribution based on Minimum Q1 Median Q3 Maximum Useful for comparing large sets of data Data and Image Models IAT
23 Example 1: Box-plot Data and Image Models IAT
24 Outlier Boxplot Re-define the upper and lower limits of the boxplots (the whisker lines) as: Lower limit = Q IQR, and Upper limit = Q IQR Note that the lines may not go as far as these limits If a data point is < lower limit or > upper limit, the data point is considered to be an outlier. Data and Image Models IAT
25 Visualization of Data Dispersion: 3-D Boxplots 25 January 15, 2014 Data Mining: Concepts and Techniques
26 Histogram Q à Q à N Most common form: split data range into equal-sized bins and count the number of points from the data set that fall into the bin. Vertical axis: Frequency (i.e., counts for each bin) Horizontal axis: Response variable The histogram graphically shows the following: 1. center (i.e., the location) of the data; 2. spread (i.e., the scale) of the data; 3. skewness of the data; 4. presence of outliers; and 5. presence of multiple modes in the data. 26 Data and Image Models IAT 4355
27 Histograms Often Tell More than Boxplots n n The two histograms shown in the le\ may have the same boxplot representa3on n The same values for: min, Q1, median, Q3, max But they have rather different data distribu3ons 27
28 Plotting the distribution Determine a frequency table (bins) A histogram is a column chart of the frequencies Category Labels Frequency Frequency > >90 Scores Data and Image Models IAT
29 Issues with Histograms For small data sets, histograms can be misleading. Small changes in the data or to the bucket boundaries can result in very different histograms. Interactive bin-width example (online applet) For large data sets, histograms can be quite effective at illustrating general properties of the distribution. Histograms effectively only work with 1 variable at a time Difficult to extend to 2 dimensions, not possible for >2 So histograms tell us nothing about the relationships among variables Data and Image Models IAT
30 Scatter plot Provides a first look at bivariate data to see clusters of points, outliers, etc Each pair of values is treated as a pair of coordinates and plozed as points in the plane 30
31 Positively and Negatively Correlated Data The le\ half fragment is posi3vely correlated The right half is nega3vely correlated 31
32 Uncorrelated Data 32
33 Correlation A correlation exists between two variables when one of them is related to the other in some way. A scatterplot is a graph in which the paired (x,y) sample data are plotted on a graph. The linear correlation coefficient r measures the strength of the linear relationship. Also called the Pearson correlation coefficient. Ranges from -1 to 1. r = 1 represents a perfect positive correlation. r = 0 represents no correlation r = -1 represents a perfect negative correlation Slide adapted from David Lippman's
34 Correlation Assesses the linear relationship between two variables Example: height and weight Strength of the association is described by a correlation coefficientr r = low, probably meaningless r = low, possible importance r = moderate correlation r = high correlation r =.8-1 very high correlation Can be positive or negative Pearson s, Spearman correlation coefficient Tells nothing about causation
35 Perfect positive Strong positive Positive correlation r = 1 correlation r = 0.99 correlation r = 0.80 Strong negative No Correlation Non-linear correlation r = r = 0.16 relationship Slide adapted from David Lippman's
36 Meanings r 2 represents the proportion of the variation in y that is explained by the linear relationship between x and y. Example: Using the heights and weights for a group of people, you find the correlation coefficient to be: r = 0.796, so r 2 = So we conclude that about 63.4% of the peoples weight can be explained by the relationship between height and weight. This suggests that 36.6% of the variation in weights cannot be explained by height. Slide adapted from David Lippman's
37 r 2 in Tableau
38 Example: Relationship between Tree Circumference and Height Height (ft) Circumference (ft) Slide adapted from David Lippman's
39 Relationship between Tree Circumference and Height Height (ft) Circumference (ft) Outliers can strongly influence the graph of the regression line and inflate the correlation coefficient. In the above example, removing the outlier drops the correlation coefficient from Slide adapted from David Lippman's r = to r =
40 Correlation Correlation Coefficient 0 Correlation Coefficient.3 Source: Altman. Practical Statistics for Medical Research
41 Correlation Correlation Coefficient -.5 Correlation Coefficient.7 Source: Altman. Practical Statistics for Medical Research
42 Summary Statistical models serve to inspect and categorise the nature of trends and relations between variables and fators (effects) Distribution is a critical element in deciding what statistical measures to use, should be the lens by which you determine the appropriate metric eyeballing your distribution is a first step in forming your next queries 42
43 Four sets of data with the same correlation of 0.816
44 Sheet a Avg. actual_score Avg. user_score Average of user_score vs. average of actual_score. Color shows details about action_name. 44
45 display Avg. actual_score chart treewithberry treewithoutber.. Average of actual_score for each display. Color shows details about action_name. Data and Image Models IAT
What are we working with? Data Abstractions. Week 4 Lecture A IAT 814 Lyn Bartram
What are we working with? Data Abstractions Week 4 Lecture A IAT 814 Lyn Bartram Munzner s What-Why-How What are we working with? DATA abstractions, statistical methods Why are we doing it? Task abstractions
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationFurther Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables
Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationSTA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures
STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES 2: Data Pre-Processing Instructor: Yizhou Sun yzsun@ccs.neu.edu September 10, 2013 2: Data Pre-Processing Getting to know your data Basic Statistical Descriptions of Data
More information2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More informationLAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA
LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to
More informationWELCOME! Lecture 3 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important
More informationAND NUMERICAL SUMMARIES. Chapter 2
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More informationMeasures of Dispersion
Measures of Dispersion 6-3 I Will... Find measures of dispersion of sets of data. Find standard deviation and analyze normal distribution. Day 1: Dispersion Vocabulary Measures of Variation (Dispersion
More informationStatistics Lecture 6. Looking at data one variable
Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)
More informationName Date Types of Graphs and Creating Graphs Notes
Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.
More informationRegression III: Advanced Methods
Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions
More informationCS570 Introduction to Data Mining
CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and attributes Data exploration Data pre-processing 2 10 What is Data?
More informationData Mining and Analytics. Introduction
Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data
More informationVocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.
5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationChapter 2 Describing, Exploring, and Comparing Data
Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative
More informationSummarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester
Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these
More informationSTA Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More informationSTA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More informationYour Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression
Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using
More informationCHAPTER 2 DESCRIPTIVE STATISTICS
CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of
More informationSTP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES
STP 6 ELEMENTARY STATISTICS NOTES PART - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use
More informationUNIT 1A EXPLORING UNIVARIATE DATA
A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics
More informationIQR = number. summary: largest. = 2. Upper half: Q3 =
Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number
More informationCHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and
CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4
More informationStatistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.
Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More informationChapter 3 - Displaying and Summarizing Quantitative Data
Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative
More informationChapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram
More informationAP Statistics Summer Assignment:
AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this
More informationData Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Chapter 2 Original Slides: Jiawei Han and Micheline Kamber Modification: Li Xiong Data Mining: Concepts and Techniques 1 Chapter 2: Data Preprocessing Why preprocess
More informationChapter2 Description of samples and populations. 2.1 Introduction.
Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
More informationPrepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.
Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good
More informationChapter 2 Modeling Distributions of Data
Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and
More informationSlide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques
SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information
More information3 Graphical Displays of Data
3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked
More information3 Graphical Displays of Data
3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked
More informationData can be in the form of numbers, words, measurements, observations or even just descriptions of things.
+ What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and
More informationRoad Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary
2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types
More informationBIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26
Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations
More informationMath 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency
Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,
More informationPart I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures
Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated
More informationChapter 2: The Normal Distributions
Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and
More informationSTA Module 4 The Normal Distribution
STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally
More informationSTA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves
STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally
More informationThe first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.
Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.
More informationM7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.
M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes. Population: Census: Biased: Sample: The entire group of objects or individuals considered
More informationUnivariate Statistics Summary
Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:
More informationChapter 5: The standard deviation as a ruler and the normal model p131
Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is
More information1.3 Graphical Summaries of Data
Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this
More informationChapter 1. Looking at Data-Distribution
Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw
More informationMAT 110 WORKSHOP. Updated Fall 2018
MAT 110 WORKSHOP Updated Fall 2018 UNIT 3: STATISTICS Introduction Choosing a Sample Simple Random Sample: a set of individuals from the population chosen in a way that every individual has an equal chance
More informationVCEasy VISUAL FURTHER MATHS. Overview
VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that
More informationResearch Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel
Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement
More informationSLStats.notebook. January 12, Statistics:
Statistics: 1 2 3 Ways to display data: 4 generic arithmetic mean sample 14A: Opener, #3,4 (Vocabulary, histograms, frequency tables, stem and leaf) 14B.1: #3,5,8,9,11,12,14,15,16 (Mean, median, mode,
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More informationLecture Notes 3: Data summarization
Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &
More informationSection 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc
Section 2-2 Frequency Distributions Copyright 2010, 2007, 2004 Pearson Education, Inc. 2.1-1 Frequency Distribution Frequency Distribution (or Frequency Table) It shows how a data set is partitioned among
More informationCHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)
CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is
More informationCHAPTER 3: Data Description
CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a
More informationStat 428 Autumn 2006 Homework 2 Solutions
Section 6.3 (5, 8) 6.3.5 Here is the Minitab output for the service time data set. Descriptive Statistics: Service Times Service Times 0 69.35 1.24 67.88 17.59 28.00 61.00 66.00 Variable Q3 Maximum Service
More informationDensity Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.
1.3 Density curves p50 Some times the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. It is easier to work with a smooth curve, because the histogram
More informationMATH11400 Statistics Homepage
MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 1.1 A Framework for Statistical Problems Many statistical problems can be described by a simple framework in which
More informationProbability and Statistics. Copyright Cengage Learning. All rights reserved.
Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.6 Descriptive Statistics (Graphical) Copyright Cengage Learning. All rights reserved. Objectives Data in Categories Histograms
More informationChapter 3. Descriptive Measures. Slide 3-2. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 3 Descriptive Measures Slide 3-2 Section 3.1 Measures of Center Slide 3-3 Definition 3.1 Mean of a Data Set The mean of a data set is the sum of the observations divided by the number of observations.
More informationName: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution
Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the
More information2.1: Frequency Distributions and Their Graphs
2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each
More informationChapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data
Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically
More informationBar Graphs and Dot Plots
CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs
More informationChapter 6 Normal Probability Distributions
Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central
More informationIT 403 Practice Problems (1-2) Answers
IT 403 Practice Problems (1-2) Answers #1. Using Tukey's Hinges method ('Inclusionary'), what is Q3 for this dataset? 2 3 5 7 11 13 17 a. 7 b. 11 c. 12 d. 15 c (12) #2. How do quartiles and percentiles
More informationLecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #
Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms
More informationCITS4009 Introduc0on to Data Science
School of Computer Science and Software Engineering CITS4009 Introduc0on to Data Science SEMESTER 2, 2017: CHAPTER 3 EXPLORING DATA 1 Chapter Objec0ves Using summary sta.s.cs to explore data Exploring
More informationCS378 Introduction to Data Mining. Data Exploration and Data Preprocessing. Li Xiong
CS378 Introduction to Data Mining Data Exploration and Data Preprocessing Li Xiong Data Exploration and Data Preprocessing Data and Attributes Data exploration Data pre-processing Data Mining: Concepts
More informationThe basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student
Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite
More informationCHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.
1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed
More information1. Descriptive Statistics
1.1 Descriptive statistics 1. Descriptive Statistics A Data management Before starting any statistics analysis with a graphics calculator, you need to enter the data. We will illustrate the process by
More informationChapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd
Chapter 3: Data Description - Part 3 Read: Sections 1 through 5 pp 92-149 Work the following text examples: Section 3.2, 3-1 through 3-17 Section 3.3, 3-22 through 3.28, 3-42 through 3.82 Section 3.4,
More informationLecture 1: Exploratory data analysis
Lecture 1: Exploratory data analysis Statistics 101 Mine Çetinkaya-Rundel January 17, 2012 Announcements Announcements Any questions about the syllabus? If you sent me your gmail address your RStudio account
More informationNo. of blue jelly beans No. of bags
Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in
More informationQuestion. Dinner at the Urquhart House. Data, Statistics, and Spreadsheets. Data. Types of Data. Statistics and Data
Question What are data and what do they mean to a scientist? Dinner at the Urquhart House Brought to you by the Briggs Multiracial Alliance Sunday night All food provided (probably Chinese) Contact Mimi
More informationSection 6.3: Measures of Position
Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different
More informationSelected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.
Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data
More informationappstats6.notebook September 27, 2016
Chapter 6 The Standard Deviation as a Ruler and the Normal Model Objectives: 1.Students will calculate and interpret z scores. 2.Students will compare/contrast values from different distributions using
More informationThings you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.
1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.
More information1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file
1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/
More informationChapter 2 Data Preprocessing
Chapter 2 Data Preprocessing CISC4631 1 Outline General data characteristics Data cleaning Data integration and transformation Data reduction Summary CISC4631 2 1 Types of Data Sets Record Relational records
More informationTMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS
To Describe Data, consider: Symmetry Skewness TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS Unimodal or bimodal or uniform Extreme values Range of Values and mid-range Most frequently occurring values In
More informationUnderstanding and Comparing Distributions. Chapter 4
Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables
More informationVisualizing and Exploring Data
Visualizing and Exploring Data Sargur University at Buffalo The State University of New York Visual Methods for finding structures in data Power of human eye/brain to detect structures Product of eons
More informationAverages and Variation
Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus
More information1 Overview of Statistics; Essential Vocabulary
1 Overview of Statistics; Essential Vocabulary Statistics: the science of collecting, organizing, analyzing, and interpreting data in order to make decisions Population and sample Population: the entire
More informationExploratory Data Analysis
Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition 12.1. Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation
More information