Exploratory Data Analysis! 1D!
- Amie Bailey
- 5 years ago
Transcription
1 Exploratory Data Analysis! 1D!!
2 philosophical view
3 What is EDA? Paraphrasing John Tukey, it is:
- a mindset (a willingness to look for what can be seen, whether or not it is anticipated)
- a flexibility (let the data speak for themselves, explore lots of avenues)
- a way to make pictures (the picture-examining eye is the best finder we have of the wholly unanticipated)
4 more concrete view
5 What is EDA?
- Verify that expected relationships actually exist in the data
- Find unexpected structure in the data that must be accounted for
- Ensure the right questions are being asked
- Generate additional questions to be considered
- Provide a basis for further data collection
6
7 There is a large emphasis on graphical exploration because it is the best way to reveal unanticipated structure. We want to:
- plot the raw data
- plot simple statistics
- position objects and plots to maximize pattern recognition
8 one categorical variable
9 diamonds
We examine a categorical variable by looking at counts.
10 diamonds %>% count(cut)
count() is a data manipulation verb.
table(diamonds$cut)
table() is a vector function.
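The difference between the two functions can be sketched on a small made-up data frame. The `toy` data below is hypothetical and only stands in for `diamonds`; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for the cut column of diamonds
toy <- data.frame(cut = c("Ideal", "Good", "Ideal", "Fair", "Ideal", "Good"))

# count() takes a data frame and returns a data frame with one row per group
counts_df <- toy %>% count(cut, sort = TRUE)

# table() takes a vector and returns a named table object
counts_tbl <- table(toy$cut)

print(counts_df)
print(counts_tbl)
```

Because count() returns a data frame, its result can be piped straight into further verbs such as mutate(), which is what the following slides do.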
11 dating_profiles %>%
  count(education, sort = TRUE) %>%
  mutate(pct = 100 * n / sum(n))
12 dating_profiles %>%
  count(education, sort = TRUE) %>%
  mutate(pct = 100 * n / sum(n))
Things to consider:
1. Group sizes
2. Vital few vs. trivial many
3. Missing data
4. Is there a natural ordering?
5. Recoding
6. Collapsing
7. Data type: chr vs. fct vs. ord
13 dating_profiles <- dating_profiles %>%
  mutate(education = fct_explicit_na(education))
levels(fct_infreq(dating_profiles$education))
Things to consider:
1. Group sizes
2. Vital few vs. trivial many
3. Missing data
4. Is there a natural ordering?
5. Recoding
6. Collapsing
7. Data type: chr vs. fct vs. ord
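As a rough illustration of the missing-data and collapsing items on the checklist, here is a minimal sketch on a made-up vector. The `education` values are hypothetical, not taken from dating_profiles. It assumes forcats 1.0.0 or later, where fct_na_value_to_level() is the replacement for the now-deprecated fct_explicit_na(); collapsing with fct_lump_n() is one possible choice, not necessarily what the slides intend.

```r
library(forcats)

# Hypothetical education answers; NA marks a missing answer
education <- c("college", "masters", NA, "college",
               "high school", "college", "phd", NA)

# Make NA an explicit level so missing answers are counted like any other group
edu <- fct_na_value_to_level(factor(education), level = "(Missing)")

# Keep the 2 most frequent levels and collapse the rest into "Other"
edu_lumped <- fct_lump_n(edu, n = 2)

# Levels ordered by frequency, most common first
print(levels(fct_infreq(edu_lumped)))
print(table(edu_lumped))
```

Whether to collapse rare groups at all depends on the question being asked; the "trivial many" can matter if one of them is the group of interest.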
14 dating_profiles %>%
  ggplot(aes(fct_rev(fct_infreq(education)))) +
  geom_bar() +
  coord_flip()
Use a bar plot to examine the distribution of a categorical variable.
15 dating_profiles %>%
  ggplot(aes(fct_rev(fct_infreq(education)))) +
  geom_bar() +
  coord_flip()
Use a bar plot to examine the distribution of a categorical variable.
By default, geom_bar() places the missing-data group last.
16 Your Turn 1
Consider the pets variable in the dating_profiles dataset. Make a summary of the group counts and make a bar plot. Then go through the checklist below and think about what you would consider doing.
1. Do group sizes vary a lot?
2. Are there a vital few and/or a trivial many?
3. Any missing data?
4. Is there a natural ordering to the groups?
5. Should we consider recoding/collapsing?
17 dating_profiles %>%
  count(pets, sort = TRUE) %>%
  mutate(pct = 100 * n / sum(n))
dating_profiles %>%
  mutate(pets = fct_rev(fct_infreq(fct_explicit_na(pets)))) %>%
  ggplot(aes(pets)) +
  geom_bar() +
  coord_flip()
18
19 what about pie charts?
20 Pie charts Try placing the wedges in order from largest to smallest
21 Pie charts
22 Pie charts What is revealed about the data from this pie chart?
23 Pie charts
24 Pie charts
Pie charts are awful because they encode information in angles and areas, which are very difficult for humans to judge. Graphics that make comparisons via position on a common scale are best.
25 one numerical variable
26 diamonds
We examine a numerical variable by looking at statistical summaries and binned counts (histograms).
27 diamonds %>%
  summarise(
    min = min(price),
    q1 = quantile(price, 0.25),
    median = median(price),
    mean = mean(price),
    q3 = quantile(price, 0.75),
    max = max(price)
  )
summarise() is a data manipulation verb.
summary(diamonds$price)
summary() is a generic function.
28 diamonds %>%
  summarise(
    min = min(price),
    q1 = quantile(price, 0.25),
    median = median(price),
    mean = mean(price),
    q3 = quantile(price, 0.75),
    max = max(price)
  )
summary(diamonds$price)
Things to consider:
1. What values are most common?
2. Which values are rare/extreme?
3. Can you see unusual patterns?
4. Measures of center (mean, median, etc.)
5. Measures of spread (std. dev., IQR, etc.)
6. Skewness
7. Missing data
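Several checklist items (center, spread, skewness, missing data) can be checked numerically. The sketch below uses base R only and swaps in the built-in faithful dataset (used later in these slides) so that it runs without extra packages; the mean-minus-median comparison is only a crude skewness hint, not a formal measure.

```r
# faithful ships with base R (datasets package)
x <- faithful$eruptions

# Measures of center and spread
center <- c(mean = mean(x), median = median(x))
spread <- c(sd = sd(x), iqr = IQR(x))

# Crude skewness hint: a mean below the median suggests a longer left tail,
# a mean above the median suggests a longer right tail
mean_minus_median <- mean(x) - median(x)

# mean(), sd(), etc. return NA if any value is missing, so count NAs first
n_missing <- sum(is.na(x))

print(center)
print(spread)
print(mean_minus_median)
print(n_missing)
```

For a distribution with more than one mode, summaries like these can be misleading on their own, which is one reason the slides pair them with a histogram.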
29 diamonds %>%
  ggplot(aes(price)) +
  geom_histogram(binwidth = 100)
Use a histogram to examine the distribution of a numerical variable.
30 diamonds %>%
  ggplot(aes(price)) +
  geom_histogram(binwidth = 100) +
  geom_vline(xintercept = mean(diamonds$price), color = "red") +
  geom_vline(xintercept = median(diamonds$price), color = "blue")
Statistics and other values can be overlaid.
31 histogram examples
32 diamonds %>% filter(carat < 3) %>% ggplot(aes(carat)) + geom_histogram(binwidth = 0.01)
33 faithful
34 summary(faithful$eruptions)
35 faithful %>%
  ggplot(aes(eruptions)) +
  geom_histogram(binwidth = 0.25)
36 histograms on the density scale
37 diamonds %>%
  ggplot(aes(fct_rev(cut), y = stat(prop), group = 1)) +
  geom_bar()
Bar plots can be on a proportion scale: the height of each bar is the proportion of values in that bar.
38 diamonds %>%
  ggplot(aes(price, y = stat(density))) +
  geom_histogram(binwidth = 3000, color = "white")
Histograms can be on a density scale.
39 diamonds %>%
  ggplot(aes(price, y = stat(density))) +
  geom_histogram(binwidth = 3000, color = "white")
Histograms can be on a density scale: the area of each bin is the proportion of values in that bin.
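The "area equals proportion" statement can be checked by hand. The following base R sketch uses the built-in faithful data rather than diamonds (an assumption made so that it runs without ggplot2) and recomputes the density that hist() reports.

```r
x <- faithful$eruptions

# Bin the data without drawing; breaks chosen to cover the observed range
h <- hist(x, breaks = seq(1.5, 5.25, by = 0.25), plot = FALSE)

widths <- diff(h$breaks)            # width of each bin
prop   <- h$counts / sum(h$counts)  # proportion of values in each bin
dens   <- prop / widths             # density = proportion per unit of x

# The hand-computed density matches what hist() reports
print(all.equal(dens, h$density))

# Height times width is the proportion, so the bin areas sum to 1
print(sum(dens * widths))
```

In recent ggplot2 releases, after_stat(density) is the recommended spelling of stat(density); the older form used on the slides is deprecated and produces a warning.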
40 top_movies %>% ggplot(aes(gross_adj, y = stat(density))) + geom_histogram(binwidth = 100, boundary = 300, color = "white")
41 top_movies %>% ggplot(aes(gross_adj, y = stat(density))) + geom_histogram(breaks = c(300, 400, 600, 1500), color = "white")
42 top_movies %>%
  ggplot(aes(gross_adj, y = stat(density))) +
  geom_histogram(breaks = c(300, 400, 1500), color = "white")
Notice that the density scale reasonably displays the distribution.
43 top_movies %>% ggplot(aes(gross_adj)) + geom_histogram(binwidth = 100, boundary = 300, color = "white")
44 top_movies %>% ggplot(aes(gross_adj)) + geom_histogram(breaks = c(300, 400, 600, 1500), color = "white")
45 top_movies %>%
  ggplot(aes(gross_adj)) +
  geom_histogram(breaks = c(300, 400, 1500), color = "white")
A count (or proportion) scale loses the shape of the distribution.
46 top_movies %>%
  ggplot(aes(gross_adj)) +
  geom_histogram(breaks = c(300, 400, 1500), color = "white")
A count (or proportion) scale loses the shape of the distribution: this is not a histogram.
47 top_movies %>%
  ggplot(aes(gross_adj, y = stat(density))) +
  geom_histogram(breaks = c(300, 400, 600, 1500), color = "white")
On the density scale, the height of a bin is the density: the proportion per unit on the horizontal axis.
48 top_movies %>%
  ggplot(aes(gross_adj, y = stat(density))) +
  geom_histogram(breaks = c(seq(300, 400, by = 10), 600, 1500), color = "white")
On the density scale, the height of a bin is the density: the proportion per unit on the horizontal axis.
49 top_movies %>%
  ggplot(aes(gross_adj, y = stat(density))) +
  geom_histogram(breaks = c(300, 350, 400, 450, 1500), color = "white")
Recap: there are 25 movies represented in the [400, 450) bin and 92 in the [450, 1500) bin. But the last bin is much wider, so it is less crowded (less dense).
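The recap can be turned into a short calculation. The sketch below uses only the two counts and the bin edges quoted above; the total number of movies is not given here, so it compares movies per unit of gross_adj, which differs from the density only by the common factor 1/n.

```r
# Counts and bin edges quoted in the recap
counts <- c(narrow = 25, wide = 92)
widths <- c(narrow = 450 - 400, wide = 1500 - 450)

# Movies per unit of gross_adj in each bin (density up to the common factor 1/n)
per_unit <- counts / widths

# How many times more crowded the narrow bin is than the wide one
ratio <- per_unit[["narrow"]] / per_unit[["wide"]]

print(per_unit)
print(ratio)
```

Although the wide bin holds more movies, the narrow bin has more movies per unit of the horizontal axis, so it is drawn taller on the density scale.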