Exploratory Data Analysis

Size: px
Start display at page:

Download "Exploratory Data Analysis"

Transcription

1 Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation of the collected or transformed data to reveal patterns, peculiarities and relationships using visual displays, resistant statistics and a thorough examination of the residuals. EDA is a preliminary step in data analysis. It can be used to determine if the planned method for analysis is appropriate for the collected data. Four major themes that describe the methods used in EDA: revelation, resistance, reexpression, and residuals. 1

2 REVELATION (page ) EDA reveals the essential features of the dataset usually via simple graphical displays. (Example: stem-and-leaf display and the boxplot) These graphs can give us a general idea about the distribution such as its center and other quantiles, spread, symmetry, and kurtosis. Graphs can help detect sources of problems in analysis such as the presence of outliers and multimodality. Graphs can also help reveal patterns and possible relationships among the different variables in the study. RESISTANCE (page ) Definition 12.2 A statistic is said to be resistant if its value is not adversely affected (i) when we replace some of the values in a dataset with totally different values; or, (ii) when there are minor changes in all of the data values possibly due to rounding. * The mean and the variance are not resistant statistics that is why they are seldom used in EDA. 2

3 Example 12.1 (page 412) ORIGINAL DATASET: Mean and Median = 74. Obs No. Obs No Obs No. (i) Xi (i) Xi (i) Xi Let us examine the effect on the sample mean and median if we change one value in the dataset by an outlying value such as 1,000. Obs No. Obs No. Obs No. Modified Modified Modified (i) Mean Md (i) Mean Md (i) Mean Md Definition of Stem-and-Leaf Display (page 416) Definition The stem-and-leaf display (SALD) is a histogram-like display of the data where the digits of the data values replace the bars in representing the frequencies. Example: Stem Leaf (unit = 0.1) Note: We can retrieve the data value from the display by joining the digits in the stem and the leaf together then multiplying the number by the specified unit. For example, the smallest data value in the SALD above is 22 x 0.1=2.2. In the third row, the observations are 4.5, 4.6 and

4 Steps in Constructing the SALD (page 416) Step 1. Choose the common division point of each observation where we will split each data value into its stem and leaf components. Example 12.3: Smallest value is Largest value is Choices: Example Location (for Abra: 235.9) Values of Stem between ones and tenths place to 1394 between tens and ones place to 139 between hundreds and tens to 13 between thousands and hundreds to 1 Steps in Constructing the SALD (pages ) Step 2: Step 3: In a vertical column, list the smallest stem value up to the largest stem value, using increments of 1 unit. Draw a vertical line to the right of the stem value. Example: Stem

5 Steps in Constructing the SALD (page 417) Step 4. Step 5. Record the leaf portion of the first observation in the row corresponding to its stem value. Do the same for all of the observations. Sort the leaves within each stem row from lowest to highest. Maintain uniform spacing in between the leaves for each one of the rows. By doing so, the stem with the most number of leaves (observations) will have the longest line; that is, it will appear to have the longest bar Steps in Constructing the SALD (page 417) Step 6. Indicate the unit of the leaves to allow the recreation of the actual data values from the display. For example, Unit = represents = 35.6 Unit = represents = 356 Unit = represents = 3,560 (Unit = 0.1 million pesos)

6 Split Stem-and-Leaf Display (page 421) If there are too many leaves in some of the rows is too large, we may split each stem into two groups. In the first group, we include all leaves with leading digits from 0 4. In the second group, we include all leaves with leading digits from 5 9. We mark the stem of the first group with * and we mark the stem of the second group with.. If the number of leaves is still too large, we can divide each stem into five groups. We mark the stem of the first group with * and include all leaves with leading digits from 0-1. The second group is marked t and includes leaves with leading digits from 2-3, the third group is marked f and includes leaves with leading digits from 4-5, the fourth group is marked s and includes leaves with leading digits from 6-7. The last group is marked. and includes leaves with leading digits from 8 9. Example Below are the starting salaries of a sample of 100 computer science majors who earned their baccalaureate degrees during a recent year: Starting Salaries (P000) Values range from 18.5 to

7 Example of Split SALD Stem Leaf (unit = 0.1 thousand pesos) t f s t f s Note: If there are outlying values then these values can be reported inside the parentheses on a special row in the first row (if value is extremely low) labelled as low or on the last row (if value is extremely large) labelled as hi. For example, if the starting salaries of two graduates are as large as and then we will add the following row at the bottom of the SALD, hi (120.3, 150.4) Definition of Depth (page 419) Definition 12.5 If we determine the two ranks of a data value by recording its position from each end of the array, then its depth is the smaller between these two ranks. 7

8 Example 12.4 (page 419) The array and the corresponding ranks and depths of each observation in the array are as follows: Array Rank A (from lowest to highest) Rank B (from highest to lowest) Depth (the smaller between Rank A and Rank B) Q 1=10 Md=18 Q 3=25 We will observe that the depths of the 1 st and 3 rd quartiles are both equal to (n+1)/4=(11+1)/4=3; and, the depth of the median is (n+1)/2=6. Five-Number Summary (pages ) Definition 12.6 A letter value is a statistic whose value depends on its defined depth, which we tag using a particular letter. The median is a letter value whose depth is (n+1)/2 and its tag is M. Definition 12.7 The extremes are the two data values in the array with depths equal to 1. Definition 12.8 The fourths or the hinges are the two data values in the array with the following depth: ( depth of median) 1 when n is odd depth of fourth 2 ( depth of median) 0.5 when n is even 2 We use the letter F as our tag for the fourth. Definition 12.9 The five-number summary is a collection of letter values consisting of the median, the fourths, and the extremes 8

9 Note on the Fourth (page 426) The fourth can be viewed as the two observations that are halfway between the median and the corresponding extremes. The depth of the fourth is either a whole number or has a remainder of ½ since the numerator is always a whole number and the Position: denominator 1 1 is Interpolation is needed only when the depth has a remainder of ½. In this case, Lower just Fourth get the midpoint Median of the Upper two Fourth Upper values Fourth adjacent to the fourth. Example: n=6, depth of fourth = (depth of median + 0.5)/2 = ((6+1)/ )/2 = 2 The fourths are the 2 nd and the 2 nd to the last ordered statistics. n=7, depth of fourth = (depth of median + 1)/2 = ((7+1)/2 + 1)/2 = 2.5 The fourths are interpolated values. On each end of the array, it is computed as the average of the two observations with depths equal to 2 and 3. Definition of Box-and-Whisker Plot (page 430) Definition The box-and-whisker plot, or boxplot, is a simple graphical display of the data used to display the 5- letter summary. Note: The boxplot displays the following features of the data: (i) location, (ii) spread, (iii) symmetry, (iv) extremes, and (v) outliers. 9

10 Steps in Constructing the Boxplot (pages ) Step 1: Construct a rectangle with one end at the lower fourth (F L) and the other end at the upper fourth (F U) Step 2: Put a line across the interior of the rectangle at the median Depth of median = (15 +1)/2 = 8 Med=22 Depth of fourth = (depth of median + 1)/2 = 9/2 =4.5 F L = (15+18)/2=16.5 F U = (24+23)/2 = Steps in Constructing the Boxplot (cont d) Step 3: Compute for the fourth-spread (d F ), lower fence and upper fence as follows: d F = F U F L lower fence = F L 1.5 d F upper fence = F U d F The lower and upper fences are outlier cutoffs. We will consider all data points smaller than the lower fence or larger than the upper fence as outliers Example: d F = = 7 Lower fence = 16.5 (1.5)(7) = 6 Upper fence = (1.5)(7) = 34 10

11 Steps in Constructing the Boxplot (cont d) Step 4: Excluding outliers, identify the two data values that are closest to the lower fence and upper fence, respectively. Draw a line, starting from these values up to each side of the rectangle. We sometimes refer to these lines as the whiskers. Step 5: Plot each outlier at its corresponding value, using an x-mark or any other distinctive mark. We consider an outlying observation that is less than F L 3d F or greater than F U +3d F as an extreme outlier. We sometimes distinguish extreme outliers from other outliers by placing a circle at their actual location, instead of an x Lower fence= 6 F L =16.5 Med=22 F U = 23.5 Upper fence = 34 Outlier : 1 Closest data point to lower fence that is not an outlier: 10 Closest data point to upper fence that is not an outlier: 28 x Chapter Introduction 25 30to EDA Remarks (page 431) The height of the rectangle is usually arbitrary and has no specific meaning. If several boxplots appear together, however, the height is sometimes made proportional to the different sample sizes. This is rarely done, however, because an accurate representation is very difficult to achieve. The different statistical software present varying versions of the boxplot. For example, instead of plotting the sides of the rectangles at the lower fourth and upper fourth, these are plotted to related summary measures, the 1st and 3rd quartiles respectively and the fences are computed as follows: Lower fence = Q IQR Upper fence = Q IQR 11

12 Interpreting the Boxplot (page 433) 1. The line inside the rectangle shows the location of the median, our measure of central tendency. 2. The sides of the rectangle, which are plotted either at the fourths or the quartiles, indicate where the middle 50% of the observations lie. 3. The length of the rectangle represents the magnitude of either the fourth-spread or the inter-quartile range, our measure of dispersion. 4. The relative position of the line inside the rectangle to its sides gives us an idea on the degree and direction of symmetry because this shows the respective distances of the median to the lower and upper fourths. A line that is in the middle of the rectangle indicates that the distribution is symmetric; while a line that is closer to the lower fourth (or 1 st quartile) indicates that the distribution is skewed to right, and, a line that is closer to the upper fourth (or 3 rd quartile) indicates that the distribution is skewed to the left). 5. If there are no outliers then the ends of the whiskers indicate the respective values of both extremes; but, if there are outliers then the farthest outlier is our extreme. 6. The outliers are clearly identified by the distinctive marks used to plot them. Interpreting the Boxplot Symmetric distribution Negatively-skewed distribution Positively-skewed distribution 12

13 Comparing Distributions using the Boxplot Total Financial Resources Generated by Major Geographic Region million pesos Luzon Visayas Mindanao Assignment Use the data in page 434, Exercise 3, on the illiteracy rate among the male and female populations, 15 years of age and over, in Asia in Construct a split stem-and-leaf display of the illiteracy rate among the male population. Let the common division point be in between the tens and ones digit. Split each stem into two lines. 2. Compute for the median, lower fourth, upper fourth, fourth spread, lower fence, and upper fence of the illiteracy rate of: i. Male population ii. Female population 3. Use the values computed in no. 2 to draw the boxplot of the illiteracy rate among the male population. On the same plotting area, draw the boxplot of the illiteracy rate among the female population. Chapter 12. Exploratory Data Analysis 13

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

To calculate the arithmetic mean, sum all the values and divide by n (equivalently, multiple 1/n): 1 n. = 29 years.

To calculate the arithmetic mean, sum all the values and divide by n (equivalently, multiple 1/n): 1 n. = 29 years. 3: Summary Statistics Notation Consider these 10 ages (in years): 1 4 5 11 30 50 8 7 4 5 The symbol n represents the sample size (n = 10). The capital letter X denotes the variable. x i represents the

More information

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd Chapter 3: Data Description - Part 3 Read: Sections 1 through 5 pp 92-149 Work the following text examples: Section 3.2, 3-1 through 3-17 Section 3.3, 3-22 through 3.28, 3-42 through 3.82 Section 3.4,

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,

More information

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS To Describe Data, consider: Symmetry Skewness TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS Unimodal or bimodal or uniform Extreme values Range of Values and mid-range Most frequently occurring values In

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

WHOLE NUMBER AND DECIMAL OPERATIONS

WHOLE NUMBER AND DECIMAL OPERATIONS WHOLE NUMBER AND DECIMAL OPERATIONS Whole Number Place Value : 5,854,902 = Ten thousands thousands millions Hundred thousands Ten thousands Adding & Subtracting Decimals : Line up the decimals vertically.

More information

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

CHAPTER 3: Data Description

CHAPTER 3: Data Description CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a

More information

DAY 52 BOX-AND-WHISKER

DAY 52 BOX-AND-WHISKER DAY 52 BOX-AND-WHISKER VOCABULARY The Median is the middle number of a set of data when the numbers are arranged in numerical order. The Range of a set of data is the difference between the highest and

More information

1.3 Graphical Summaries of Data

1.3 Graphical Summaries of Data Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this

More information

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40 Chpt 3 Data Description 3-2 Measures of Central Tendency 1 /40 Chpt 3 Homework 3-2 Read pages 96-109 p109 Applying the Concepts p110 1, 8, 11, 15, 27, 33 2 /40 Chpt 3 3.2 Objectives l Summarize data using

More information

Section 6.3: Measures of Position

Section 6.3: Measures of Position Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different

More information

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data MATH& 146 Lesson 10 Section 1.6 Graphing Numerical Data 1 Graphs of Numerical Data One major reason for constructing a graph of numerical data is to display its distribution, or the pattern of variability

More information

UNIT 1A EXPLORING UNIVARIATE DATA

UNIT 1A EXPLORING UNIVARIATE DATA A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 1.1 A Framework for Statistical Problems Many statistical problems can be described by a simple framework in which

More information

Understanding and Comparing Distributions. Chapter 4

Understanding and Comparing Distributions. Chapter 4 Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables

More information

15 Wyner Statistics Fall 2013

15 Wyner Statistics Fall 2013 15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES STP 6 ELEMENTARY STATISTICS NOTES PART - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use

More information

Boxplots. Lecture 17 Section Robb T. Koether. Hampden-Sydney College. Wed, Feb 10, 2010

Boxplots. Lecture 17 Section Robb T. Koether. Hampden-Sydney College. Wed, Feb 10, 2010 Boxplots Lecture 17 Section 5.3.3 Robb T. Koether Hampden-Sydney College Wed, Feb 10, 2010 Robb T. Koether (Hampden-Sydney College) Boxplots Wed, Feb 10, 2010 1 / 34 Outline 1 Boxplots TI-83 Boxplots 2

More information

Lecture Notes 3: Data summarization

Lecture Notes 3: Data summarization Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

AND NUMERICAL SUMMARIES. Chapter 2

AND NUMERICAL SUMMARIES. Chapter 2 EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use?

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use? Chapter 4 Analyzing Skewed Quantitative Data Introduction: In chapter 3, we focused on analyzing bell shaped (normal) data, but many data sets are not bell shaped. How do we analyze quantitative data when

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Probability and Statistics. Copyright Cengage Learning. All rights reserved. Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.5 Descriptive Statistics (Numerical) Copyright Cengage Learning. All rights reserved. Objectives Measures of Central Tendency:

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

Learning Log Title: CHAPTER 7: PROPORTIONS AND PERCENTS. Date: Lesson: Chapter 7: Proportions and Percents

Learning Log Title: CHAPTER 7: PROPORTIONS AND PERCENTS. Date: Lesson: Chapter 7: Proportions and Percents Chapter 7: Proportions and Percents CHAPTER 7: PROPORTIONS AND PERCENTS Date: Lesson: Learning Log Title: Date: Lesson: Learning Log Title: Chapter 7: Proportions and Percents Date: Lesson: Learning Log

More information

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set. Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean the sum of all data values divided by the number of values in

More information

Chapter 3: Describing, Exploring & Comparing Data

Chapter 3: Describing, Exploring & Comparing Data Chapter 3: Describing, Exploring & Comparing Data Section Title Notes Pages 1 Overview 1 2 Measures of Center 2 5 3 Measures of Variation 6 12 4 Measures of Relative Standing & Boxplots 13 16 3.1 Overview

More information

IT 403 Practice Problems (1-2) Answers

IT 403 Practice Problems (1-2) Answers IT 403 Practice Problems (1-2) Answers #1. Using Tukey's Hinges method ('Inclusionary'), what is Q3 for this dataset? 2 3 5 7 11 13 17 a. 7 b. 11 c. 12 d. 15 c (12) #2. How do quartiles and percentiles

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

More information

Measures of Central Tendency

Measures of Central Tendency Measures of Central Tendency MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2017 Introduction Measures of central tendency are designed to provide one number which

More information

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes. M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes. Population: Census: Biased: Sample: The entire group of objects or individuals considered

More information

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques. Section 2.1 - Introduction Graphs are commonly used to organize, summarize, and analyze collections of data. Using a graph to visually present a data set makes it easy to comprehend and to describe the

More information

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one. Probability and Statistics Chapter 2 Notes I Section 2-1 A Steps to Constructing Frequency Distributions 1 Determine number of (may be given to you) a Should be between and classes 2 Find the Range a The

More information

LESSON 3: CENTRAL TENDENCY

LESSON 3: CENTRAL TENDENCY LESSON 3: CENTRAL TENDENCY Outline Arithmetic mean, median and mode Ungrouped data Grouped data Percentiles, fractiles, and quartiles Ungrouped data Grouped data 1 MEAN Mean is defined as follows: Sum

More information

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram

More information

Middle Years Data Analysis Display Methods

Middle Years Data Analysis Display Methods Middle Years Data Analysis Display Methods Double Bar Graph A double bar graph is an extension of a single bar graph. Any bar graph involves categories and counts of the number of people or things (frequency)

More information

Chapter 5: The beast of bias

Chapter 5: The beast of bias Chapter 5: The beast of bias Self-test answers SELF-TEST Compute the mean and sum of squared error for the new data set. First we need to compute the mean: + 3 + + 3 + 2 5 9 5 3. Then the sum of squared

More information

Chapter 2: Descriptive Statistics

Chapter 2: Descriptive Statistics Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,

More information

+ Statistical Methods in

+ Statistical Methods in 9/4/013 Statistical Methods in Practice STA/MTH 379 Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics Sam Houston State University Discovering Statistics

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

1.2. Pictorial and Tabular Methods in Descriptive Statistics

1.2. Pictorial and Tabular Methods in Descriptive Statistics 1.2. Pictorial and Tabular Methods in Descriptive Statistics Section Objectives. 1. Stem-and-Leaf displays. 2. Dotplots. 3. Histogram. Types of histogram shapes. Common notation. Sample size n : the number

More information

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots Math 167 Pre-Statistics Chapter 4 Summarizing Data Numerically Section 3 Boxplots Objectives 1. Find quartiles of some data. 2. Find the interquartile range of some data. 3. Construct a boxplot to describe

More information

No. of blue jelly beans No. of bags

No. of blue jelly beans No. of bags Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in

More information

Maths Revision Worksheet: Algebra I Week 1 Revision 5 Problems per night

Maths Revision Worksheet: Algebra I Week 1 Revision 5 Problems per night 2 nd Year Maths Revision Worksheet: Algebra I Maths Revision Worksheet: Algebra I Week 1 Revision 5 Problems per night 1. I know how to add and subtract positive and negative numbers. 2. I know how to

More information

Page 1. Graphical and Numerical Statistics

Page 1. Graphical and Numerical Statistics TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise

More information

Numerical Summaries of Data Section 14.3

Numerical Summaries of Data Section 14.3 MATH 11008: Numerical Summaries of Data Section 14.3 MEAN mean: The mean (or average) of a set of numbers is computed by determining the sum of all the numbers and dividing by the total number of observations.

More information

Basic Statistical Terms and Definitions

Basic Statistical Terms and Definitions I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 Sections 2.3 and 2.4 Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) 2 / 25 Descriptive statistics For continuous

More information

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree.

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree. Two-way Frequency Tables two way frequency table- a table that divides responses into categories. Joint relative frequency- the number of times a specific response is given divided by the sample. Marginal

More information

3.2-Measures of Center

3.2-Measures of Center 3.2-Measures of Center Characteristics of Center: Measures of center, including mean, median, and mode are tools for analyzing data which reflect the value at the center or middle of a set of data. We

More information

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics Section. Displaying Quantitative Data with Graphs Mrs. Daniel AP Statistics Section. Displaying Quantitative Data with Graphs After this section, you should be able to CONSTRUCT and INTERPRET dotplots,

More information

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys Unit 7 Statistics AFM Mrs. Valentine 7.1 Samples and Surveys v Obj.: I will understand the different methods of sampling and studying data. I will be able to determine the type used in an example, and

More information

Week 2: Frequency distributions

Week 2: Frequency distributions Types of data Health Sciences M.Sc. Programme Applied Biostatistics Week 2: distributions Data can be summarised to help to reveal information they contain. We do this by calculating numbers from the data

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

Chapter 6: Comparing Two Means Section 6.1: Comparing Two Groups Quantitative Response

Chapter 6: Comparing Two Means Section 6.1: Comparing Two Groups Quantitative Response Stat 300: Intro to Probability & Statistics Textbook: Introduction to Statistical Investigations Name: American River College Chapter 6: Comparing Two Means Section 6.1: Comparing Two Groups Quantitative

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

AP Statistics Prerequisite Packet

AP Statistics Prerequisite Packet Types of Data Quantitative (or measurement) Data These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these

More information

Measures of Position

Measures of Position Measures of Position In this section, we will learn to use fractiles. Fractiles are numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries).

More information

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs DATA PRESENTATION At the end of the chapter, you will learn to: Present data in textual form Construct different types of table and graphs Identify the characteristics of a good table and graph Identify

More information

2.1: Frequency Distributions and Their Graphs

2.1: Frequency Distributions and Their Graphs 2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each

More information

STP 226 ELEMENTARY STATISTICS NOTES

STP 226 ELEMENTARY STATISTICS NOTES ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 2 ORGANIZING DATA Descriptive Statistics - include methods for organizing and summarizing information clearly and effectively. - classify

More information

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies. Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.

More information

Lecture 6: Chapter 6 Summary

Lecture 6: Chapter 6 Summary 1 Lecture 6: Chapter 6 Summary Z-score: Is the distance of each data value from the mean in standard deviation Standardizes data values Standardization changes the mean and the standard deviation: o Z

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

Descriptive Statistics

Descriptive Statistics Chapter 2 Descriptive Statistics 2.1 Descriptive Statistics 1 2.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Display data graphically and interpret graphs:

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Chapter 3 Analyzing Normal Quantitative Data

Chapter 3 Analyzing Normal Quantitative Data Chapter 3 Analyzing Normal Quantitative Data Introduction: In chapters 1 and 2, we focused on analyzing categorical data and exploring relationships between categorical data sets. We will now be doing

More information

Pre-Calculus Multiple Choice Questions - Chapter S2

Pre-Calculus Multiple Choice Questions - Chapter S2 1 Which of the following is NOT part of a univariate EDA? a Shape b Center c Dispersion d Distribution Pre-Calculus Multiple Choice Questions - Chapter S2 2 Which of the following is NOT an acceptable

More information

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form. CHAPTER 2 Frequency Distributions and Graphs Objectives Organize data using frequency distributions. Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.

More information

6th Grade Vocabulary Mathematics Unit 2

6th Grade Vocabulary Mathematics Unit 2 6 th GRADE UNIT 2 6th Grade Vocabulary Mathematics Unit 2 VOCABULARY area triangle right triangle equilateral triangle isosceles triangle scalene triangle quadrilaterals polygons irregular polygons rectangles

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

How individual data points are positioned within a data set.

How individual data points are positioned within a data set. Section 3.4 Measures of Position Percentiles How individual data points are positioned within a data set. P k is the value such that k% of a data set is less than or equal to P k. For example if we said

More information

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 KEY SKILLS: Organize a data set into a frequency distribution. Construct a histogram to summarize a data set. Compute the percentile for a particular

More information

Mean,Median, Mode Teacher Twins 2015

Mean,Median, Mode Teacher Twins 2015 Mean,Median, Mode Teacher Twins 2015 Warm Up How can you change the non-statistical question below to make it a statistical question? How many pets do you have? Possible answer: What is your favorite type

More information

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly. GRAPHING We have used statistics all our lives, what we intend to do now is formalize that knowledge. Statistics can best be defined as a collection and analysis of numerical information. Often times we

More information

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4

More information

3. Data Analysis and Statistics

3. Data Analysis and Statistics 3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual

More information

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information

More information

MATH 112 Section 7.2: Measuring Distribution, Center, and Spread

MATH 112 Section 7.2: Measuring Distribution, Center, and Spread MATH 112 Section 7.2: Measuring Distribution, Center, and Spread Prof. Jonathan Duncan Walla Walla College Fall Quarter, 2006 Outline 1 Measures of Center The Arithmetic Mean The Geometric Mean The Median

More information