Statistics: Interpreting Data and Making Predictions. Visual Displays of Data 1/31


Last Time

Last time we discussed central tendency, that is, notions of the middle of data. More specifically, we discussed the mean (= average) and the median. Depending on the situation, one of these may be more indicative of the middle of the data than the other. We will see today that talking about the middle of the data tells only part of the story. On Monday we discussed some weather data for San Francisco. Let's look at weather data for both San Francisco and Las Cruces. We will also calculate the mean and median for both data sets.


Trying to see how temperatures in the two cities compare is really hard with the table of data above. We'll see that we can get some sense of the difference with histograms. One interesting point of this data is the calculation of the mean and median:

             San Francisco   Las Cruces
Mean              58.5           58.9
Median            58.0           58.5

The mean and median are virtually identical for the two cities. We will now draw histograms in a similar way as we did earlier.

Clicker Question

What stands out to you about these histograms that could also be significant?

A One uses red and one uses blue
B The Las Cruces data is more spread out
C The biggest bars for SF are larger than those for LC
D A and B
E B and C

Answer

The most correct answer is E. It is true, and significant, that the Las Cruces data is more spread out. That means there is more variation in high temperatures for Las Cruces than for San Francisco. This is due, in part, to the ocean's moderation of the San Francisco weather, whereas Las Cruces is in the desert. It is also true that the highest bars for San Francisco are higher than those for Las Cruces. This is actually a consequence of the previous observation: data that is less spread out piles up in fewer bars. We'll see other graphs that share this property.

While the mean and median of the two sets of temperature data are nearly the same, the graphical representation makes the data look very different. The data for Las Cruces is spread out much more than that of San Francisco. A calculation of the middle of the data presents only part of the story. The dispersion, or deviation, of the data is also an important part of the story. While there are several measures of deviation, the most common one is called standard deviation.

Standard Deviation

The most basic property of standard deviation is: the larger the standard deviation, the more spread out the data. That is, the larger the deviation, the farther the data is from the middle. We won't formally define standard deviation or give a formula for how to calculate it. Spreadsheets have built-in functions to calculate standard deviation.

The point of measuring deviation is to give a sense of how far data is from the middle, or the average. Standard deviation approximately measures the average distance of the data from the middle. For example, the standard deviations of the San Francisco and Las Cruces weather data are 4.0 and 8.0, respectively. The standard deviation for Las Cruces is much larger than for San Francisco, reflecting that the Las Cruces data is more spread out.
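As a rough illustration of the same idea, the sketch below uses Python rather than a spreadsheet (an assumption; the lecture itself uses Excel). The two temperature lists are invented for illustration and are not the San Francisco or Las Cruces data. They share the same mean and median, yet their standard deviations differ.

```python
import statistics

# Hypothetical daily high temperatures (degrees F), invented for illustration.
# Both lists have the same mean and median, but the second is more spread out.
tight = [54, 56, 58, 58, 60, 62]
spread = [44, 50, 56, 60, 66, 72]

def summarize(values):
    """Return (mean, median, population standard deviation)."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.pstdev(values),
    )

for name, values in [("tight", tight), ("spread", spread)]:
    mean, median, sd = summarize(values)
    print(f"{name}: mean={mean:.1f} median={median:.1f} sd={sd:.1f}")
```

Here `statistics.pstdev` treats the list as the whole population; `statistics.stdev` would give the sample version, which is slightly larger for small data sets.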

Some Coin Flipping Data

Let's look at the experiment of flipping a coin repeatedly. We will simulate this with Excel and the computer program Maple. In the experiment we simulate a number of people, each flipping a coin 100 times and determining the percentage of flips that came up heads. We'll first look at an Excel spreadsheet, Coin Flip Distribution.xlsx.
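The experiment can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_heads_percentages`, the fixed seed, and the text histogram are choices made here, not part of the Excel or Maple files used in class.

```python
import random
from collections import Counter

def simulate_heads_percentages(people, flips_per_person, seed=0):
    """Each simulated person flips a fair coin `flips_per_person` times.
    Return the list of percentages of heads, one entry per person."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(people):
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_person))
        results.append(100 * heads / flips_per_person)
    return results

percentages = simulate_heads_percentages(people=1000, flips_per_person=100)
counts = Counter(percentages)

# Crude text histogram: one '#' for every 5 people at each percentage.
for pct in sorted(counts):
    print(f"{pct:5.0f}% {'#' * (counts[pct] // 5)}")
```

Raising `people` plays the role of the larger and larger simulations; the bars should pile up around 50%.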

Excel isn't the best way to simulate a large amount of data. The following charts were created with the program Maple. We used this program to demonstrate RSA encryption at the beginning of the semester.

Simulation of flipping 100 coins 1,000 times

Simulation of flipping 100 coins 10,000 times

Simulation of flipping 100 coins 100,000 times

Simulation of flipping 100 coins 500,000 times

Simulation of flipping 100 coins 1,000,000 times

As the number of repetitions of the experiment gets larger and larger, the graph looks more and more regular. Note that the graphs are roughly symmetric, and that the middle of each graph is at 50, the expected percentage of heads.

Q Does the shape of the graphs, especially the later ones, look at all familiar?

A Yes
B No

The Bell Curve (or Normal Curve)

The importance of the bell curve is that as the number of trials gets larger and larger, histograms generally look more and more like a bell curve. The particular shape of the bell curve reflects the mean and the standard deviation. The center of the curve represents the mean. How wide or narrow the curve is indicates the standard deviation: the larger the standard deviation, the wider the curve.

Bell Curves with Different Standard Deviations

Two graphs are shown: one with Standard Deviation = 5 and one with Standard Deviation = 2. For both of these graphs the mean is 50.

The blue and red graphs have mean 0 and the green graph has mean 2. The blue graph has the smallest standard deviation, followed by the green graph, and finally by the red graph, which has the largest standard deviation.

Standard Deviation and Number of Coins Flipped

Let's go back to the coin flipping experiment. How do things change if each person changes the number of flips? The following graphs represent a simulation of 50,000 people flipping a coin repeatedly. Each person determines the percentage of heads on their flips. The first graph represents each person flipping a coin 10 times and recording the percentage of heads. The second graph represents each person flipping 100 times.

Clicker Question

Which graph shows more variability in the data?

A The graph on the left (10 flips per person)
B The graph on the right (100 flips per person)

Answer

The first graph has more variability. While nearly all the data of the second graph is between 40 and 60, a lot of the data in the first graph is outside that range.
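The difference in spread can also be computed directly. The sketch below uses the standard formula for the standard deviation of a proportion of heads, sqrt(p(1 - p)/n), which the lecture does not state; the function name is a hypothetical helper introduced here.

```python
import math

def sd_of_heads_percentage(flips, p=0.5):
    """Standard deviation, in percentage points, of the percentage of heads
    in `flips` independent flips of a coin with heads probability `p`."""
    return 100 * math.sqrt(p * (1 - p) / flips)

for flips in (10, 100):
    sd = sd_of_heads_percentage(flips)
    print(f"{flips} flips per person: standard deviation = {sd:.1f}%")
```

For 100 flips this gives 5%, and for 10 flips a bit under 16%, which is why the 10-flip graph is so much wider.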

In terms of the normal curve, standard deviation can be interpreted approximately with the following rules of thumb:

About 68% of all data is within 1 standard deviation of the mean.
About 95% of all data is within 2 standard deviations of the mean.
About 99.7% of all data is within 3 standard deviations of the mean.
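These rules of thumb can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python is an assumption here; the lecture uses spreadsheets) and measures the area under the standard normal curve within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard
    deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.1f}%")
```

The computed values are close to, but not exactly, 68%, 95%, and 99.7%, which is why these are called rules of thumb.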

The letter σ denotes the standard deviation, and µ the mean. The percentages in the rules of thumb are areas under the bell curve. Finding the area under a curve was one of the problems that led to the development of calculus in the 17th century.

How to Tell if a Coin is Unfair?

We'll use the normal distribution to get some sense of how to tell if a coin is fair. Suppose you flip a coin and get at least 60% heads. Can you conclude the coin is unfair?

Clicker Question

Q Suppose you flip a coin 10 times and get 6 heads. Do you think this is good evidence to conclude the coin is unfair?

A Yes
B No

Answer

No, it really isn't. If you flip a fair coin 10 times, you are fairly likely to get 6 or more heads. In fact, the probability of getting 6 or more heads is about 38%, which we can estimate with our spreadsheet.
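This probability can also be computed exactly by counting outcomes rather than estimated by simulation. The following is a minimal sketch in Python (an assumption; the lecture uses a spreadsheet), and `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k heads in n independent flips of a coin
    that lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 386 of the 1024 equally likely outcomes of 10 fair flips have 6 or more heads.
print(f"{prob_at_least(6, 10):.3f}")
```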

Let's suppose we flip a coin 100 times and get at least 60% heads. We can ask how likely that is. Let's imagine we do this many, many times. As our simulations indicate, the results of these trials are distributed approximately like a bell curve. Based on the data from Coin Flip Distribution.xlsx, the mean for this curve is 50% and the standard deviation is 5%. Then 60% is two standard deviations to the right of the mean. The amount of data to the right of 60% is then approximately 2.3% of the data. So there is only about a 2% chance that flipping a coin 100 times results in at least 60% heads, but that is not so small. Unless you had a reason to think the coin might be unfair, it would probably be hard to argue from this data that the coin is unfair, even though it is not too likely to get at least 60% heads.
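The tail area can be checked with a short calculation. The sketch below computes the bell-curve estimate using the mean 50 and standard deviation 5 from the text, and, as an addition not made in the lecture, compares it with the exact count of outcomes for 100 fair flips.

```python
from math import comb
from statistics import NormalDist

# Bell-curve estimate: area to the right of 60 when the mean is 50
# and the standard deviation is 5.
bell = NormalDist(mu=50, sigma=5)
normal_tail = 1 - bell.cdf(60)

# Exact probability of 60 or more heads in 100 fair flips:
# count the favorable outcomes out of 2**100 equally likely ones.
exact_tail = sum(comb(100, k) for k in range(60, 101)) / 2**100

print(f"bell-curve estimate: {100 * normal_tail:.2f}%")
print(f"exact count:         {100 * exact_tail:.2f}%")
```

The exact figure comes out somewhat larger than the bell-curve estimate, since the smooth curve only approximates the bars of the histogram, but both support the "about 2% to 3%" conclusion.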

To think a little further about it, suppose a class of 100 students each flipped a coin 100 times. Even if everybody had a fair coin, we'd expect, on average, 2 of the 100 students to get at least 60% heads. Therefore, while getting this many heads isn't too likely for any one person, with enough people it will happen.
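The class-of-100 reasoning can be made concrete with a short sketch. It assumes the bell-curve estimate from above (mean 50, standard deviation 5) and that the students flip independently; the variable names are illustrative.

```python
from statistics import NormalDist

# Chance that one student gets at least 60% heads, using the bell-curve
# estimate with mean 50 and standard deviation 5.
p_one_student = 1 - NormalDist(mu=50, sigma=5).cdf(60)

students = 100
expected_count = students * p_one_student

# Chance that at least one of the students gets at least 60% heads,
# assuming the students' flips are independent of one another.
prob_at_least_one = 1 - (1 - p_one_student) ** students

print(f"expected number of students: {expected_count:.1f}")
print(f"chance at least one student does it: {100 * prob_at_least_one:.0f}%")
```

Under these assumptions the expected count is a little over 2, and it is roughly 90% likely that someone in the class sees at least 60% heads.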

Next Week and Homework #8

Next week's topic is making indirect measurements. For example, how do people measure the heights of mountains? Even more difficult is determining the distance to the sun and to the moon. We'll discuss methods of estimating these kinds of distances that were discovered a very long time ago. Homework #8 is on the class website. It is due a week from today.