CHAPTER 6. The Normal Probability Distribution

The normal probability distribution is the most widely used distribution in statistics, because many statistical procedures are built around it. The central limit theorem is probably the main reason for the importance of the normal distribution. It is essential for statistics students to learn how to use the normal probability distribution to solve applied problems. In this chapter we are going to study the normal probability distribution using the appropriate functions in JMP. We are also going to perform simulations, using a random function to generate a normally distributed random variable with a specified mean and standard deviation. We are going to perform a statistical experiment that demonstrates the central limit theorem numerically, and finally we are going to assess the normality of a given dataset.

Class Exercises: Compute probabilities for the normal distribution

Class example 1: According to the National Health Survey, heights of adult males are normally distributed with a mean of 69 inches and a standard deviation of 2.8 inches. Compute the percentage of the population of adult males whose heights fall between 64 and 76 inches. First, let's open a new data table (Figure 6.1). Then, right click at the heading of Column 1 and select Column Info.

See Figure 6.2. Click on the text box for Column Name and change the name to x (Figure 6.3). Double-click to the right of the first column heading to create a new column (Figure 6.4). You can save the file as Normal Dist (or any name you like). Then right click over Column 2, select Column Info, change the name to P(x), click over Column Properties, and select Formula, as shown below.

See Figure 6.5. A new window will open; choose Probability from Functions (grouped) and select Normal Distribution (Figure 6.6). Then double-click the variable x, click inside the parentheses and, after the variable x, type ,69,2.8 (the mean and the standard deviation), as shown below:

See Figure 6.7. Click over Apply, and you are going to see the following screen (Figure 6.8). Then click OK on this window and on the next window. Next, we want to compute the cumulative probabilities for x = 64 and x = 76, so let's input these numbers in the first column, as shown below:

Figure 6.9 shows the cumulative probabilities for these numbers. Because P(x) holds the cumulative probability (the probability that a height is less than x), the probability that the height of a randomly selected man is between 64 and 76 inches is the difference of the two values (rounded to three decimal places):

P(64 < x < 76) = 0.994 - 0.037 = 0.957

Class Exercise 2: We can also compute probabilities using a simulation. For example, let's generate 10,000 random numbers from a normal distribution with a mean of 69 and a standard deviation of 2.8. To do this, let's open a new data table (Figure 6.10). Then right click at the heading of Column 1 and click over Column Info (Figure 6.11).

Click on the text box for Column Name and change the name to x (Figure 6.12). Then click over Column Properties and select Formula (Figure 6.13). Next, click over Edit Formula, select Random, and then select Random Normal.

See Figure 6.14. Click inside the parentheses and input the numbers 69 and 2.8 (the mean and the standard deviation), separated by a comma (Figure 6.15). Click OK on this window and on the next window. Then right click over the first column (below the red arrow) and select Add Rows, as shown below.

See Figure 6.16. Type 10000 in the dialog box and click OK (Figure 6.17). At this point, you are going to see a sequence of randomly generated numbers from a normal distribution (Figure 6.18).

You can draw a histogram by using the Analyze menu and choosing the Distribution option (see Chapter 3 for details; this procedure is not shown here). Check the shape of the distribution and look at the summary statistics: the sample mean and standard deviation should be approximately equal to the requested mean and standard deviation. This activity is highly recommended; please ask your lab instructor if you do not know how to do it. Next, sort the numbers from lowest to highest by selecting Tables and Sort, then choose the variable x and click over By. You will see the window in Figure 6.19. Click OK, and you are going to see the random numbers ordered from lowest to highest (Figure 6.20).
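For readers who want to check the JMP results outside JMP, the generate-and-sort procedure above, and the counting done next, can be sketched in Python with only the standard library. This is a minimal sketch, not part of the JMP workflow: it assumes the mean 69 and standard deviation 2.8 typed into the Random Normal formula, uses an arbitrary seed, and compares the simulated answer with the exact value from `statistics.NormalDist.cdf`, which plays the role of the Normal Distribution formula from Class example 1.

```python
import bisect
import random
from statistics import NormalDist, fmean, stdev

# Values typed into the JMP Random Normal formula: mean 69, sd 2.8.
MU, SIGMA, N = 69.0, 2.8, 10_000

rng = random.Random(2024)  # arbitrary fixed seed so reruns give the same numbers
heights = sorted(rng.gauss(MU, SIGMA) for _ in range(N))  # generate, then sort

# In a sorted list, bisect_left returns the number of values strictly below
# the cutoff, i.e. what the row numbers show in the sorted JMP table.
below_64 = bisect.bisect_left(heights, 64)
below_76 = bisect.bisect_left(heights, 76)

p_sim = (below_76 - below_64) / N
dist = NormalDist(MU, SIGMA)  # plays the role of the Normal Distribution formula
p_exact = dist.cdf(76) - dist.cdf(64)

print(f"sample mean {fmean(heights):.3f}, sample sd {stdev(heights):.3f}")
print(f"P(x<64) ~ {below_64 / N:.4f}, P(x<76) ~ {below_76 / N:.4f}")
print(f"simulated P(64<x<76) = {p_sim:.3f}, exact = {p_exact:.3f}")
```

Because the counts depend on the seed, your simulated values will differ slightly from the 349 and 9,940 reported in this exercise, just as they do from one JMP run to the next.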

Computing the simulated probabilities is just a matter of counting the observations that meet the requirements of the problem. First, count the observations that are less than 64. You can do this by scanning the ordered dataset and looking at the row number on the left side of the screen (Figure 6.21). In Figure 6.21 there are 349 observations less than 64, so this probability is estimated as:

P(x < 64) = 349/10,000 = 0.0349

This is close to the probability computed with the normal distribution function (see Figure 6.9), 0.0370. Remember that this is a numerical simulation, so the results are approximations to the true probabilities, but this result is close enough. Next, we need the probability that a man selected at random has a height of less than 76 inches, so we count the observations that are less than 76, as shown below:

In Figure 6.22 we find 9,940 observations that are less than 76, so the probability of that event is estimated as 9,940/10,000 = 0.994. The probability that a man selected at random is between 64 and 76 inches tall is then:

P(64 < x < 76) = 0.994 - 0.035 = 0.959

This is very close to the probability computed with the normal distribution function (0.957), so the simulation provides acceptable results.

Class Exercise: The Central Limit Theorem

Please go to the website http://onlinestatbook.com/rvls.html, or search for Rice virtual labs in your browser.
1) Select Simulations and Demonstrations, and select Sampling Distribution Simulation.
2) Select a normal distribution and choose a small sample size, then take 50,000 samples (or more) and look at the graph of the sampling distribution of the mean.
3) Select a skewed distribution and choose a small sample size (n = 2 or 5), repeat the same procedure, and see what happens.
4) Select a skewed distribution, choose the largest sample size available (n = 25), and generate the sampling distribution of the means again.
5) What are your conclusions? Did you notice any differences among the previous simulations? How can you relate your findings to the theory studied in class? Remember the requirements for applying the central limit theorem.
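The Rice lab steps can also be mimicked in a few lines of Python. This sketch assumes an exponential population (mean 1, standard deviation 1) as a stand-in for the lab's skewed distribution, draws 20,000 samples for each sample size n = 2, 5, and 25, and reports the mean, standard deviation, and skewness of the sample means. The central limit theorem predicts a mean near 1, a standard deviation near 1/√n, and skewness shrinking toward 0 as n grows.

```python
import math
import random
from statistics import fmean, pstdev


def sample_means(n, reps, rng):
    """Means of `reps` samples of size n drawn from a skewed population.

    Assumption: the population is exponential with rate 1 (mean 1, sd 1),
    a stand-in for the Rice lab's skewed option, not the same distribution.
    """
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Moment-based skewness: 0 for a symmetric distribution, 2 for the exponential."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(7)  # arbitrary seed
results = {}
for n in (2, 5, 25):
    means = sample_means(n, 20_000, rng)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    mean, sd, skew = results[n]
    print(f"n={n:2d}: mean {mean:.3f}, sd {sd:.3f} "
          f"(sigma/sqrt(n) = {1 / math.sqrt(n):.3f}), skewness {skew:.2f}")
```

The shrinking skewness is the numerical version of what you should see in step 4: with n = 25 the histogram of means is much closer to a bell shape than with n = 2.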

Now, let's do a simulation using JMP. We are going to generate a discrete uniform distribution on the integers 1 to 10 and obtain samples from it. First, let's open a new data table (Figure 6.23). Then right click over the heading of Column 1 and select Column Info (Figure 6.24). Choose Formula from Column Properties and select Edit Formula.

See Figure 6.25. Select Random and then Random Integer (Figure 6.26). Type 1 inside the red box and hit Enter, then type a comma and 10. You should see the following window:

See Figure 6.27. Hit Enter, click OK on this window, and click OK again on the next window. You are not going to see any changes in the data window yet, because we still have to add some rows. To do this, right click over the cell below the red triangle and select Add Rows (Figure 6.28). Type 200 inside the box.

See Figure 6.29. You can now see randomly generated numbers from 1 to 10 (Figure 6.30). Then double-click the space to the right of the column headings, and keep doing that until you have created 4 new columns (Figure 6.31).

See Figure 6.32. Next, right click over the heading of Column 1 and select Copy Column Properties. Then right click over the heading of each new column and select Paste Column Properties (Figure 6.33). You are going to see 5 columns of random integers ranging from 1 to 10.
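A Python analogue of the table built above (200 rows by 5 columns of random integers from 1 to 10) is sketched below. It assumes that JMP's Random Integer(1, 10) includes both endpoints, which matches the generated values; `random.randint(1, 10)` also includes both endpoints. The seed is arbitrary.

```python
import random
from statistics import fmean

ROWS, COLS = 200, 5  # 200 rows (Add Rows) by 5 columns, as in the exercise

rng = random.Random(42)  # arbitrary seed
# randint(1, 10) includes both endpoints, so every integer 1..10 is equally likely.
table = [[rng.randint(1, 10) for _ in range(COLS)] for _ in range(ROWS)]

# Each column mean should be near the population mean (1 + 10) / 2 = 5.5.
col_means = [fmean(row[j] for row in table) for j in range(COLS)]
print("first rows:", table[:3])
print("column means:", [round(m, 2) for m in col_means])
```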

See Figure 6.34. Now let's compute the mean of each row and put the results in Column 6. Create a new column by double-clicking the space to the right of the heading of Column 5. Then, as we have done before, right click over the heading of the new column, select Column Info, select Formula from Column Properties, and click over Edit Formula (as in Figures 6.1 to 6.4). Choose Statistical from Functions and select Mean from the menu (Figure 6.35). Then click inside the parentheses, double-click Column 1 under Table Columns, type a comma, click over Column 2, and so on, until you have added all the columns up to Column 5. Your formula should look like this:

See Figure 6.36. Click OK on this window and on the next window; now you can see the mean computed for every row. The interesting thing about the new column is that it contains 200 sample means, each from a sample of size n = 5, so it approximates the sampling distribution of the mean for a uniform distribution of the integers 1 to 10. Let's look at the properties of this sampling distribution in Column 6. Choose the Analyze menu and select Distribution, click over Column 6, click over Y, Columns, and click OK. You are going to obtain a histogram of the sampling distribution of the means. You can see a bell-shaped distribution with a mean of 5.618 and a standard deviation of 1.278534 (your results will vary). You can get a horizontal layout by choosing this option from the Display Options under the second red triangle. Notice that the mean of your sampling distribution approximates the mean of the uniform distribution of the integers 1 to 10, which is 5.5 (Figure 6.37).
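The reference values that the simulated mean (5.618) and standard deviation (1.278534) approximate can be worked out directly. For a discrete uniform distribution on the N = 10 equally likely integers 1 to 10, and samples of size n = 5:

```latex
\begin{aligned}
\mu &= \frac{1 + 10}{2} = 5.5, \\
\sigma^2 &= \frac{N^2 - 1}{12} = \frac{10^2 - 1}{12} = 8.25, \qquad \sigma = \sqrt{8.25} \approx 2.872, \\
\mu_{\bar{x}} &= \mu = 5.5, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{8.25}{5}} = \sqrt{1.65} \approx 1.285.
\end{aligned}
```

The simulated standard deviation of about 1.28 is close to σ/√n ≈ 1.285 and much smaller than the population standard deviation σ ≈ 2.872.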

You should also observe that the sampling distribution of the means approximates a normal distribution even though the original population is uniform on the integers 1 to 10 and we used a small sample size (n = 5). The next step is to check your sampling distribution of the means for normality.

Class Exercise: Assessing normality

Using the results from the previous exercise, we will assess the normality of the sampling distribution of the means in Column 6. Proceed as follows: click over the lower right triangle on the window shown in Figure 6.37 and select Continuous Fit, then select Normal (Figure 6.38). This option overlays a normal curve on the histogram, as shown below, but this alone is probably not enough to assess normality (Figure 6.39). Then, from the lower right triangle, choose the option Normal Quantile Plot.

See Figure 6.40. At this point, you can see a Q-Q plot (normal quantile plot) for the data in Column 6, as shown below (Figure 6.41). The points follow a roughly straight line and stay within the curved red dotted bounds. There is no obvious systematic pattern in the Q-Q plot, so we can accept normality of the sampling distribution of the means, as predicted by the central limit theorem, even though the sample size in this case was small.

Class Exercises:
1- Probability functions: Consider that women's heights are normally distributed with a mean of 63.6 inches and a standard deviation of 2.5 inches. Answer the following questions using the function Normal Distribution, as in Class example 1 (at the beginning of this chapter).
a. Find the probability that a woman selected at random has a height between 60 and 66 inches.
b. Find the probability that a woman selected at random is taller than 69 inches.
2- Simulations: Solve the previous problems using a simulation (generate a sequence of 10,000 normally distributed random numbers). Compare the simulated results with the probabilities computed in problem 1.
3- Central Limit Theorem: Generate 4 columns with 250 numbers in each column, using a random normal distribution with a mean of 63.6 and a standard deviation of 2.5.
a. Compute the mean of each row in the fifth column.
b. Analyze the sampling distribution of the means in the fifth column: obtain summary statistics, describe the shape of the distribution, and comment on it.
c. Compare the population mean with the mean of the sample means in the fifth column. Are they similar?
d. Compare the standard deviation of the fifth column with the standard deviation of the population. How are they related? (Hint: take a look at the CLT.)
e. Discuss your findings with your classmates.

Team Assignment: Assessing Normality
Use the random sample that you obtained from the file Small Town.xls and do the following:
1- Assess normality using a Q-Q plot (normal probability plot) for all numeric variables.
2- Write a report showing your findings:
a. Show a histogram for each continuous variable.
b. Show a Q-Q plot (normal probability plot) for each numeric continuous variable.
c. Based on the previous graphs, discuss whether normality is acceptable for each variable, and briefly write the reasons that support your conclusion.
d. Explore transformations for the variables for which normality was not acceptable: apply a mathematical function such as the logarithm or the square root to every value, and discuss whether the results are different (better) than before.
e. Summarize your findings in a table showing which variables can be considered normally distributed and which cannot, and specify whether a transformation was applied to achieve normality.
3- Choose a variable that is normally distributed and compute its mean and standard deviation. Simulate a normal random variable with these parameters, and find the probability that one observation is between 1.5 standard deviations below the mean and 1.2 standard deviations above the mean. Compare the result obtained by simulation with the probability for a standard normal distribution, P(-1.5 < z < 1.2).
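For item 3 of the team assignment, the standard normal probability and its simulated counterpart can be sketched as follows. The mean 50 and standard deviation 10 are hypothetical placeholders for the parameters of the variable you choose; because the interval is defined in standard deviation units, the exact probability does not depend on them.

```python
import random
from statistics import NormalDist

std = NormalDist()  # standard normal distribution: mean 0, sd 1
exact = std.cdf(1.2) - std.cdf(-1.5)  # P(-1.5 < z < 1.2)

# Hypothetical parameters: replace with the mean and sd of your chosen variable.
mu, sigma, n = 50.0, 10.0, 10_000
rng = random.Random(1)  # arbitrary seed
draws = [rng.gauss(mu, sigma) for _ in range(n)]

lo, hi = mu - 1.5 * sigma, mu + 1.2 * sigma  # 1.5 sd below to 1.2 sd above the mean
simulated = sum(lo < x < hi for x in draws) / n

print(f"exact P(-1.5 < z < 1.2) = {exact:.4f}")
print(f"simulated proportion     = {simulated:.4f}")
```

The exact value is about 0.818; with 10,000 draws the simulated proportion is typically within about 0.01 of it.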