The Normal Probability Distribution
CHAPTER 6

The normal probability distribution is the most widely used distribution in statistics because many statistical procedures are built around it. The central limit theorem is probably the main reason for its importance. It is essential for statistics students to learn how to use the normal probability distribution to solve applied problems. In this chapter we study the normal probability distribution using the appropriate functions in JMP. We also run simulations, using a random function to generate a normally distributed random variable with a specified mean and standard deviation. We then perform a statistical experiment that demonstrates the central limit theorem numerically, and finally we assess the normality of a given dataset.

Class Exercises: Compute probabilities for the normal distribution

Class example 1: According to the National Health Survey, heights of adult males are normally distributed with a mean of 69 and a standard deviation of 2.8. Compute the percentage of the population of adult males whose height falls between 64 and 76.

First, let's open a new data table.

Figure 6.1

Then right-click the heading of Column 1.

Chapter 6 Page 1
Figure 6.2

Click the Column Name text box and change the name to x, as follows:

Figure 6.3

Double-click to the right of the first column heading to open a new column.

Figure 6.4

You can save the file as Normal Dist (or any name you like). Then right-click Column 2, select Column Info, and change the name to P(x). Next, click Column Properties and select Formula, as shown below.
Figure 6.5

A new window opens. Choose Probability from Functions (grouped) and select Normal Distribution, as shown below.

Figure 6.6

Double-click the variable x, click inside the parentheses after the variable x, and type ,69,2.8 as shown below:
Figure 6.7

Click Apply. You will see the following screen:

Figure 6.8

Click OK in this window and again in the next window. Next, we want to compute the cumulative probability for x = 64 and x = 76. Let's enter these numbers in the first column, as shown below:
Figure 6.9

The cumulative probabilities for these numbers are shown above. Thus, the probability that the height of one person is between 64 and 76 is (rounding to three digits):

P(64 < x < 76) = 0.994 - 0.037 = 0.957

Class Exercise 2: We can also compute probabilities using a simulation. For example, let's generate 10,000 random numbers from a normal distribution with a mean of 69 and a standard deviation of 2.8. To do this, open a new data table as follows.

Figure 6.10

Then right-click the heading of Column 1 and click Column Info.

Figure 6.11
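As a cross-check on Class example 1 outside JMP, the same cumulative probabilities can be sketched in Python. This is only an illustration, not part of the JMP procedure: it assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` with the mean 69 and standard deviation 2.8 typed into the JMP formula. The variable names are illustrative.

```python
from statistics import NormalDist

# Heights of adult males, using the values typed into the JMP formula:
# mean 69 and standard deviation 2.8.
heights = NormalDist(mu=69, sigma=2.8)

# cdf(x) plays the role of JMP's Normal Distribution function:
# it returns the cumulative probability P(X < x).
p_below_64 = heights.cdf(64)
p_below_76 = heights.cdf(76)

# The probability of falling between the two heights is the difference
# of the two cumulative probabilities.
p_between = p_below_76 - p_below_64

print(f"P(x < 64)      = {p_below_64:.3f}")
print(f"P(x < 76)      = {p_below_76:.3f}")
print(f"P(64 < x < 76) = {p_between:.3f}")
```

Rounded to three digits, these values should agree with the ones read from the JMP data table.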
Click the Column Name text box and change the name to x, as follows:

Figure 6.12

Then click Column Properties and select Formula.

Figure 6.13

Next, click Edit Formula, select Random, and then select Random Normal.
Figure 6.14

Click inside the parentheses and enter the numbers 69 and 2.8, separated by a comma, as follows:

Figure 6.15

Click OK in this window and again in the next window. Then right-click the first column (below the red arrow) and select Add Rows, as shown below.
Figure 6.16

Type 10000 in the dialog box and click OK.

Figure 6.17

At this point you will see a sequence of randomly generated numbers from a normal distribution.

Figure 6.18
You can draw a histogram by choosing Distribution from the Analyze menu (see Chapter 3 for details; the procedure is not shown here). Check the shape of the distribution and look at the summary statistics, which will be approximately equal to the requested mean and standard deviation. This activity is highly recommended; please ask your lab instructor if you do not know how to do it.

Next, sort the numbers from lowest to highest by selecting Tables and then Sort. Choose the variable x and click By. You will see the next window.

Figure 6.19

Click OK. You will see the sequence of random numbers ordered from lowest to highest, as follows:

Figure 6.20
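The generate, sort, and count approach of this exercise can also be sketched in Python. This is a minimal illustration under stated assumptions, not the JMP procedure itself: `random.gauss` stands in for JMP's Random Normal, the seed is arbitrary, and `simulated_prob_below` is a hypothetical helper. Counting how many sorted values fall below a cutoff corresponds to reading the row index in the sorted data table.

```python
import random
from bisect import bisect_left

random.seed(6)  # arbitrary seed, used only so the run is repeatable

# Generate 10,000 draws from a normal distribution with mean 69 and
# standard deviation 2.8, then sort them from lowest to highest.
x = sorted(random.gauss(69, 2.8) for _ in range(10_000))


def simulated_prob_below(sorted_values, cutoff):
    """Fraction of observations strictly below cutoff (hypothetical helper)."""
    # bisect_left returns the number of sorted values that are < cutoff,
    # which is the row index you would read off the sorted table.
    return bisect_left(sorted_values, cutoff) / len(sorted_values)


p64 = simulated_prob_below(x, 64)
p76 = simulated_prob_below(x, 76)

print(f"simulated P(x < 64)      = {p64:.4f}")
print(f"simulated P(x < 76)      = {p76:.4f}")
print(f"simulated P(64 < x < 76) = {p76 - p64:.4f}")
```

Because the numbers are random, each seed gives slightly different counts, just as each JMP run does.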
Computing the simulated probabilities is just a matter of counting the number of observations that meet the requirements of the problem. First, count the number of observations that are less than 64. You can do this by scanning the ordered dataset and looking at the index number on the left side of the screen.

Figure 6.21

The figure above shows that there are 349 observations less than 64, so this probability is computed as follows:

P(x < 64) = 349/10,000 = 0.0349

This is close to the probability of 0.0370 computed with the normal distribution formula (see Figure 6.9). Remember that this is a numeric simulation, so the results shown here are approximations to the true probabilities, but this result is close enough.

Next, we need to find the probability that a man selected at random has a height less than 76. To do so, count the number of observations that are less than 76, as shown below:
Figure 6.22

We found 9,940 observations that are less than 76, so the probability associated with that event is 9,940/10,000 = 0.994. The probability that one man selected at random has a height between 64 and 76 is then:

P(64 < x < 76) = 0.994 - 0.035 = 0.959

This is very close to the probability computed using the formulas. As you can see, the simulation provided acceptable results.

Class Exercise: The Central Limit Theorem

Please go to the website http://onlinestatbook.com/rvls.html or search for "Rice virtual labs" in your browser.

1) Select Simulations and Demonstrations, and then select Sampling Distribution Simulation.
2) Select a normal distribution and choose a small sample size. Take 50,000 samples (or more) and look at the graph of the sampling distribution of the mean.
3) Select a skewed distribution and choose a small sample size (n = 2 or 5). Repeat the same procedure and see what happens.
4) Select a skewed distribution, choose the largest sample size available (n = 25), and generate the sampling distribution of the means again.
5) What are your conclusions? Did you notice any difference among the previous simulations? How can you relate your findings to the theory studied in class? Please remember the requirements for applying the central limit theorem.
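For readers who want to repeat steps 3 and 4 offline, the experiment can be sketched in Python. This is only an illustration under stated assumptions: an exponential population with mean 1 stands in for "a skewed distribution" (the website's own skewed population may differ), the seed and the number of repetitions are arbitrary, and `sample_means` and `skewness` are hypothetical helpers.

```python
import random
from statistics import fmean, pstdev

random.seed(11)  # arbitrary seed, used only so the run is repeatable


def sample_means(n, reps=20_000):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Moment-based skewness: 0 for a symmetric shape, positive for a right tail."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean(((v - m) / s) ** 3 for v in values)


# results[n] = (mean, standard deviation, skewness) of the sample means
results = {}
for n in (2, 5, 25):
    means = sample_means(n)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    print(f"n = {n:2d}: mean = {results[n][0]:.3f}, "
          f"sd = {results[n][1]:.3f}, skewness = {results[n][2]:.3f}")
```

Compare how the three printed columns change as n grows, and relate what you see to your answers to question 5.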
Now, let's do a simulation using JMP. We are going to generate a uniform distribution of the integers 1 to 10 and draw samples from it. First, let's open a new data table:

Figure 6.23

Then right-click the heading of Column 1 and select Column Info.

Figure 6.24

Choose Formula from Column Properties and select Edit Formula.
Figure 6.25

Select Random and then Random Integer, as follows.

Figure 6.26

Type 1 inside the red box and press Enter, then type a comma and 10. You should see the following window.
Figure 6.27

Press Enter, click OK in this window, and click OK again in the next window. You will not see any changes in the data window yet, because we still have to add some rows. To do this, right-click the cell below the red triangle and select Add Rows, as follows.

Figure 6.28

Type 200 inside the box.
Figure 6.29

You can see randomly generated numbers from 1 to 10.

Figure 6.30

Then double-click the space to the right of Column 1, and keep doing so until you have created 4 new columns, as follows.

Figure 6.31
Figure 6.32

Next, right-click the heading of Column 1 and select Copy Column Properties. Then right-click the heading of each new column and select Paste Column Properties. Repeat this procedure for each column.

Figure 6.33

You will see 5 columns of random integers ranging from 1 to 10.
Figure 6.34

Now, let's compute the mean of each row and put the results in Column 6. Create a new column by double-clicking the space to the right of the heading of Column 5. Then right-click the heading of the new column and, as we have done before, select Column Info, select Formula from Column Properties, and click Edit Formula (as in Figures 6.1 to 6.4). Choose Statistical from Functions and select Mean from the menu, as follows.

Figure 6.35

Then click inside the parentheses and double-click Column 1 under Table Columns. Type a comma and click Column 2, and so on, until you have added every column up to Column 5. Your formula should look like this:
Figure 6.36

Click OK in this window and again in the next window. You can now see the mean computed for every row. The interesting thing about the new column is that it contains the sampling distribution of the means from a uniform probability distribution of integers ranging from 1 to 10.

Let's look at the properties of the sampling distribution of the means in Column 6. Choose the Analyze menu and select Distribution, click Column 6, click Y, Columns, and then click OK. You will obtain a histogram of the sampling distribution of the means. You can see a bell-shaped distribution with a mean of 5.618 and a standard deviation of 1.278534 (results may vary). You can get a horizontal layout by choosing this option from Display Options, located under the second red triangle. Notice that the mean of your sampling distribution approximates the mean of the uniform distribution of the integers (which is 5.5).

Figure 6.37
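The JMP experiment above (200 rows, 5 columns of random integers from 1 to 10, and a sixth column of row means) can be sketched in Python for comparison. This is a minimal illustration, not the JMP procedure: `random.randint` stands in for JMP's Random Integer, the seed is arbitrary, and the names are illustrative. The sketch also computes the standard deviation that the central limit theorem predicts for the row means, namely the population standard deviation divided by the square root of the sample size.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

random.seed(18)  # arbitrary seed, used only so the run is repeatable

ROWS, COLS = 200, 5

# Five columns of random integers from 1 to 10, 200 rows each.
table = [[random.randint(1, 10) for _ in range(COLS)] for _ in range(ROWS)]

# "Column 6": the mean of each row, i.e. the mean of a sample of size 5.
row_means = [fmean(row) for row in table]

# Population values for a uniform distribution on the integers 1..10.
population = range(1, 11)
pop_mean = fmean(population)      # 5.5
pop_sd = pstdev(population)       # square root of 8.25
expected_sd = pop_sd / sqrt(COLS)  # standard deviation predicted for the means

print(f"mean of row means = {fmean(row_means):.3f} (population mean {pop_mean})")
print(f"sd of row means   = {stdev(row_means):.3f} (predicted {expected_sd:.3f})")
```

The predicted standard deviation, about 1.28, is in line with the value reported for Column 6 in the JMP output; individual runs will vary.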
Also, you should observe that the sampling distribution of the means approximates a normal distribution, even though the original population is uniform with integers ranging from 1 to 10 and we used a small sample size. The next step is to check your sampling distribution of the means for normality.

Class Exercise: Assessing normality

Using the results from the previous exercise, we will assess the normality of the sampling distribution of the means located in Column 6. Proceed as follows: click the lower red triangle in the window shown in Figure 6.37, select Continuous Fit, and then select Normal.

Figure 6.38

This option overlays a normal curve on the histogram, as shown below, but this alone is probably not enough to assess normality.

Figure 6.39

Then click the lower red triangle again and choose the option Normal Quantile Plot.
Figure 6.40

At this point you can see a Q-Q plot (normal quantile plot) for the data in Column 6, as shown below.

Figure 6.41

The Q-Q plot follows a roughly straight line, and the points lie within the red dotted curves. There is no obvious pattern of departure from the line, so we can accept normality of the sampling distribution of the means, as predicted by the central limit theorem (even though the sample size in this case was small).
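To make concrete what a normal quantile plot compares, the points of a Q-Q plot can be computed by hand. The sketch below is an illustration under stated assumptions, not JMP's implementation: it uses the plotting positions (i - 0.5)/n, which is one common convention (JMP may use a different one), regenerates row means like those in Column 6 with an arbitrary seed, and summarizes straightness with a correlation coefficient instead of the bounds that JMP draws. `normal_qq_points` and `pearson_r` are hypothetical helpers.

```python
import random
from statistics import NormalDist, fmean

random.seed(20)  # arbitrary seed, used only so the run is repeatable

# Stand-in for Column 6: 200 means of 5 random integers from 1 to 10.
data = [fmean(random.randint(1, 10) for _ in range(5)) for _ in range(200)]


def normal_qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Correlation between the two coordinates; near 1 means a straight line."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_qq_points(data)
r = pearson_r(points)
print(f"correlation of the Q-Q points = {r:.4f}")
```

A correlation close to 1 is consistent with a straight-line Q-Q plot. It is a rough summary only and does not replace inspecting the plot itself.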
Class Exercises:

1- Probability functions: Assume that women's heights are normally distributed with a mean of 63.6 and a standard deviation of 2.5. Answer the following questions using the Normal Distribution function, as in Class example 1 (shown at the beginning of this chapter).
a. Find the probability that a woman selected at random has a height between 60 and 66.
b. Find the probability that a woman selected at random is taller than 69.

2- Simulations: Solve the previous problems using a simulation (generate a sequence of 10,000 normally distributed random numbers). Compare the simulated results with the computed probabilities from problem 1.

3- Central Limit Theorem: Generate 4 columns with 250 numbers in each column, using a random normal distribution with a mean of 63.6 and a standard deviation of 2.5.
a. Compute the mean of each row in the fifth column.
b. Analyze the sampling distribution of the means in the fifth column: obtain summary statistics, describe the shape of the distribution, and comment on it.
c. Compare the population mean with the mean of the sample means in the fifth column. Are they similar?
d. Compare the standard deviation of the fifth column with the standard deviation of the population. How are they related? (Hint: take a look at the CLT.)
e. Discuss your findings with your classmates.

Team Assignment: Assessing Normality

Use the random sample that you obtained from the file Small Town.xls and do the following:

1- Assess normality using a Q-Q plot (normal probability plot) for all numeric variables.

2- Write a report showing your findings:
a. Show a histogram for each continuous variable.
b. Show a Q-Q plot (normal probability plot) for each numeric continuous variable.
c. Based on the previous graphs, discuss whether normality is acceptable for each variable, and briefly state the reasons that support your conclusion.
d. Explore transformations for the variables for which normality was not acceptable. That is, apply a mathematical function such as the logarithm or the square root to every value, and discuss whether the results are different (better) than before.
e. Summarize your findings in a table showing which variables can be considered normally distributed and which cannot. Specify whether a transformation was applied to achieve normality.

3- Choose a variable that is normally distributed and compute its mean and standard deviation. Simulate a normal random variable with these parameters, and find the probability that one observation is between 1.5 standard deviations below the mean and 1.2 standard deviations above the mean. Compare the result obtained by simulation with the probability for a standard normal distribution, P(-1.5 < z < 1.2).