Statistics I 2011/2012 Notes about the third Computer Class: Simulation of samples and goodness of fit; Central Limit Theorem; Confidence intervals.

In this Computer Class we are going to use Statgraphics to simulate random samples and to evaluate their goodness of fit with respect to a random variable with a known probability law. We are also going to study an application of the Central Limit Theorem and give an introduction to confidence intervals.

At the end of each Section, we must save the simulated random samples, close Statgraphics and open it again (alternatively, we could save the simulated random samples, clear the DataBook and close all the active windows). This step is essential for answering the questions of Section 4. To save the simulated random samples, select:

Save As > Save Data File As

1. Simulation of samples from random variables; Goodness of fit.

Statgraphics can generate samples from random variables, i.e. samples based on probability laws. For example, in this section we are going to simulate two samples of size n=100:

RAND1: sample from a N(0,1).
RAND2: sample from a Student's t with 5 degrees of freedom.

These simulations will also let us recall the main differences between the Student's t and the Normal laws.

Let's start by generating the random sample RAND1. To do this, select

Describe > Distribution Fitting > Probability Distributions

In the window that appears, select the Probability Distribution in which we are interested, that is, the Normal. In the window Normal Options insert the parameters, which are Mean = 0 and Std. Dev. = 1. In the window Tables and Graphs select Analysis Summary, Density/Mass Function and Random Numbers. With the option Density/Mass Function we are going to compare the density functions of the N(0,1) and of the Student's t with 5 degrees of freedom, whereas with the option Random Numbers we are going to generate the random samples.

To obtain the sample RAND1, first we press the button:

Next, we repeat the following instructions in the window Save Results Options:

In this way, we obtain the sample RAND1, which you can find in the DataBook of Statgraphics:

Next, we generate the sample RAND2. To do this, select

Describe > Distribution Fitting > Probability Distributions

In the window that appears, select the Probability Distribution in which we are interested, that is, the Student's t. In the window Student's t Options insert the parameter, which is D. F. = 5 (we insert it in the second box; in this way Statgraphics recognizes that the Student's t is the second distribution with which we want to work). In the window Tables and Graphs select Analysis Summary, Density/Mass Function and Random Numbers.

Thanks to the option Density/Mass Function we can now compare the two density functions. What are the main differences that you notice?

Let's come back to the main goal of this part of the Notes, that is, the generation of RAND2. In this case, we have to repeat the following instructions in the window Save Results Options:

Once we have RAND2, we can study its goodness of fit. To do this with Statgraphics, we can draw the histogram of RAND2 and compare it with the density function of a Normal random variable whose parameters are the sample estimates of the mean and of the standard deviation. With the same estimates, we can also obtain the corresponding QQ-plot. To obtain these two graphs, select:

Describe > Distribution Fitting > Fitting Uncensored Data
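Outside Statgraphics, the comparison of the two densities and the generation of the two samples can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the seed and the use of the standard library are our own choices, not part of Statgraphics. The Student's t draws use the standard representation Z / sqrt(V / df), with Z a N(0,1) variable and V an independent chi-square variable with df degrees of freedom.

```python
import math
import random

def normal_pdf(x):
    """Density of the standard Normal N(0,1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def student_t_pdf(x, df):
    """Density of the Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)

def student_t_draw(rng, df):
    """One Student's t draw, built as Z / sqrt(V / df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape=df/2, scale=2)
    return z / math.sqrt(v / df)

rng = random.Random(2011)  # arbitrary seed, used only for reproducibility
rand1 = [rng.gauss(0.0, 1.0) for _ in range(100)]     # plays the role of RAND1
rand2 = [student_t_draw(rng, 5) for _ in range(100)]  # plays the role of RAND2

# Tabulate both densities at a few points to compare centre and tails.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x={x:.1f}  N(0,1)={normal_pdf(x):.4f}  t(5)={student_t_pdf(x, 5):.4f}")
```

In the printed table the Student's t density is lower than the Normal one near 0 and higher in the tails, which is the kind of difference the Density/Mass Function graph should show.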

We want to know whether the sample RAND2 is consistent with a Normal law, that is, we want to examine the goodness of fit of RAND2 with respect to a Normal law. To do this, select RAND2 and, in the window that appears, select the Normal option. In the window Tables and Graphs select Analysis Summary, Frequency Histogram and Quantile-Quantile Plot. The following figure shows an example of the result that you should obtain:

Clearly, each of you will obtain different results because of the random nature of the examples above.

Exercise: Examine the goodness of fit of RAND1 and analyze whether there are differences in the results. Write your comments on the last sheet of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
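To make the idea behind the Quantile-Quantile Plot concrete, the following Python sketch computes the pairs of points that such a plot would display. It is a minimal illustration, not the procedure Statgraphics uses internally: the function name normal_qq_points, the plotting positions (i - 0.5)/n and the seed are assumptions made for this example.

```python
import random
import statistics

def normal_qq_points(sample):
    """Return (theoretical quantile, ordered observation) pairs for a Normal QQ-plot.

    The Normal law is fitted with the sample mean and sample standard deviation.
    """
    n = len(sample)
    fitted = statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample))
    ordered = sorted(sample)
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]

rng = random.Random(1)  # arbitrary seed, used only for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # stand-in for a simulated sample
points = normal_qq_points(sample)

# If the Normal law fits well, the points lie close to the diagonal y = x.
max_gap = max(abs(q - x) for q, x in points)
print(f"largest distance from the diagonal: {max_gap:.3f}")
```

With a sample from a heavier-tailed law, the extreme points would be expected to move away from the diagonal more than the central ones.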

2. An application of the Central Limit Theorem.

In this Section we are going to study the probability law of the Sample Mean estimator. To do this, we are going to generate M samples of size n from a random variable with a known probability law. Next, for each sample we are going to compute its sample mean, obtaining M different values in the end. Finally, we are going to study the probability law of this sample of M sample means, using the fact that, according to the Central Limit Theorem, it should be approximately Normal.

Let's start by considering a random variable X with a Uniform law with parameters a=1 and b=10, where a and b are the lower and the upper limits of X respectively. Using this probability law, let's generate M=30 samples of size n=30. To do this, select the following path:

Describe > Distribution Fitting > Probability Distributions

In the window that appears, select the Probability Distribution in which we are interested, that is, the Uniform, and in the window Uniform Options insert the same values for the parameters in all 5 rows: Lower Limit = 1 and Upper Limit = 10. In the window Tables and Graphs select Analysis Summary and Random Numbers. To change the size of each sample, place the mouse on the second pane (Random Numbers), press the right button, select Pane Options and change the Size from 100 to 30.

From Section 1 we already know how to save the generated samples; in this case, to obtain the first 5 samples we repeat the following instructions in the window Save Results Options:

With the previous steps we have generated the first 5 random samples. To generate the remaining 25 samples, we have to repeat the previous operations 5 more times, using different names for the Target Variables each time. At the end of these operations, our DataBook will contain M=30 columns, each of them a sample of size n=30 from a random variable X with Uniform law with parameters a=1 and b=10. Performing a Multiple-Variable Analysis of these data, we can obtain the M=30 sample means:

Describe > Numeric Data > Multiple-Variable Analysis

Using the Pane Options of the Multiple-Variable Analysis we can reduce the number of statistics displayed. Here we select only Average. Now we copy the 30 averages into the first column of sheet B of the DataBook (TIP: write them down on paper and then type them into Statgraphics; there is no other way to do it! Use two decimals). Also change the name of the column (with a double click; for example, it could be Sample means):
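The whole procedure of this Section (30 columns of 30 Uniform values, then one average per column) can be sketched in a few lines of Python. This is only an illustration: the variable names and the seed are our own choices, and the values will differ from those produced by Statgraphics.

```python
import random
import statistics

M, n = 30, 30        # number of samples and size of each sample
a, b = 1.0, 10.0     # lower and upper limits of the Uniform law

rng = random.Random(3)  # arbitrary seed, used only for reproducibility

# One inner list per DataBook column: M samples of size n from Uniform(a, b).
samples = [[rng.uniform(a, b) for _ in range(n)] for _ in range(M)]

# One average per column, rounded to two decimals as suggested in the Notes.
sample_means = [round(statistics.mean(column), 2) for column in samples]

print(sample_means)
print("mean of the sample means:", round(statistics.mean(sample_means), 4))
print("std. dev. of the sample means:", round(statistics.stdev(sample_means), 4))
```

The list sample_means plays the role of the column Sample means that is typed into sheet B.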

Let's recall that our main goal is to study the probability law of the Sample Mean estimator, and for that we want to use the Central Limit Theorem. This theorem says that, whatever the probability law of X (i.e. the random variable used to generate the samples, all of the same size n), as long as X has finite variance and n is sufficiently large, the Sample Mean estimator will have approximately a Normal probability law:

\[ \bar{X} \sim N\left( E[X],\ \sqrt{\frac{\mathrm{Var}(X)}{n}} \right) \]

where the second parameter is the standard deviation. In our case, given that X follows a Uniform law with parameters a=1 and b=10, its expected value and its variance are given by:

\[ E[X] = \frac{a+b}{2} = \frac{1+10}{2} = 5.5 \]

\[ \mathrm{Var}(X) = \frac{(b-a)^2}{12} = \frac{100 + 1 - 20}{12} = \frac{81}{12} = 6.75 \]

With n=30 we have Var(X)/n = 6.75/30 = 0.225, whose square root is 0.4743. It follows that, applying the Central Limit Theorem, the Sample Mean estimator should have a law such as

\[ \bar{X} \sim N(5.5,\ 0.4743). \]

As in the previous section, we can examine the goodness of fit of the random sample Sample Means (i.e. the column we have built) with respect to a Normal law:

Describe > Distribution Fitting > Fitting Uncensored Data

In the window that appears, select the Normal option and in the window Tables and Graphs select Analysis Summary, Histogram and Quantile-Quantile Plot. The following figure shows an example of the result that you should obtain:
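The arithmetic behind the parameters of the Normal law for the Sample Mean can be checked with a short Python sketch. The function name sd_of_sample_mean is hypothetical and introduced only for this illustration; the formulas are those of the Uniform law and of the Central Limit Theorem.

```python
import math

a, b = 1.0, 10.0
mean_x = (a + b) / 2        # E[X] for a Uniform(a, b)
var_x = (b - a) ** 2 / 12   # Var(X) for a Uniform(a, b)

def sd_of_sample_mean(n):
    """Standard deviation of the Sample Mean of n independent draws of X."""
    return math.sqrt(var_x / n)

print(f"E[X] = {mean_x}, Var(X) = {var_x}")
for n in (15, 30, 60):
    print(f"n={n}: std. dev. of the Sample Mean = {sd_of_sample_mean(n):.4f}")
```

The expected value does not depend on n, whereas the standard deviation of the Sample Mean shrinks as n grows.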

Clearly, each of you will obtain different results because of the random nature of the examples above. In this example, and according to the previous figure, the estimates of the mean and of the standard deviation are:

Mean = 5.45067 (in place of 5.5)
Standard Deviation = 0.410155

Important: remember that Statgraphics reports as variance the quantity below (and as standard deviation its square root):

\[ s_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

The expression

\[ n\, s_n^2 = (n-1)\, s_{n-1}^2 \]

gives the relation between this quantity and the variance \( s_n^2 \), which divides by n instead of n-1.

Finally, are the estimates of the mean and of the standard deviation good? What are the absolute errors in your case?

Exercise: examine the goodness of fit of the sample Sample Means for the cases n=15 and n=60. Analyze how the distribution of the Sample Mean changes, and whether the approximation improves or not, by calculating the absolute errors. Write your comments on the last page of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
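The relation between the two variances, and the absolute errors of the example estimates quoted in this Section, can be verified with a small Python sketch. The data list is made up purely for illustration; the two estimates 5.45067 and 0.410155 are the example values given in these Notes.

```python
import math
import statistics

# Made-up values, used only to illustrate the two variance definitions.
data = [4.8, 5.9, 5.1, 6.2, 5.5, 4.9]
n = len(data)

s2_n_minus_1 = statistics.variance(data)  # divides by n - 1
s2_n = statistics.pvariance(data)         # divides by n

print("n * s_n^2           =", n * s2_n)
print("(n - 1) * s_{n-1}^2 =", (n - 1) * s2_n_minus_1)

# Absolute errors of the example estimates with respect to the values
# given by the Central Limit Theorem for a=1, b=10 and n=30.
theoretical_mean = 5.5
theoretical_sd = math.sqrt(6.75 / 30)
abs_error_mean = abs(5.45067 - theoretical_mean)
abs_error_sd = abs(0.410155 - theoretical_sd)

print(f"absolute error of the mean:      {abs_error_mean:.5f}")
print(f"absolute error of the std. dev.: {abs_error_sd:.5f}")
```

The same two subtractions, applied to your own estimates, give the absolute errors asked for in the text.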

3. Introduction to the construction of confidence intervals.

In this section we are going to compute two confidence intervals:

The first is based on a sample from a N(0,1).
The second is based on a sample from a Student's t with 5 degrees of freedom.

When computing confidence intervals, Statgraphics always assumes that the population from which the data come follows a Normal law. For this reason, it is advisable to use the following instructions only when the sample size is large (n ≥ 30), because in these cases the Central Limit Theorem ensures that the distribution of the Sample Mean estimator is approximately Normal, whatever the distribution of the initial data.

In Section 1 we have seen in detail how to generate samples. Let's start by generating two samples of size n=100, namely:

SAMPLE1: sample of size 100 from a N(0,1).
SAMPLE2: sample of size 100 from a Student's t with 5 degrees of freedom.

Once SAMPLE1 and SAMPLE2 have been obtained, select the following path:

Describe > Numeric Data > One-Variable Analysis

Select, for example, SAMPLE1, and in the window Tables and Graphs select Analysis Summary and Confidence Intervals. The following table shows the confidence interval for the mean obtained with SAMPLE1:

95% Confidence interval for the mean: 0.118804 +/- 0.224542 [-0.105738; 0.343347]

Clearly, each of you will obtain different results because of the random nature of the examples above.

Exercise: compute the confidence interval associated with SAMPLE2. Write it on the last page of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
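As a rough cross-check of the "centre +/- half-width" form of the interval, the following Python sketch computes a large-sample 95% confidence interval for the mean. It is a sketch under an explicit assumption: it uses the Normal quantile, which is justified for large n by the Central Limit Theorem, whereas Statgraphics may use a Student's t quantile, which would give a slightly wider interval. The function name mean_confidence_interval and the seed are hypothetical choices for this example.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Large-sample confidence interval for the mean (Normal quantile assumed)."""
    n = len(sample)
    centre = statistics.mean(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width

rng = random.Random(7)  # arbitrary seed, used only for reproducibility
sample1 = [rng.gauss(0.0, 1.0) for _ in range(100)]  # plays the role of SAMPLE1

low, high = mean_confidence_interval(sample1)
centre = (low + high) / 2
print(f"{centre:.6f} +/- {(high - low) / 2:.6f}  [{low:.6f}; {high:.6f}]")
```

The same function can be applied to any list of values, for instance to a simulated sample from a Student's t.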

4. Exercises (to be handed in at the end of this class, using the last sheet of these Notes)

Each section ends with an exercise in bold text: carry out its instructions and report your comments and/or answers on the last sheet of these Notes.

Answers for Section 4.

Name and Surname(s):
NIU:
Degree:
Group:

Section 1: Comments about the goodness of fit of RAND1 and comparison with the results obtained for RAND2.

Section 2: Comments about the goodness of fit of Sample Means for the cases n=15 and n=60. Compute the absolute errors for the estimation of the mean and of the standard deviation. Compare with the results obtained for n=30.

Section 3: 95% Confidence interval for the mean obtained using SAMPLE2.