Statistics I 2011/2012 Notes about the third Computer Class: Simulation of samples and goodness of fit; Central Limit Theorem; Confidence intervals.

In this Computer Class we use Statgraphics to simulate random samples and to evaluate their goodness of fit with respect to a random variable with a known probability law. We also study an application of the Central Limit Theorem and give an introduction to confidence intervals.

At the end of each section, we must save the simulated random samples, close Statgraphics and open it again (alternatively, we can save the simulated random samples, clear the DataBook and close all the active windows). This step is essential for answering the questions of Section 4. To save the simulated random samples, use:

Save As → Save Data File As

1. Simulation of samples from random variables; Goodness of fit.

Statgraphics can generate samples from random variables, i.e. samples drawn from given probability laws. In this section we simulate two samples of size n=100:

RAND1: sample from a N(0,1).
RAND2: sample from a Student's t with 5 degrees of freedom.

These simulations also let us recall the main differences between the Student's t and the Normal laws.

Let's start by generating the random sample RAND1. To do this, select:

Describe → Distribution Fitting → Probability Distributions

In the window that appears, select the probability distribution we are interested in, that is, the Normal. In the window Normal Options, enter the parameters Mean = 0 and Std. Dev. = 1. In the window Tables and Graphs, select Analysis Summary, Density/Mass Function and Random Numbers. With the option Density/Mass Function we will compare the density functions of the N(0,1) and of the Student's t with 5 degrees of freedom, whereas with the option Random Numbers we generate the random samples.

To obtain the sample RAND1, first press the button (button image not reproduced), then enter the settings for the window Save Results Options (screenshot not reproduced). In this way we obtain the sample RAND1, which appears in the Statgraphics DataBook.
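The density comparison described above can also be sketched outside Statgraphics. The following is a minimal illustration in Python (standard library only), not part of the original Notes; the function names and the evaluation points are assumptions chosen for the example. It evaluates the N(0,1) density and the Student's t density with 5 degrees of freedom at a few points, so that the behaviour near the centre and in the tails can be compared.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a Normal(mean, sd) law at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def student_t_pdf(x, df):
    """Density of a Student's t law with df degrees of freedom at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def density_table(points=(0.0, 1.0, 2.0, 3.0, 4.0), df=5):
    """Return (x, normal density, t density) for each evaluation point."""
    return [(x, normal_pdf(x), student_t_pdf(x, df)) for x in points]


if __name__ == "__main__":
    for x, dens_normal, dens_t in density_table():
        print(f"x = {x:.1f}   N(0,1): {dens_normal:.5f}   t(5): {dens_t:.5f}")
```

At x = 0 the Normal density is the larger of the two, while far from the centre (for example at x = 3) the Student's t density is larger, which reflects its heavier tails.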

Next, we generate the sample RAND2. To do this, select:

Describe → Distribution Fitting → Probability Distributions

In the window that appears, select the Student's t distribution. In the window Student's t Options, enter the parameter D. F. = 5 in the second box; in this way Statgraphics treats the Student's t as the second distribution we want to work with. In the window Tables and Graphs, select Analysis Summary, Density/Mass Function and Random Numbers. Thanks to the option Density/Mass Function we can now compare the two density functions. What are the main differences you notice?

Let's return to the main goal of this part of the Notes, the generation of RAND2. In this case, enter the settings for the window Save Results Options (screenshot not reproduced).

Once we have RAND2, we can study its goodness of fit. In Statgraphics, we can draw the histogram of RAND2 and compare it with the density function of a Normal random variable whose parameters equal the sample estimates of the mean and standard deviation. With the same estimates, we can also obtain the corresponding QQ-plot. To obtain these two graphs, select:

Describe → Distribution Fitting → Fitting Uncensored Data

We want to know whether the sample RAND2 is consistent with a Normal law, that is, we want to examine the goodness of fit of RAND2 with respect to a Normal law. To do this, select RAND2 and, in the window that appears, select the Normal option. In the window Tables and Graphs, select Analysis Summary, Frequency Histogram and Quantile-Quantile Plot. The original Notes show an example of the resulting output at this point (figure not reproduced). Each of you will obtain different results because the samples are random.

Exercise: Examine the goodness of fit of RAND1 and analyze whether the results differ. Write your comments on the last sheet of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
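For readers who want to reproduce the idea of this section without Statgraphics, here is a rough Python sketch (standard library only). It is not the procedure Statgraphics uses internally: the helper names, the plotting positions (i + 0.5)/n and the construction of the Student's t sample as Z / sqrt(V/df) are assumptions made for the example. It simulates RAND1 and RAND2 and computes the points of a QQ-plot against a Normal law fitted with the sample mean and standard deviation.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_normal(n, rng):
    """Sample of size n from a N(0,1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simulate_student_t(n, df, rng):
    """Sample of size n from a Student's t with df degrees of freedom.

    Uses t = Z / sqrt(V / df), with Z ~ N(0,1) and V a chi-square with df
    degrees of freedom built as a sum of df squared N(0,1) draws.
    """
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        sample.append(z / math.sqrt(v / df))
    return sample


def qq_points(sample):
    """Pairs (theoretical Normal quantile, ordered observation)."""
    fitted = NormalDist(mean(sample), stdev(sample))  # Normal with estimated parameters
    n = len(sample)
    ordered = sorted(sample)
    # (i + 0.5) / n is one common choice of plotting position (an assumption here)
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


if __name__ == "__main__":
    rng = random.Random()  # no fixed seed: every run gives a different sample
    rand1 = simulate_normal(100, rng)
    rand2 = simulate_student_t(100, 5, rng)
    for name, sample in (("RAND1", rand1), ("RAND2", rand2)):
        points = qq_points(sample)
        print(name, "smallest:", points[0], "largest:", points[-1])
```

If the sample is consistent with a Normal law, the two coordinates of each point should be close to each other; for RAND2 the extreme points tend to lie further from the fitted quantiles than for RAND1.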

2. An application of the Central Limit Theorem.

In this section we study the probability law of the Sample Mean estimator. To do this, we generate M samples of size n from a random variable with a known probability law. Next, we compute the sample mean of each sample, obtaining M values. Finally, we study the probability law of this sample of M sample means, using the fact that, according to the Central Limit Theorem, it should be approximately Normal.

Let's consider a random variable X with a Uniform law with parameters a=1 and b=10, where a and b are the lower and upper limits of X respectively. Using this probability law, let's generate M=30 samples of size n=30. To do this, select:

Describe → Distribution Fitting → Probability Distributions

In the window that appears, select the Uniform distribution, and in the window Uniform Options enter the same parameter values in all 5 rows: Lower Limit = 1 and Upper Limit = 10. In the window Tables and Graphs, select Analysis Summary and Random Numbers. To change the size of each sample, place the mouse over the second pane (Random Numbers), press the right button, select Pane Options and change the Size from 100 to 30. From Section 1 we already know how to save generated samples; here this yields the first 5 samples, using the settings for the window Save Results Options (screenshot not reproduced).

The previous steps generate the first 5 random samples. To generate the remaining 25 samples, repeat the previous operations 5 more times, using different names for the Target Variables each time. At the end, the DataBook will contain M=30 columns, each of them a sample of size n=30 from a random variable X with a Uniform law with parameters a=1 and b=10. A Multiple-Variable Analysis of these data gives the M=30 sample means:

Describe → Numeric Data → Multiple-Variable Analysis

Using the Pane Options of the Multiple-Variable Analysis, we can reduce the number of statistics displayed; we select only Average. Now copy the 30 averages into the first column of sheet B of the DataBook (ADVICE: write them down on paper and then type them into Statgraphics; there is no other way to do it! Use two decimals). Also rename the column (with a double click), for example to Sample means (screenshot not reproduced).
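The construction of the column of sample means can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the function name, the seed parameter and the use of Python's random module are illustrative choices, not part of the Statgraphics procedure. It generates M samples of size n from a Uniform law on [1, 10] and returns the M averages rounded to two decimals, as the Notes suggest.

```python
import random
from statistics import mean


def simulate_sample_means(m=30, n=30, low=1.0, high=10.0, seed=None):
    """Return the m sample means of m Uniform(low, high) samples of size n."""
    rng = random.Random(seed)  # seed=None gives a different result on every run
    samples = [[rng.uniform(low, high) for _ in range(n)] for _ in range(m)]
    # One average per sample, rounded to two decimals
    return [round(mean(sample), 2) for sample in samples]


if __name__ == "__main__":
    sample_means = simulate_sample_means()
    print(len(sample_means), "sample means:")
    print(sample_means)
```

Changing n to 15 or 60 in the call gives the data needed for the exercise at the end of this section.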

Recall that our main goal is to study the probability law of the Sample Mean estimator, and for that we use the Central Limit Theorem. This theorem says that, whatever the probability law of X (the random variable, with finite variance, used to generate the samples, all of the same size n), if n is sufficiently large, the Sample Mean estimator approximately follows a Normal law:

$$\bar{X} \sim N\left(E[X],\ \sqrt{\frac{Var(X)}{n}}\right)$$

where the second parameter is the standard deviation. In our case, given that X follows a Uniform law with parameters a=1 and b=10, its expected value and its variance are:

$$E[X] = \frac{a+b}{2} = \frac{1+10}{2} = 5.5$$

$$Var(X) = \frac{(b-a)^2}{12} = \frac{100+1-20}{12} = \frac{81}{12} = 6.75$$

Applying the Central Limit Theorem with n=30, the standard deviation of the Sample Mean is $\sqrt{6.75/30} \approx 0.4743$, so the Sample Mean estimator should approximately follow the law $\bar{X} \sim N(5.5,\ 0.4743)$.

As in the previous section, we can examine the goodness of fit of the sample Sample Means (i.e. the column we have built) with respect to a Normal law:

Describe → Distribution Fitting → Fitting Uncensored Data

In the window that appears, select the Normal option, and in the window Tables and Graphs select Analysis Summary, Histogram and Quantile-Quantile Plot. The original Notes show an example of the resulting output at this point (figure not reproduced).

Each of you will obtain different results because the samples are random. In this example, according to the previous figure, the estimates of the mean and of the standard deviation are:

Mean = 5.45067 (in place of 5.5)
Standard Deviation = 0.410155 (in place of 0.4743)

Important: remember that what Statgraphics calls variance (and, through its square root, standard deviation) is the following:

$$s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The expression $n\,s_n^2 = (n-1)\,s_{n-1}^2$ gives the relation between this quantity and the variance $s_n^2$ computed with denominator n.

Finally, are the estimates of the mean and of the standard deviation good? What are the absolute errors in your case?

Exercise: examine the goodness of fit of the sample Sample Means for the cases n=15 and n=60. Analyze how the distribution of the Sample Mean changes, and whether the approximation improves or not, by calculating the absolute errors. Write your comments on the last page of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
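The arithmetic of this section can be checked with a short Python sketch (standard library only). The function names and the small data set are assumptions made for illustration. It computes the theoretical mean and standard deviation of the Sample Mean for a Uniform law on [a, b], the absolute errors of a pair of estimates, and verifies the relation between the variances with denominators n and n-1 on example data.

```python
import math
from statistics import pvariance, variance


def sample_mean_law(a, b, n):
    """Theoretical mean and standard deviation of the Sample Mean
    for samples of size n from a Uniform law on [a, b]."""
    expected = (a + b) / 2
    var_x = (b - a) ** 2 / 12
    return expected, math.sqrt(var_x / n)


def absolute_errors(est_mean, est_sd, a, b, n):
    """Absolute errors of the estimates with respect to the theoretical values."""
    mu, sd = sample_mean_law(a, b, n)
    return abs(est_mean - mu), abs(est_sd - sd)


if __name__ == "__main__":
    for n in (15, 30, 60):
        mu, sd = sample_mean_law(1, 10, n)
        print(f"n = {n}: mean = {mu}, standard deviation = {sd:.4f}")

    # Estimates reported in the example of these Notes (n = 30)
    err_mean, err_sd = absolute_errors(5.45067, 0.410155, 1, 10, 30)
    print(f"absolute errors: mean {err_mean:.5f}, standard deviation {err_sd:.5f}")

    # Relation n * s_n^2 = (n - 1) * s_{n-1}^2 on a small illustrative data set
    data = [2.0, 4.0, 4.0, 5.0, 7.0]
    n = len(data)
    print(n * pvariance(data), (n - 1) * variance(data))
```

Here statistics.variance uses the denominator n-1 (the quantity Statgraphics reports), while statistics.pvariance uses the denominator n.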

3. Introduction to the construction of confidence intervals.

In this section we construct two confidence intervals:

The first is based on a sample from a N(0,1).
The second is based on a sample from a Student's t with 5 degrees of freedom.

When computing confidence intervals, Statgraphics always assumes that the population from which the data come follows a Normal law. For this reason, it is advisable to use the following instructions only when the sample size is large (n ≥ 30), because in these cases the Central Limit Theorem ensures that the Sample Mean estimator approximately follows a Normal law, whatever the distribution of the original data.

In Section 1 we saw in detail how to generate samples. Let's start by generating two samples of size n=100:

SAMPLE1: sample of size 100 from a N(0,1).
SAMPLE2: sample of size 100 from a Student's t with 5 degrees of freedom.

Once SAMPLE1 and SAMPLE2 have been obtained, select:

Describe → Numeric Data → One-Variable Analysis

Select, for example, SAMPLE1, and in the window Tables and Graphs select Analysis Summary and Confidence Intervals. The following table shows the confidence interval for the mean obtained with SAMPLE1:

95% Confidence interval for the mean: 0.118804 +/- 0.224542 [-0.105738; 0.343347]

Each of you will obtain different results because the samples are random.

Exercise: compute the confidence interval associated with SAMPLE2. Write it on the last page of these Notes.

BEFORE MOVING ON TO THE NEXT SECTION, SAVE THE DATA, THEN CLOSE AND REOPEN STATGRAPHICS.
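As a rough cross-check of this kind of interval, the following Python sketch (standard library only) computes a large-sample confidence interval for the mean. It is an approximation under stated assumptions: it uses the standard Normal quantile, whereas the Notes do not say which quantile Statgraphics uses, so the result may differ slightly from the Statgraphics output. The function name and the simulated sample are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Large-sample confidence interval for the mean:
    sample mean +/- z * s / sqrt(n), with s the standard deviation
    computed with denominator n - 1 and z a standard Normal quantile."""
    n = len(sample)
    center = mean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * stdev(sample) / math.sqrt(n)
    return center - half_width, center + half_width


if __name__ == "__main__":
    rng = random.Random()  # no fixed seed: results change from run to run
    sample1 = [rng.gauss(0.0, 1.0) for _ in range(100)]  # plays the role of SAMPLE1
    lower, upper = mean_confidence_interval(sample1)
    print(f"95% confidence interval for the mean: [{lower:.6f}; {upper:.6f}]")
```

With n = 100 the interval for a N(0,1) sample has a half-width of roughly 0.2, the same order of magnitude as in the example above.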

4. Exercises (to be handed in at the end of this class, using the last sheet of these Notes)

Each section concludes with an exercise in bold text: carry out the instructions and report your comments and/or answers on the last sheet of these Notes.

Answers for Section 4.

Name and Surname(s):
NIU:
Degree:
Group:

Section 1: Comments about the goodness of fit of RAND1 and comparison with the results obtained for RAND2.

Section 2: Comments about the goodness of fit of Sample Means for the cases n=15 and n=60. Compute the absolute errors for the estimation of the mean and of the standard deviation. Compare with the results obtained for n=30.

Section 3: 95% Confidence interval for the mean obtained using SAMPLE2.