Statistics I, Practice 2 Notes: Probability and probabilistic models; Introduction to statistical inference


1. Simulation of random variables

In Excel we can simulate values of random variables, either discrete or continuous. The simulation tool is part of the Análisis de datos (Data Analysis) add-in that we installed in the first computer class. The steps are the same for every type of variable. First, open Excel and select the Datos (Data) menu, where the Análisis de datos add-in appears. In it, select the tool Generación de números aleatorios (Random Number Generation). A new window opens with the following fields:

Número de variables: the number of variables to simulate, usually 1.
Cantidad de números aleatorios: the sample size.
Distribución: the distribution of the variable, either discrete (Bernoulli, Binomial) or continuous (Uniforme, Normal).
Parámetros: the parameters of the distribution.
Iniciar con: the random seed; leave it blank.

Opciones de salida (output options): selects where the output is written, either a range in the current sheet or a new sheet. The new sheet can be given a name, for example one that refers to the distribution being simulated.

1.1. Discrete random variables: Bernoulli and Binomial

1.1.1. First, we simulate a sample of n = 50 observations from a Bernoulli distribution. Open the simulation window as described above, fill in the fields and click Aceptar (OK). Column A now contains a simple random sample from a Bernoulli distribution with parameter p = 0.4. We know that $E[X] = p$ and $\mathrm{Var}(X) = p(1-p)$, so $E[X] = 0.4$ and $\mathrm{Var}(X) = 0.4 \cdot 0.6 = 0.24$. We compute the sample mean and variance with the Excel functions PROMEDIO (AVERAGE) and VAR, and compare these sample quantities with their population counterparts.

Important: each student will get different results, because the simulated values are random.

1.1.2. Following the same steps, we simulate a sample of size n = 100 from a Binomial distribution.
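For readers who want to reproduce Section 1.1 outside Excel, here is a minimal Python sketch. The function `simulate` is a hypothetical stand-in for the Generación de números aleatorios dialog, not an Excel API; its optional `seed` plays the role of Iniciar con. The Binomial parameters (`trials = 10`, `p = 0.3`) are assumptions, since the notes do not state them here.

```python
import random
import statistics

def simulate(distribution, size, params, seed=None):
    """Hypothetical stand-in for Excel's Generación de números aleatorios dialog.

    distribution: "bernoulli" or "binomial" (Distribución)
    size: sample size (Cantidad de números aleatorios)
    params: parameters of the distribution (Parámetros)
    seed: optional random seed (Iniciar con); None gives different values on every run
    """
    rng = random.Random(seed)
    if distribution == "bernoulli":
        p = params["p"]
        return [1 if rng.random() < p else 0 for _ in range(size)]
    if distribution == "binomial":
        trials, p = params["trials"], params["p"]
        # A Binomial(trials, p) value counts the successes in `trials` Bernoulli(p) trials
        return [sum(1 for _ in range(trials) if rng.random() < p) for _ in range(size)]
    raise ValueError(f"unsupported distribution: {distribution}")

def compare(sample, pop_mean, pop_var):
    """Print sample vs population mean and variance (statistics.variance uses n - 1, like VAR)."""
    print(f"mean: sample {statistics.mean(sample):.4f}, population {pop_mean:.4f}")
    print(f"variance: sample {statistics.variance(sample):.4f}, population {pop_var:.4f}")

# 1.1.1 Bernoulli with p = 0.4 and n = 50: E[X] = p, Var(X) = p(1 - p)
p = 0.4
bernoulli = simulate("bernoulli", 50, {"p": p})
compare(bernoulli, p, p * (1 - p))

# 1.1.2 Binomial with n = 100 observations; trials = 10 and p = 0.3 are hypothetical
trials, p_bin = 10, 0.3
binomial = simulate("binomial", 100, {"trials": trials, "p": p_bin})
compare(binomial, trials * p_bin, trials * p_bin * (1 - p_bin))
```

Passing a fixed `seed` makes the run reproducible, which is the opposite of the classroom setting where every student is meant to see different values.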

We compute the population mean $np$ and variance $np(1-p)$ and compare them with the sample mean and variance.

1.2. Continuous random variables: Normal

We want to generate a sample of size n = 20 from a Normal distribution $X \sim N(\mu, \sigma)$. We follow the same steps as before and compute the sample mean and standard deviation. Are the sample statistics close to the population parameters? What would happen if we took n = 1000 instead of n = 20?

2. Point estimation and fitting

2.1. Quantile-quantile plot (QQ plot) for a Normal distribution

We use the same data that we generated from the Normal distribution. First, we insert an additional row at the top with the column names. Then we select all the data and sort them from lowest to highest through the Datos menu.
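The question about n = 20 versus n = 1000 in Section 1.2, and the sorting step that starts the QQ plot, can be explored with this small Python sketch. The values `MU = 0.0` and `SIGMA = 2.0` are assumptions chosen for illustration, not the values used in class.

```python
import random
import statistics

# Hypothetical parameters: the notes do not fix mu and sigma at this point
MU, SIGMA = 0.0, 2.0
rng = random.Random()  # unseeded, so each run differs

samples = {n: [rng.gauss(MU, SIGMA) for _ in range(n)] for n in (20, 1000)}
for n, sample in samples.items():
    # statistics.stdev divides by n - 1, like Excel's DESVEST
    print(f"n = {n:4d}: mean {statistics.mean(sample):7.3f} (mu = {MU}), "
          f"sd {statistics.stdev(sample):6.3f} (sigma = {SIGMA})")

# First QQ-plot step (2.1): sort the n = 20 sample from lowest to highest
data = sorted(samples[20])
```

With n = 1000 the sample mean and standard deviation are usually much closer to $\mu$ and $\sigma$, because the standard error of the sample mean is $\sigma/\sqrt{n}$.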

The next step is to compute the sample quantiles. To do so, we first assign the rank of each observation. Select cell B2 and type 1, meaning that the value in A2 is the first (smallest) observation. In B3, enter the formula =B2+1 and copy it down to the end of the column. Then we compute the probability level of each sample quantile in the third column: select cell C2, enter the formula =(B2-0.5)/20 (remember that 20 is the sample size) and copy it down to the end of the column. To check that the quantiles have been obtained properly, locate the median: it should be at position (20+1)/2 = 10.5, that is, between observations 10 and 11. Indeed, the level 0.5 (Q50%) falls between the levels of ranks 10 and 11, which are 0.475 and 0.525.

Finally, we compute the values of the fitted Normal distribution associated with each quantile, $x_i = \bar{x} + s\, z_i$, where $\bar{x}$ and $s$ are the sample mean and standard deviation. First, we compute the z-scores $z_i$, the values of the standard Normal distribution associated with each quantile level: select cell D2, enter =DISTR.NORM.ESTAND.INV(C2) and copy it down to the end of the column. To convert these z-scores into values on the scale of the original sample (the x-scores), we apply the inverse standardization: multiply each z-score by the sample standard deviation and add the sample mean, placing the result in column E.
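The worksheet columns built above (rank, probability level, z-score, x-score) can be sketched in Python as follows. `qq_columns` is a hypothetical helper, `statistics.NormalDist().inv_cdf` plays the role of DISTR.NORM.ESTAND.INV, and the N(0, 2) sample is illustrative data, not the class data.

```python
import random
import statistics
from statistics import NormalDist

def qq_columns(sample):
    """Rebuild the QQ-plot worksheet columns for a list of observations.

    Returns rows (A: sorted value, B: rank, C: level, D: z-score, E: x-score).
    """
    data = sorted(sample)                         # column A, sorted ascending
    n = len(data)
    mean, sd = statistics.mean(data), statistics.stdev(data)
    standard = NormalDist()                       # N(0, 1)
    rows = []
    for rank, x in enumerate(data, start=1):      # column B: the =B2+1 pattern
        level = (rank - 0.5) / n                  # column C: =(B2-0.5)/n
        z = standard.inv_cdf(level)               # column D: =DISTR.NORM.ESTAND.INV(C2)
        x_score = mean + sd * z                   # column E: inverse standardization
        rows.append((x, rank, level, z, x_score))
    return rows

sample = [random.gauss(0, 2) for _ in range(20)]  # hypothetical N(0, 2) data
rows = qq_columns(sample)
print(rows[9][2], rows[10][2])  # → 0.475 0.525 (ranks 10 and 11 bracket the median)
```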

We now have all the information needed to draw the QQ plot. First, copy column A (the original data) to the right of column E (the x-scores), so that Excel takes the x-scores as the horizontal axis and the data as the vertical axis. Select the two columns, click Insertar (Insert) in the menu, then Dispersión (Scatter), and choose the plot type with points only. To change the size and style of the points, click on a point, right-click and select Dar formato a serie de datos (Format Data Series), Opciones de marcador (Marker Options).

If the data come from the assumed distribution, the points of the plot should lie approximately along a straight line. To draw this line, copy the x-scores into column G, select the three columns and repeat Insertar, Dispersión; Excel then also plots the x-scores against themselves, which gives the straight line. Be careful when copying and pasting the x-scores, because the cells contain formulas: right-click and select Pegado especial (Paste Special), then Sólo valores (Values only).

When the scatter plot with both series appears, we change the style of the x-score series so that it is drawn as a line: click on one of its points, right-click and select Dar formato a serie de datos, Opciones de marcador: Ninguno (None), Color de línea: Línea sólida (Solid line). In the resulting plot, the points lie along the straight line, which indicates that the Normal distribution fits the data well.

2.2. Graphical fitting: histograms with area 1 (density scale) and density curves

We use the same data generated from the Normal distribution, again with 20 observations. To build the histogram with area 1 (on a density scale), we use the following information, as explained in Lab 1:

Number of observations: 20
Minimum value: -3.470255928, approximated by -3.4
Maximum value: 3.70535465, approximated by 3.8
Range: 3.8 - (-3.4) = 7.2
Number of classes: $\sqrt{20} \approx 4.47$, so 4 or 5 classes

The steps are the following:

1.- Suppose we use 5 classes. Following the steps from Lab 1, the class width is range / number of classes = 7.2 / 5 = 1.44. The upper limit of the first class is the approximated minimum plus the width, and each following upper limit is obtained by adding the width to the previous one.
2.- Once the upper limits of the classes are obtained, we create the histogram by selecting Datos, Análisis de datos, Histograma and clicking Aceptar. This gives the absolute frequency of each interval.

3.- Compute the relative frequency of each interval, $f_i = n_i / n$, where $n_i$ is its absolute frequency.
4.- To obtain a histogram with area 1 (on a density scale), divide each relative frequency by the width of its interval, $h_i = f_i / a_i$; these are the heights of the bars, and the total area is $\sum_i h_i a_i = \sum_i f_i = 1$. The histogram with area 1 is then plotted by replacing the column of absolute frequencies with these heights. We also remove the gaps between bars.

5.- Once the histogram with area 1 is obtained, we add the Normal density curve. To plot $N(\bar{x}, s)$, the values on the horizontal axis are the midpoints of the intervals, halfway between their lower and upper limits.
6.- We compute the Normal density at each midpoint and add it to the histogram as the density curve. This requires the mean and standard deviation of the simulated values, which we can compute with the functions PROMEDIO and DESVEST (STDEV). The density is computed with the DISTR.NORM function, whose last argument 0 (FALSE) requests the density rather than the cumulative probability:

DISTR.NORM(punto central;PROMEDIO(A$2:A$21);DESVEST(A$2:A$21);0)

where punto central is the cell containing the midpoint of the interval. To add the density curve to the histogram, select the chart, right-click and choose Seleccionar datos (Select Data), Agregar (Add), and fill in Nombre de la serie (series name, for example curva) and Valores de la serie (series values: select the density values). The densities are added as bars in another color. To draw them as a curve, change the chart type of this series to lines without markers (Cambiar tipo de gráfico, Líneas).
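Steps 1.- to 6.- can be condensed into one table-building function. This is a minimal Python sketch, not the notes' method: `histogram_table` is a hypothetical helper, `bisect` reproduces the Histograma rule of assigning each value to the first class whose upper limit is at least that value, and `NormalDist(...).pdf` plays the role of DISTR.NORM with last argument 0. The N(0, 2) data and the outward rounding of the minimum and maximum are assumptions made for illustration.

```python
import bisect
import math
import random
import statistics
from statistics import NormalDist

def histogram_table(sample, lower, upper, k):
    """Rebuild the density-scale histogram table of steps 1.- to 6.-.

    lower, upper: approximated minimum and maximum chosen by hand
    k: number of classes (about sqrt(n))
    Returns one row per class:
    (lower limit, upper limit, midpoint, n_i, f_i, height h_i, Normal density at midpoint)
    """
    n = len(sample)
    width = (upper - lower) / k                             # step 1: a_i = range / k
    uppers = [lower + width * (i + 1) for i in range(k)]    # step 1: upper limits
    counts = [0] * k
    for x in sample:                                        # step 2: absolute frequencies
        # First class whose upper limit is >= x (the Histograma rule). Values above the
        # last limit go into the last class here, instead of Excel's extra overflow bin.
        counts[min(bisect.bisect_left(uppers, x), k - 1)] += 1
    fitted = NormalDist(statistics.mean(sample), statistics.stdev(sample))  # N(mean, s)
    rows = []
    for i in range(k):
        lo, hi = lower + width * i, uppers[i]
        f_i = counts[i] / n                                 # step 3: relative frequency
        h_i = f_i / width                                   # step 4: bar height f_i / a_i
        mid = (lo + hi) / 2                                 # step 5: class midpoint
        rows.append((lo, hi, mid, counts[i], f_i, h_i, fitted.pdf(mid)))  # step 6
    return rows

rng = random.Random()
sample = [rng.gauss(0, 2) for _ in range(20)]               # hypothetical N(0, 2) data
k = 5                                                       # sqrt(20) is about 4.47
lower = math.floor(min(sample) * 10) / 10                   # rounded down to one decimal
upper = math.ceil(max(sample) * 10) / 10                    # rounded up to one decimal
for row in histogram_table(sample, lower, upper, k):
    print(" ".join(f"{v:8.3f}" for v in row))
```

Plotting the heights as bars and the last column against the midpoints as a line gives the same picture as the Excel chart.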

3. Confidence intervals

To calculate a confidence interval we can use the statistical function INTERVALO.CONFIANZA (CONFIDENCE). For a population with a Normal distribution and known standard deviation, it returns the margin of error of the confidence interval for the population mean μ, that is, the half-width of the interval. Its arguments are:

Alfa: the significance level used to compute the confidence level. The confidence level is 100(1 - alfa)%; for example, alfa = 0.05 gives a 95% confidence level.
Desv_estándar: the population standard deviation, assumed to be known.
Tamaño: the sample size.

The lower and upper limits of the confidence interval for the population mean, at the given significance level, are obtained by subtracting this value from, and adding it to, the sample mean.
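The value returned by INTERVALO.CONFIANZA corresponds to the standard known-σ formula below (stated from standard theory rather than from the notes), where $z_{1-\alpha/2}$ is the quantile of the standard Normal distribution:

```latex
\text{margin} = z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad
\mathrm{CI}_{1-\alpha}(\mu) = \left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)
```

For α = 0.10, 0.05 and 0.02 the quantiles are approximately $z_{0.95} \approx 1.645$, $z_{0.975} \approx 1.960$ and $z_{0.99} \approx 2.326$.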

Example. To estimate the average grade of a given subject at a university, a sample of 35 student marks has been obtained. It is known from previous courses that the grades of this subject follow a Normal distribution $N(\mu, \sigma)$ with standard deviation $\sigma = 2.41$ points. Given that the average mark in the sample is 5.02, find:

a) A 90% confidence interval for the mean based on the sample.
INTERVALO.CONFIANZA(0,1;2,41;35) = 0,67005473
So the confidence interval is (5.02 - 0.67005473, 5.02 + 0.67005473) = (4.34994527, 5.69005473).

b) A 95% confidence interval for the mean based on the sample.
INTERVALO.CONFIANZA(0,05;2,41;35) ≈ 0,798419
So the confidence interval is (5.02 - 0.798419, 5.02 + 0.798419) ≈ (4.221581, 5.818419).
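The example can be checked with a short Python sketch. `confidence_margin` is a hypothetical counterpart of INTERVALO.CONFIANZA built on `statistics.NormalDist().inv_cdf`; it assumes, as the example does, a Normal population with known σ.

```python
import math
from statistics import NormalDist

def confidence_margin(alpha, sigma, n):
    """Counterpart of INTERVALO.CONFIANZA(alfa; desv_estándar; tamaño): z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

def confidence_interval(mean, alpha, sigma, n):
    """Sample mean minus and plus the margin of error."""
    m = confidence_margin(alpha, sigma, n)
    return mean - m, mean + m

for alpha in (0.10, 0.05):
    low, high = confidence_interval(5.02, alpha, 2.41, 35)
    print(f"{100 * (1 - alpha):.0f}% CI: ({low:.4f}, {high:.4f})")
# → 90% CI: (4.3499, 5.6901)
# → 95% CI: (4.2216, 5.8184)
```

The wider 95% interval reflects the larger quantile, 1.960 instead of 1.645.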

4. Exercises (hand in to the professor at the end of the class, with the answers written on the last page)

4.1. Simulate a sample of size n = 150 from the Uniform distribution $X \sim U(3, 12)$. Compute the sample mean, variance and standard deviation and their population counterparts, and write the results in Table 1.
4.2. Simulate a sample of size n = 50 from the Normal distribution $X \sim N(4, 2)$.
a. Compute the sample mean, variance and standard deviation and their population counterparts, and write the results in Table 2.
b. Draw the QQ plot of this sample and explain the results.
c. Draw the corresponding histogram with area 1 and the density curve.
d. Find a 98% confidence interval, assuming a random sample of size n = 250.
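To check the population column of Table 1, one can use the standard moments of $U(a, b)$: mean $(a+b)/2$ and variance $(b-a)^2/12$. The sketch below is a hypothetical helper for exercise 4.1 only; the Excel simulation remains the required method.

```python
import math
import random
import statistics

def uniform_population(a, b):
    """Population mean, variance and sd of U(a, b): (a+b)/2, (b-a)^2/12 and its square root."""
    variance = (b - a) ** 2 / 12
    return (a + b) / 2, variance, math.sqrt(variance)

def sample_summary(sample):
    """Sample mean, variance (n - 1 divisor, like VAR) and standard deviation (like DESVEST)."""
    variance = statistics.variance(sample)
    return statistics.mean(sample), variance, math.sqrt(variance)

rng = random.Random()
sample = [rng.uniform(3, 12) for _ in range(150)]   # exercise 4.1: n = 150 from U(3, 12)
print("sample:     mean %.4f, variance %.4f, sd %.4f" % sample_summary(sample))
print("population: mean %.4f, variance %.4f, sd %.4f" % uniform_population(3, 12))
```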

Answers to part 4.

Name:            NIU:            Degree:            Group:

Table 1. Results for n = 150, $X \sim U(3, 12)$

X                     Sample        Population
Mean
Variance
Standard deviation

Table 2. Results for n = 50, $X \sim N(4, 2)$

X                     Sample        Population
Mean
Variance
Standard deviation

Explain the results from the QQ plot:

A 98% confidence interval (CI), assuming a random sample of size n = 250. Fill in the statistical function and the results:
INTERVALO.CONFIANZA( ; ; ) =
So the CI is ( , ).