LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT


NAVAL POSTGRADUATE SCHOOL
Statistics (OA3102)

Lab #2: Sampling, Sampling Distributions, and the Central Limit Theorem

Goal: Use R to demonstrate sampling methods, sampling distributions, and the Central Limit Theorem (CLT).
Lab type: Interactive lab demonstration followed by hands-on exercises.
Time allotted: Lecture for ~50 minutes followed by ~50 minutes of group exercises.
R libraries: Just the base package.
Data: mk48.down.csv
Revision: January

R HINT OF THE WEEK

1. Keeping a Log of Your R Session
   a. While a script helps you keep a record of the commands you run or the functions you've written, it does not record what actually happened during a particular session (for example, the output each command produced).
   b. On Windows PCs running the R Console, the easiest way to save what you've done is the "Save to File..." option under the File pull-down menu in the R Console. This saves whatever is in your console window to a text file. That is usually sufficient, but note that in long sessions, or if you print out large datasets, earlier output may have scrolled beyond the R Console's buffer and been lost.
   c. On Macs running the R Console, use the "Save As..." option under the File pull-down menu.

DEMONSTRATION

2. Before we begin, let's talk a bit about lists, data frames, and matrices in R.
   a. The most general R data type is the list. A list can hold objects of any type: numeric, character, matrices, functions, even other lists. Furthermore, the objects don't even have to be of the same mode or the same length.
   b. Here's one example of a list:
      result <- list(LETTERS[1:5], 1:10, letters[1:3], "by Ron Fricker")
   c. What things are in result? You can type result to see. Note that each of the elements has a different length.
      i. You can extract items from a list with single square brackets; this gives you another list. For example, what do you get with result[1]? How do you know it's a list? Remember the class() function from last class. Also, note that it's printed out on two lines, the first one containing [[1]]. (If the elements of the list were named, the name would have printed on this line instead.)
      ii. Usually, instead of this sub-list, you want the contents of the sub-list; for that, use double square brackets. Try result[[1]] and see what you get.
      iii. You can check the mode (i.e., type) of each element with the mode() function. For example, try:
         mode(result[[1]])
         mode(result[[2]])
         What's the difference between mode and class? The mode() function tells you the basic type of an object (e.g., "numeric", "character", "list"). The class() function tells you what kind of object it is, which determines how functions such as print() and summary() treat it (e.g., "matrix", "array", "data.frame"). Sometimes they're the same, but often they differ.
      iv. To assign names to the elements in the list:
         names(result) <- c("caps", "Ints", "Sm.Ltrs", "Author")
      v. Now what do you get when you type result?
      vi. With names, you can use the dollar-sign syntax to access the contents of a sub-list, as in result$caps. This is equivalent to result[[1]].
   d. Data frames are the most frequently used R objects for storing data, at least for statistical analysis, and they are essentially a special kind of list.
      i. Data frames have a specific structure: columns are variables and rows are observations. In addition, a data frame must be rectangular: all columns must have the same length, and so must all rows.
      ii. What makes data frames useful for analysis is that the columns can be of different types (numeric, character, factor, logical, etc.).
      iii. The data.frame() function can be used to create a data frame. In a data frame the variables are generally named.
         Here's a simple data frame:
         First.name <- c("Joe", "Peggy", "Harry", "Joan")
         Last.name <- c("Sixpack", "Sue", "Henderson", "Jett")
         Age <- c(32, 27, 38, 35)
         Active.duty.ind <- c(1, 1, 1, 0)
         At.NPS <- c(TRUE, FALSE, TRUE, TRUE)
         Fake.data <- data.frame(First.name, Last.name, Age, Active.duty.ind, At.NPS)
      iv. Now you can list the names of all the variables in the data frame, as you've done with other data frames, with names(Fake.data).
      v. And, as we have been doing in class, you can use the name of a variable to call it. For example: Fake.data$First.name.
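The list and data-frame behavior described in item 2 can be checked in a few lines. This is a minimal sketch, not part of the original lab. It rebuilds result with its first element taken to be the capital letters LETTERS[1:5], which is consistent with the name "caps". small.df is a hypothetical two-column data frame used only for illustration.

```r
# Rebuild the example list and name its elements
result <- list(LETTERS[1:5], 1:10, letters[1:3], "by Ron Fricker")
names(result) <- c("caps", "Ints", "Sm.Ltrs", "Author")

class(result[1])                      # "list": single brackets return a sub-list
class(result[[1]])                    # "character": double brackets return the contents
identical(result$caps, result[[1]])   # TRUE: $name and [[position]] agree

# mode() and class() can differ: 1:10 is stored as integers
mode(result[[2]])                     # "numeric"
class(result[[2]])                    # "integer"

# small.df is a hypothetical data frame: a list of equal-length columns
small.df <- data.frame(Age = c(32, 27, 38, 35),
                       At.NPS = c(TRUE, FALSE, TRUE, TRUE))
mode(small.df)                        # "list"
class(small.df)                       # "data.frame"
sapply(small.df, class)               # each column keeps its own type
```

Once the elements are named, printing result shows $caps, $Ints, and so on in the places where [[1]], [[2]], ... appeared before.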

3. Matrices most often come into play in R when you want to do numerical calculations.
   a. Every element of an R matrix must have the same mode (for example, all numeric, all character, or all logical). Most commonly, matrices are numeric and used in computations. For example, two conformable matrices A and x can be multiplied with the matrix-product operator: A %*% x.
   b. The function for creating a matrix is, not surprisingly, matrix(). See the help for its arguments, particularly nrow and ncol, which define the size of the matrix, and byrow, which controls whether the data fill the matrix row by row (byrow=TRUE) or column by column (the default). To illustrate, consider the following example and, before running it in R, guess what the matrix will look like:
      Fake.matrix <- matrix(1:20, nrow=5, ncol=4, byrow=TRUE)

4. Illustrating the Central Limit Theorem (CLT).
   a. The CLT says that sums of iid random variables with finite variance have an approximately normal distribution, and the more random variables summed, the better the approximation. In particular, this means that sample means have an approximately normal distribution. Let's illustrate this with a couple of simulation examples, similar in spirit to the applet we looked at in class.
   b. To begin, let's look at a random variable that's uniformly distributed on the unit interval [0,1]: X ~ U[0,1]. It's easy to generate such random variables in R using the runif() function, which we will use to create a matrix of 300,000 random draws from a U[0,1] distribution:
      xmatrix <- matrix(runif(10000*30), nrow=10000)
      This syntax creates a matrix with 10,000 rows and 30 columns, each entry of which is a random draw from a U[0,1] distribution. Check that the range of the values looks reasonable with
      summary(xmatrix)
      hist(xmatrix)
   c. Now, let's look at how the distribution of the sample mean becomes more and more nearly normal as the sample size increases. The syntax below creates four new vectors containing the row means of the first 2, 5, and 10 columns and of all 30 columns of the matrix.
      um2 <- rowMeans(xmatrix[,1:2])
      um5 <- rowMeans(xmatrix[,1:5])
      um10 <- rowMeans(xmatrix[,1:10])
      um30 <- rowMeans(xmatrix)
      So, they're each vectors 10,000 observations long, and we would expect um2 to be the least normally distributed and um30 to be quite close to normal. Check this with:
      par(mfrow=c(2,2))
      qqnorm(um2); qqnorm(um5); qqnorm(um10); qqnorm(um30)

      It doesn't take long for the CLT to kick in, does it? Even in this example, where the population distribution is clearly non-normal, the normal probability plots start to look pretty straight for means of just five observations.

5. Picturing a (Non-Normal) Sampling Distribution: A Real-Data Example.
   a. Read in the file named mk48.down.csv:
      mk48.down <- read.csv(file.choose())
      Of course, once you type the above, you then need to find the CSV file on your computer and click on it in the dialog box that pops up. You now have the data in a vector called mk48.down$down.days, whose 9,505 entries give the number of down days for the Marines' MK-48 Logistic Vehicle System (LVS). How did I know that the name of the vector is down.days? Remember:
      names(mk48.down)
   b. What sort of distribution does it look like the data came from?
      hist(mk48.down$down.days, prob=TRUE)
      Answer: it looks a lot like an exponential distribution.
   c. What is a good estimate of the exponential distribution's parameter $\lambda$? Answer: the method of moments and maximum likelihood techniques (both of which we will learn about in upcoming lectures) give $\hat{\lambda} = 1/\bar{X}$. This is called a point estimate, where we are using the data to estimate the parameter. In this case we have $\hat{\lambda} \approx 0.0158$, which we find by executing the following command:
      1/mean(mk48.down$down.days)
      How good is the fit? Let's overlay the parametric distribution exp(0.0158) on the density histogram:
      hist(mk48.down$down.days, prob=TRUE)
      curve(dexp(x, 0.0158), lwd=2, col="red", add=TRUE)
      Looks pretty good, eh? However, plotting a histogram and a probability density curve like this is not the best way to make such a comparison. Your eye is simply not calibrated finely enough to see more than obvious, gross differences. A better approach is to use the qqplot() function, plotting the data against a set of random observations from an exp(0.0158) distribution.
      This is called a quantile-quantile (or Q-Q for short) plot. It is very similar to a normal probability plot (the qqnorm() function in R), but instead of comparing the data to the theoretical quantiles of a normal distribution, it compares two data sets against each other. As with the normal probability plot, if the two data sets come from the same distribution, the points on the Q-Q plot should fall close to a straight line. Here are the R commands. We first generate 9,505 observations from an exp(0.0158) distribution, then plot them against the actual data, and finally overlay a straight line to help us see what's going on.

      rand.exps <- rexp(9505, 0.0158)
      qq <- qqplot(mk48.down$down.days, rand.exps)   # qqplot() also returns the sorted x and y values
      abline(lm(y ~ x, data=qq))
      Here we see that some down-days observations are a lot larger than we would expect if the data did come from an exp(0.0158) distribution. But if we focus on the majority of observations, those of 400 days or less, it doesn't look too bad:
      qqplot(mk48.down$down.days[mk48.down$down.days<=400], rexp(400, 0.0158))
      And almost all of the observations have 400 or fewer down days:
      table(mk48.down$down.days<=400)
      FALSE TRUE
      So, we'll assume we know the population comes from an exp(0.0158) distribution.
   d. By the CLT, we know $\bar{X}$ has an approximately normal distribution, but $1/\bar{X}$ does not. What does the distribution of $1/\bar{X}$ for samples from an exp(0.0158) distribution look like? Well, that depends on the sample size, since n is involved in the calculation of $\bar{X}$. For this demonstration, let's imagine we're interested in a sample of size n=10 and we want to know what the sampling distribution of $\hat{\lambda} = 1/\bar{X}$ looks like. One way to do this is to use simulation. We generate lots of samples of size 10 from an exp(0.0158) distribution, compute $1/\bar{X}$ for each, and plot the results to get a picture of the sampling distribution:
      ee <- matrix(rexp(10000*10, rate=0.0158), nrow=10000)
      ee.m10 <- rowMeans(ee)
      hist(1/ee.m10)
      Here we can clearly see a skewed distribution, which confirms our assertion that the distribution of $1/\bar{X}$ is not normal.
   e. To be a bit fancier, we can overlay a normal density curve to better show the skew:
      hist(1/ee.m10, prob=TRUE, xlim=c(-0.1,0.1))
      curve(dnorm(x, mean(1/ee.m10), sd(1/ee.m10)), lwd=2, col="red", add=TRUE)
      And, if we want to be a bit more formal, we can use a normal probability plot:
      qqnorm(1/ee.m10)
      qqline(1/ee.m10)
      Thus, what we see here is that not all sampling distributions are normal. Remember, a sampling distribution is just the probability distribution of a statistic, and many statistics do not have normal sampling distributions.

6. Another Approach: Using Sampling to Construct an Empirical Estimate of the Sampling Distribution of $1/\bar{X}$.

   a. In the previous section, we approximated the population distribution of LVS down days with an exponential distribution. This is what is referred to as a "parametric" approach to estimating the sampling distribution. It's parametric because we chose a particular family of distributions (exponential) and then picked a particular distribution from that family by estimating the exponential distribution's parameter from the data.
   b. An alternative, "nonparametric," approach is to use the data itself to estimate the sampling distribution. We will do this by sampling directly from the data, for which the sample() function will be very helpful. The idea is that we will repeatedly draw a random sample of 10 observations from the data, calculate the inverse of the mean of each sample, and plot the results on a histogram.
   c. To begin, we calculate a vector called resamples that contains 10,000 inverse means of 10 observations. Each sample of 10 is drawn without replacement from the 9,505 observations, but any particular observation in the data can show up in more than one sample of 10.
      resamples <- numeric(10000)
      for(i in 1:10000){
        resamples[i] <- 1/mean(sample(mk48.down$down.days, size=10))
      }
   d. Now, let's plot a histogram of these 10,000 resamples and compare it to the histogram that used the random exponentials in the last example:
      par(mfrow=c(1,2))
      hist(1/ee.m10)
      hist(resamples)
      They look pretty close, eh? But as we previously discussed, it's actually pretty hard for the human eye to distinguish differences between two or more histograms this way, so let's compare them with a Q-Q plot:
      qq <- qqplot(resamples, 1/ee.m10)
      abline(lm(y ~ x, data=qq))
      Not a bad fit! There is just a bit of deviation in the right tails of the distributions, but overall it's pretty darn good. So it looks like we get basically the same sampling distribution estimate whether we use a parametric or a nonparametric approach.
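The comparison in items 5 and 6 can also be made numerically instead of by eye. Below is a minimal, self-contained sketch for when mk48.down.csv is not at hand. It substitutes a synthetic stand-in: down.days here is a hypothetical vector of 9,505 exponential draws with rate 0.0158, not the real MK-48 data. The sketch estimates the sampling distribution of $1/\bar{X}$ for n = 10 in two ways. The parametric version samples from the fitted exponential. The nonparametric version samples from the data with sample(). It then compares quantiles of the two. replicate() stands in for the explicit for loop and does the same job.

```r
set.seed(1)                                  # make the simulation reproducible

# Hypothetical stand-in for mk48.down$down.days (NOT the real data)
down.days <- rexp(9505, rate = 0.0158)
lambda.hat <- 1 / mean(down.days)            # point estimate of the rate

n <- 10                                      # sample size
B <- 10000                                   # number of simulated samples

# Parametric: B samples of size n from exp(lambda.hat), one sample per row
param <- 1 / rowMeans(matrix(rexp(B * n, rate = lambda.hat), nrow = B))

# Nonparametric: B samples of size n drawn (without replacement) from the data
nonparam <- replicate(B, 1 / mean(sample(down.days, size = n)))

# Side-by-side quantiles of the two estimated sampling distributions
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
print(rbind(parametric    = quantile(param, probs),
            nonparametric = quantile(nonparam, probs)))

# For exponential data, E(1/Xbar) = n * lambda / (n - 1), so 1/Xbar is biased upward
print(c(simulated = mean(param), theory = n * lambda.hat / (n - 1)))
```

If the CSV is available, replacing the stand-in with down.days <- mk48.down$down.days runs the same comparison on the actual LVS data.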

GROUP #____ EXERCISES          Members: ____________, ____________, ____________, ____________

1. Illustrate the CLT on sample totals using some very non-normal data. In particular, draw random samples from a gamma distribution (see the rgamma() function) with parameter shape = 2. First, using a normal probability plot, demonstrate that the data is not normal.
   a. Now, vary the sample size (n) for the sample total from very small (say 2) to quite large (you choose).
      i. The apply() and sum() functions will likely be useful if you first create a matrix of gamma-distributed data, as in the earlier demonstration.
      ii. Be sure to simulate enough samples that you get relatively smooth histograms and/or normal probability plots.
   b. As your output, create a single chart with a sequence of plots showing how the normal approximation gets better and better as n gets large, that is, as the CLT "kicks in."
      i. The par(mfrow=c(a,b)) command will be useful for putting multiple plots on one chart, where a is the number of rows and b is the number of columns in the matrix of figures.

2. The coefficient of variation (CV) is a normalized measure of the dispersion of a probability distribution. It is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$: $CV = \sigma/\mu$. Empirically estimate the sampling distribution of the CV for the MK-48 LVS data for various sample sizes as follows.
   a. Draw 10,000 resamples of size n from the data.
   b. For each resample, estimate the CV as the ratio of the sample standard deviation to the sample mean, $s/\bar{x}$. Note that the sample mean and standard deviation are calculated on the same sample of data.
   c. As your output, create a single chart with a sequence of plots showing what happens as n gets bigger. Does the CLT "kick in" for the CV?
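The apply() and sum() hint and the $s/\bar{x}$ calculation can be tried on a tiny example first. This is an illustration under stated assumptions, not a solution. m is a hypothetical 2-by-3 matrix with one sample per row. x is an arbitrary made-up sample. g is simulated gamma data (shape = 2, with rgamma()'s default rate of 1), used only to show apply() with a custom function.

```r
# Hypothetical 2 x 3 matrix, one sample per row: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)       # row totals via apply(): 9 12
rowSums(m)             # the same totals, computed by a vectorized built-in

# Sample coefficient of variation s / xbar for one made-up sample
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv <- sd(x) / mean(x)  # sd() uses the n - 1 divisor

# One CV per row: 1,000 hypothetical samples of size 5 from Gamma(shape = 2)
set.seed(1)
g <- matrix(rgamma(1000 * 5, shape = 2), nrow = 1000)
cvs <- apply(g, 1, function(row) sd(row) / mean(row))
summary(cvs)
```

The second argument of apply() is the margin: 1 applies the function to each row and 2 to each column. With one sample per row, margin 1 gives one total or one CV per sample.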

Name: ______________________          INDIVIDUAL EXERCISES

1. Repeat the demonstration of the CLT in the lab on a discrete uniform distribution. In particular, consider the uniform distribution on the integers from 1 to 6, which simulates a fair die.
   (a) You can easily generate 300,000 observations from such a random variable in R using the runif() function combined with the ceiling() function:
       ceiling(runif(10000*30, 0, 6))
       Here, runif(10000*30, 0, 6) generates 300,000 random observations between 0 and 6, and the ceiling() function rounds each observation up to the nearest integer, thereby simulating 300,000 rolls of a fair die.
   (b) Then, using the matrix() function, repeat the illustration of the CLT in item 4, but using the discrete uniform distribution just specified. Turn in a sequence of quantile-quantile plots showing the progression of the sample mean towards normality for increasing sample sizes.

2. For $X \sim \text{Gamma}(2,1)$, the mean is $\mu_X = E(X) = 2$ and the variance is $\text{Var}(X) = \sigma_X^2 = 2$. For the various sample sizes, empirically demonstrate that for the total of n iid observations, $T_0 = X_1 + X_2 + \cdots + X_n$, it follows that $E(T_0) = n\mu_X = 2n$ and $\text{Var}(T_0) = n\sigma_X^2 = 2n$.
   What do I mean by "empirically demonstrate" here? I mean that you should simulate some data and from it some totals. Then estimate the theoretical quantities using an appropriate statistic on the totals, and show that the estimates get closer and closer to the theoretical quantities as the number of totals increases. For example, choose an n, say n=5. Now, using simulation, generate m totals, each of which is the sum of five gamma random variables:
   $T_{0,i} = X_{1,i} + X_{2,i} + X_{3,i} + X_{4,i} + X_{5,i}, \quad i = 1, \ldots, m.$
   Now, estimate $E(T_0)$ with $\hat{E}(T_0) = \bar{T}_0 = \frac{1}{m}\sum_{i=1}^{m} T_{0,i}$ and show that, as you let m get large, $\hat{E}(T_0) \to 10 = 2n$.
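The targets $E(T_0) = 2n$ and $\text{Var}(T_0) = 2n$ for gamma totals follow from linearity of expectation. For the variance they also use the independence of the $X_j$, which makes every covariance term zero. The sample-variance estimator in the second line is one natural choice for estimating $\text{Var}(T_0)$. It is an assumption here, since the exercise only spells out the estimator for the mean.

```latex
E(T_0) = E\Big(\sum_{j=1}^{n} X_j\Big) = \sum_{j=1}^{n} E(X_j) = n\mu_X = 2n,
\qquad
\operatorname{Var}(T_0) = \sum_{j=1}^{n} \operatorname{Var}(X_j) = n\sigma_X^2 = 2n,

\hat{E}(T_0) = \bar{T}_0 = \frac{1}{m}\sum_{i=1}^{m} T_{0,i},
\qquad
\widehat{\operatorname{Var}}(T_0) = \frac{1}{m-1}\sum_{i=1}^{m}\big(T_{0,i} - \bar{T}_0\big)^2 .
```

In R, these two estimators are mean() and var() applied to the vector of m totals. For n = 5, both should approach 10 as m grows.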


More information

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012 An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences Scott C Merrill September 5 th, 2012 Chapter 2 Additional help tools Last week you asked about getting help on packages.

More information

Module 1: Introduction RStudio

Module 1: Introduction RStudio Module 1: Introduction RStudio Contents Page(s) Installing R and RStudio Software for Social Network Analysis 1-2 Introduction to R Language/ Syntax 3 Welcome to RStudio 4-14 A. The 4 Panes 5 B. Calculator

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

Central Limit Theorem Sample Means

Central Limit Theorem Sample Means Date Central Limit Theorem Sample Means Group Member Names: Part One Review of Types of Distributions Consider the three graphs below. Match the histograms with the distribution description. Write the

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

Lab 5 - Risk Analysis, Robustness, and Power

Lab 5 - Risk Analysis, Robustness, and Power Type equation here.biology 458 Biometry Lab 5 - Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors

More information

EE 301 Signals & Systems I MATLAB Tutorial with Questions

EE 301 Signals & Systems I MATLAB Tutorial with Questions EE 301 Signals & Systems I MATLAB Tutorial with Questions Under the content of the course EE-301, this semester, some MATLAB questions will be assigned in addition to the usual theoretical questions. This

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

Your Name: Section: 2. To develop an understanding of the standard deviation as a measure of spread.

Your Name: Section: 2. To develop an understanding of the standard deviation as a measure of spread. Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #3 Interpreting the Standard Deviation and Exploring Transformations Objectives: 1. To review stem-and-leaf plots and their

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction A Monte Carlo method is a compuational method that uses random numbers to compute (estimate) some quantity of interest. Very often the quantity we want to compute is the mean of

More information

UNIT 15 GRAPHICAL PRESENTATION OF DATA-I

UNIT 15 GRAPHICAL PRESENTATION OF DATA-I UNIT 15 GRAPHICAL PRESENTATION OF DATA-I Graphical Presentation of Data-I Structure 15.1 Introduction Objectives 15.2 Graphical Presentation 15.3 Types of Graphs Histogram Frequency Polygon Frequency Curve

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM

Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM 1 Instructions Labs may be done in groups of 2 or 3 (i.e., not alone). You may use any programming language you wish but MATLAB is highly suggested.

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Command Line and Python Introduction Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Today Assignment #1! Computer architecture Basic command line skills Python fundamentals

More information

Basic matrix math in R

Basic matrix math in R 1 Basic matrix math in R This chapter reviews the basic matrix math operations that you will need to understand the course material and how to do these operations in R. 1.1 Creating matrices in R Create

More information

What s Normal Anyway?

What s Normal Anyway? Name Class Problem 1 A Binomial Experiment 1. When rolling a die, what is the theoretical probability of rolling a 3? 2. When a die is rolled 100 times, how many times do you expect that a 3 will be rolled?

More information

Probability Models.S4 Simulating Random Variables

Probability Models.S4 Simulating Random Variables Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Probability Models.S4 Simulating Random Variables In the fashion of the last several sections, we will often create probability

More information

Figure 1. Figure 2. The BOOTSTRAP

Figure 1. Figure 2. The BOOTSTRAP The BOOTSTRAP Normal Errors The definition of error of a fitted variable from the variance-covariance method relies on one assumption- that the source of the error is such that the noise measured has a

More information

Ms Nurazrin Jupri. Frequency Distributions

Ms Nurazrin Jupri. Frequency Distributions Frequency Distributions Frequency Distributions After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.

More information

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties. Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square

More information

Practical 2: Using Minitab (not assessed, for practice only!)

Practical 2: Using Minitab (not assessed, for practice only!) Practical 2: Using Minitab (not assessed, for practice only!) Instructions 1. Read through the instructions below for Accessing Minitab. 2. Work through all of the exercises on this handout. If you need

More information

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below. Graphing in Excel featuring Excel 2007 1 A spreadsheet can be a powerful tool for analyzing and graphing data, but it works completely differently from the graphing calculator that you re used to. If you

More information

MATLAB Modul 4. Introduction

MATLAB Modul 4. Introduction MATLAB Modul 4 Introduction to Computational Science: Modeling and Simulation for the Sciences, 2 nd Edition Angela B. Shiflet and George W. Shiflet Wofford College 2014 by Princeton University Press Introduction

More information

Data organization. So what kind of data did we collect?

Data organization. So what kind of data did we collect? Data organization Suppose we go out and collect some data. What do we do with it? First we need to figure out what kind of data we have. To illustrate, let s do a simple experiment and collect the height

More information

Statistics I Practice 2 Notes Probability and probabilistic models; Introduction of the statistical inference

Statistics I Practice 2 Notes Probability and probabilistic models; Introduction of the statistical inference Statistics I Practice 2 Notes Probability and probabilistic models; Introduction of the statistical inference 1. Simulation of random variables In Excel we can simulate values from random variables (discrete

More information

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N Math 3: CSM Tutorial: Probability, Statistics, and Navels Fall 2 In this worksheet, we look at navel ratios, means, standard deviations, relative frequency density histograms, and probability density functions.

More information

Geology Geomath Estimating the coefficients of various Mathematical relationships in Geology

Geology Geomath Estimating the coefficients of various Mathematical relationships in Geology Geology 351 - Geomath Estimating the coefficients of various Mathematical relationships in Geology Throughout the semester you ve encountered a variety of mathematical relationships between various geologic

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

Applied Calculus. Lab 1: An Introduction to R

Applied Calculus. Lab 1: An Introduction to R 1 Math 131/135/194, Fall 2004 Applied Calculus Profs. Kaplan & Flath Macalester College Lab 1: An Introduction to R Goal of this lab To begin to see how to use R. What is R? R is a computer package for

More information

Organizing and Summarizing Data

Organizing and Summarizing Data 1 Organizing and Summarizing Data Key Definitions Frequency Distribution: This lists each category of data and how often they occur. : The percent of observations within the one of the categories. This

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

Sec 6.3. Bluman, Chapter 6 1

Sec 6.3. Bluman, Chapter 6 1 Sec 6.3 Bluman, Chapter 6 1 Bluman, Chapter 6 2 Review: Find the z values; the graph is symmetrical. z = ±1. 96 z 0 z the total area of the shaded regions=5% Bluman, Chapter 6 3 Review: Find the z values;

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Introduction to Geospatial Analysis

Introduction to Geospatial Analysis Introduction to Geospatial Analysis Introduction to Geospatial Analysis 1 Descriptive Statistics Descriptive statistics. 2 What and Why? Descriptive Statistics Quantitative description of data Why? Allow

More information

Here is the probability distribution of the errors (i.e. the magnitude of the errorbars):

Here is the probability distribution of the errors (i.e. the magnitude of the errorbars): The BOOTSTRAP Normal Errors The definition of error of a fitted variable from the variance-covariance method relies on one assumption- that the source of the error is such that the noise measured has a

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

Solution to Tumor growth in mice

Solution to Tumor growth in mice Solution to Tumor growth in mice Exercise 1 1. Import the data to R Data is in the file tumorvols.csv which can be read with the read.csv2 function. For a succesful import you need to tell R where exactly

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Slide 1 CS 170 Java Programming 1 Multidimensional Arrays Duration: 00:00:39 Advance mode: Auto

Slide 1 CS 170 Java Programming 1 Multidimensional Arrays Duration: 00:00:39 Advance mode: Auto CS 170 Java Programming 1 Working with Rows and Columns Slide 1 CS 170 Java Programming 1 Duration: 00:00:39 Create a multidimensional array with multiple brackets int[ ] d1 = new int[5]; int[ ][ ] d2;

More information

2) familiarize you with a variety of comparative statistics biologists use to evaluate results of experiments;

2) familiarize you with a variety of comparative statistics biologists use to evaluate results of experiments; A. Goals of Exercise Biology 164 Laboratory Using Comparative Statistics in Biology "Statistics" is a mathematical tool for analyzing and making generalizations about a population from a number of individual

More information

IN-CLASS EXERCISE: INTRODUCTION TO R

IN-CLASS EXERCISE: INTRODUCTION TO R NAVAL POSTGRADUATE SCHOOL IN-CLASS EXERCISE: INTRODUCTION TO R Survey Research Methods Short Course Marine Corps Combat Development Command Quantico, Virginia May 2013 In-class Exercise: Introduction to

More information

Intro To Excel Spreadsheet for use in Introductory Sciences

Intro To Excel Spreadsheet for use in Introductory Sciences INTRO TO EXCEL SPREADSHEET (World Population) Objectives: Become familiar with the Excel spreadsheet environment. (Parts 1-5) Learn to create and save a worksheet. (Part 1) Perform simple calculations,

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

[1] CURVE FITTING WITH EXCEL

[1] CURVE FITTING WITH EXCEL 1 Lecture 04 February 9, 2010 Tuesday Today is our third Excel lecture. Our two central themes are: (1) curve-fitting, and (2) linear algebra (matrices). We will have a 4 th lecture on Excel to further

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

The Normal Distribution & z-scores

The Normal Distribution & z-scores & z-scores Distributions: Who needs them? Why are we interested in distributions? Important link between distributions and probabilities of events If we know the distribution of a set of events, then we

More information

Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information