Notes on Simulations in SAS Studio


1 Notes on Simulations in SAS Studio

If you are not careful about simulations in SAS Studio, you can run into problems. In particular, SAS Studio has a limited amount of memory for writing to the RESULTS tab (where results are normally displayed). If you are doing many t-tests, for example, the output takes a fair bit of memory and SAS Studio is likely to have trouble. It helps enormously to do something like this:

    ods output TTests=pvalues;
    ods select TTests;
    proc ttest data=sim;
      by iter n;
      var x;
    run;

The ODS SELECT statement reduces the output and increases the speed and the number of iterations you can do. In SAS Studio, it can make the difference between your code working and not working.

SAS Programming November 13, / 63

2 Power: comparing methods

Here's an example from an empirical paper.

3 Power: comparing methods

4 Speed: comparing methods

For large analyses, speed and/or memory might be an issue when choosing between methods and/or algorithms. This paper compared different methods within SAS for doing permutation tests, based on speed.

5 Use of macros for simulations

The author of the previous paper provides an appendix with lengthy macros to use as more efficient replacements for SAS procedures such as PROC NPAR1WAY and PROC MULTTEST, which on his data could crash or fail to terminate in a reasonable time.

In addition to developing your own macros, a common use of macros is to run macros written by someone else that have not been incorporated into the SAS language. You might just copy and paste the macro into your code, possibly with some modification, and you can use the macro even if you do not understand it. Popular macros might eventually get replaced by new PROCs or new functionality within SAS. This is sort of the SAS alternative to user-defined packages in R.

6 From Macro to PROC

An example of an evolution from macros to PROCs is bootstrapping. For several years, SAS users who wanted to bootstrap relied on macros, often written by others. In bootstrapping, you sample your data (or the rows of your data set) with replacement and get a new dataset with the same sample size but with some of the values repeated and others omitted. For example, if your data is ..., a bootstrap replicated data set might be ..., etc.

7 From Macro to Proc

Basically, to generate the bootstrap data set, you generate n random integers from 1 to n, with replacement, and extract the corresponding rows from your data. This was done using macros, but can now be done with PROC SURVEYSELECT. If you search the web for bootstrapping, you still might run into one of those old macros.

Newer methods might still be implemented using macros. A webpage from 2012 has a macro for bootstrap bagging, a method of averaging results from multiple classification algorithms. There are also macros for searching the web to download movie reviews or extract data from social media. Try searching on SAS macro 2013 for interesting examples.

8 Bootstrapping with PROC SURVEYSELECT

9 Bootstrapping with PROC SURVEYSELECT

To explain the syntax, everything here is an option. There are no statements within the procedure, which is why there is only one semicolon before the RUN statement. We create an output dataset by whatever name we want, here outboot.

- seed is a random number seed.
- method refers to the type of sampling, which for the bootstrap should be sampling with replacement. If you sampled without replacement, you'd be permuting your observations, but this would have no effect on the mean, median, etc.
- samprate is the fraction of the sample you want, which here is 1 for 100% (i.e., we want the same original sample size).
- outhits writes a separate row each time an observation is selected, so repeated selections show up as repeated rows. This isn't necessary, but it is interesting to observe.
- rep is the number of bootstrap datasets, which in this case is set to 1000, a typical number. I've also seen 100 used a lot for genetics examples that require time-consuming maximum likelihood approaches.
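The code from the slide is not part of this transcript. As a rough sketch consistent with the options described above, the call might look something like the following. The input dataset name temps and the seed value are assumptions made for illustration; outboot, a sampling rate of 1, outhits, and 1000 replicates come from the text, and method=urs is the PROC SURVEYSELECT keyword for unrestricted random sampling (sampling with replacement).

```sas
/* Sketch only: the dataset name temps and the seed value are hypothetical. */
proc surveyselect data=temps out=outboot
     seed=12345      /* any fixed seed makes the run reproducible */
     method=urs      /* unrestricted random sampling = with replacement */
     samprate=1      /* each replicate has the original sample size */
     outhits         /* one output row per selection */
     rep=1000;       /* number of bootstrap replicates */
run;
```

Everything between PROC SURVEYSELECT and the single semicolon is an option, which matches the description of the syntax above.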

10 Bootstrapping with PROC SURVEYSELECT

Opening the outboot dataset.

11 Bootstrapping with PROC SURVEYSELECT

Things to note:

- The number of men versus women is a random variable in the replicated datasets. It is not fixed to be the same as in the original data set, but will be the same on average. However, the data set seems to be sorted by sex.
- The number of times an observation is repeated is in the column NumberHits. If this number is 4, for example, the same row occurs four times in a row. If an observation is selected 0 times, it doesn't show up at all.
- The replicate is indicated in a column called Replicate. This is similar to the structure of the data sets we used to simulate power analyses, with (conceptually) multiple datasets simulated within a single SAS dataset.

12 Bootstrapping with PROC SURVEYSELECT

The idea behind bootstrapping is that we can use the simulated data sets to get a simulated distribution of sample statistics: sample means, sample standard deviations, sample medians, the sample coefficient of variation ($s/\bar{x}$), the interquartile range, etc. We have theory to tell us the distribution of $\bar{X}$, which is approximately normal in most cases with large sample sizes. The distribution of the sample median, the 95th percentile, the range, and so forth is more difficult theoretically and will depend on the underlying distribution, so bootstrapping can be useful for this purpose.

13 Bootstrapping

The idea behind bootstrapping is that if we don't know what the underlying population is, then our sample is our best guess at it. We then draw samples from our initial sample as if we were drawing multiple samples from a population. This should work well if our sample is representative of the population we are making inferences about.

When shouldn't this work well? If the sample size is too small, then the sample won't do a good job of representing the entire population, particularly the extremes of a distribution. Bootstrap samples don't extrapolate beyond the original sample, so a sample of 100 observations might not do a good job of estimating the 99th percentile or even the 95th percentile of a distribution. A sample that is biased will also not be corrected by using a bootstrap.

14 Bootstrapping

We can now think about how to use the bootstrapped data that PROC SURVEYSELECT creates. Suppose we want to estimate the population median. A reasonable guess for the population median is the sample median, assuming that we don't know anything about the distribution. (Is this always the case? No: what is the best guess for the population median when sampling from a normal distribution?) The more interesting application of bootstrapping is to get some form of confidence interval around your estimate, or an estimated standard error.
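As a minimal sketch of the first step, the median of each bootstrap replicate can be computed with a BY statement in PROC MEANS. The variable name temperature and the output names medboot and med are hypothetical; outboot and Replicate are the names used in these notes. The sketch assumes outboot was created with outhits, so that repeated selections appear as repeated rows.

```sas
/* Sketch only: temperature, medboot and med are hypothetical names. */
proc means data=outboot noprint;
  by Replicate;                    /* one summary per bootstrap replicate */
  var temperature;
  output out=medboot median=med;   /* med = median of that replicate */
run;
```

The resulting dataset has one row per replicate, which is the simulated distribution of the sample median.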

15 Bootstrapping confidence intervals: percentiles

16 Bootstrapping confidence intervals: percentiles

17 Bootstrapping confidence intervals: percentiles

PROC UNIVARIATE gives enough information for a 90% bootstrap interval and a 98% interval, but for the 95% interval we need the 2.5% and 97.5% quantiles. The 90% interval for the median is (98.2, 98.4), which is pretty narrow. The 98% interval is (98.1, 98.5). To get the 95% interval we need to do a little more work to get customized percentiles out of PROC UNIVARIATE (using the output with the options shown), or we can generate them a different way.

18 Customized percentiles in PROC UNIVARIATE

The 95% interval is (98.15, 98.4). Note that the 90% and 98% intervals were symmetric around the sample median of 98.3, but the 95% interval was not.
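One way to request the 2.5% and 97.5% points is the PCTLPTS= option on the OUTPUT statement of PROC UNIVARIATE. The sketch below assumes the bootstrap medians are stored, one per replicate, in a dataset called medboot with a variable med; both names, and the output name pctl95, are hypothetical and may differ from what the slide used.

```sas
/* Sketch only: medboot, med and pctl95 are hypothetical names. */
proc univariate data=medboot noprint;
  var med;
  output out=pctl95
         pctlpts=2.5 97.5   /* percentiles that are not printed by default */
         pctlpre=P_;        /* creates variables P_2_5 and P_97_5 */
run;

proc print data=pctl95;
run;
```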

19 Getting the percentiles by sorting

Another way to get the percentiles is to sort the 1000 replicates and take the 26th and 975th ordered observations.

20 What are the percentiles?

It's a little tricky to get the right percentiles. Should it be observation 25 or 26, 975 or 976 or 974? My reasoning was that I wanted to get the middle 950 observations, so that there were 25 observations to the left of my interval and 25 observations to the right of my interval. The exact number people use varies, though, so sometimes you'll see people use the 25th and 975th observations in the sorted data. This usually won't make much difference.
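The sorting approach can be sketched in a few lines. This assumes a dataset medboot holding the 1000 bootstrap medians in a variable med (hypothetical names), and it keeps the 26th and 975th sorted values so that 25 values fall on each side of the interval, following the reasoning above.

```sas
/* Sketch only: medboot, med, medsorted and ci95 are hypothetical names. */
proc sort data=medboot out=medsorted;
  by med;
run;

data ci95;
  set medsorted;
  if _n_ in (26, 975);   /* 25 values below and 25 above the interval */
run;

proc print data=ci95;
run;
```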

21 What are percentiles?

There's a nice function in R to help you find the percentiles. It interpolates between numbers, though, and has 9 different algorithms (you can specify which) to define the quantile. The R function apparently includes the SAS interpretation as one of the types. Here are some examples:

    > x <- 1:1000
    > quantile(x,.025)
      2.5% 
    25.975 
    > quantile(x,.975)
      97.5% 
    975.025 
    > quantile(x,.025,type=3) # type 3 is for SAS
    2.5% 
      25 
    > quantile(x,.975,type=3)
    97.5% 
      975 

22 Interpreting Bootstrap CIs

How to interpret bootstrap CIs is a little unclear. Is it a probability? It is easiest to think of a bootstrap CI as a plausible range of values for the parameter. Of course the same might be said of frequentist CIs. If I say that a 95% CI (based on the theory for normal distributions) for µ is (1.3, 2.1), this does NOT mean that there is a 95% chance that µ is between 1.3 and 2.1, since this would be treating µ as a random variable rather than a parameter.

23 Interpreting Bootstrap CIs

What we hope to be the case for a frequentist CI is that 95% of the time, a 95% CI captures the population mean. The idea is that if there are many samples, then before you look at the data, you expect 95% of the CIs constructed from the different samples to capture the population mean.

As an example, if there are multiple polls for the proportion of people who support, say, Hillary Clinton for the next presidential election, conducted by CNN, ABC, Fox News, MSNBC, The New York Times, etc., then hopefully 95% of those polls will have the true percentage in their confidence intervals, assuming those polls are independent. On the frequentist way of looking at things, once the intervals have been constructed, any individual poll either captures this true proportion or it does not. But probability statements don't make sense unless the proportion is a random variable.

24 Interpreting Bootstrap CIs

Bootstrap CIs are similar to frequentist CIs in this respect: if the sample is representative, then approximately $100(1-\alpha)\%$ of the time, they capture the parameter value they are estimating.

25 Interpreting CIs

How can we test how well a confidence interval (of any type) does? One thing we can do is estimate its coverage probability. That is, we generate many samples, construct a CI for each of them, and check how often it covers the true parameter. A well-constructed 95% confidence interval should cover the true parameter 95% of the time.

For examples where you reject $H_0: \mu = \mu_0$ if and only if the confidence interval does not include $\mu_0$, the coverage probability is the flip side of the type 1 error. If the confidence interval includes $\mu_0$ 95% of the time when $H_0$ is true, then you reject $H_0: \mu = \mu_0$ exactly 5% of the time.
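A coverage check of this kind can be sketched as a small simulation. The choices below (1000 samples of size 30 from a standard normal, so the true mean is 0, and the usual t-based interval from PROC MEANS) are assumptions made for illustration and are not taken from the slides.

```sas
/* Sketch only: sample size, distribution and dataset names are hypothetical. */
data sim;
  call streaminit(2014);
  do iter = 1 to 1000;            /* 1000 simulated samples */
    do i = 1 to 30;               /* each of size 30 */
      x = rand("normal", 0, 1);   /* true mean is 0 */
      output;
    end;
  end;
run;

/* 95% t-based confidence limits for the mean, one pair per sample */
proc means data=sim noprint;
  by iter;
  var x;
  output out=ci lclm=lower uclm=upper;
run;

/* covered = 1 when the interval contains the true mean, else 0 */
data cover;
  set ci;
  covered = (lower <= 0 <= upper);
run;

/* the mean of covered estimates the coverage probability */
proc means data=cover mean;
  var covered;
run;
```

The reported mean should be close to 0.95 if the interval behaves as advertised; replacing the t-based limits with bootstrap percentile limits would check the bootstrap interval in the same way.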

26 Bootstrap estimate of the standard error

The bootstrap estimate of the standard error is obtained by taking the sample standard deviation (the square root of the sample variance) of your statistic, computed across bootstrap replicates. For example, suppose my bootstrap median values are $m_1, m_2, \ldots, m_B$, where $B = 1000$ is the number of replicates. Then the bootstrap estimate of the standard error of the sample median is

\[ \sqrt{\frac{1}{B-1} \sum_{i=1}^{B} (m_i - \bar{m})^2} \]

where $\bar{m}$ is the arithmetic average of the bootstrap medians.

27 Bootstrap standard error

28 Bootstrap standard error

This is the outboot data set and the first PROC MEANS output.

29 Bootstrap standard error

This is the meanboot3 data set and second PROC MEANS output.
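The slide code for these two steps is not in the transcript. A sketch of the second step, assuming the per-replicate medians are already in a dataset medboot with a variable med (hypothetical names, not necessarily the ones used on the slide), is a second PROC MEANS that summarizes across replicates:

```sas
/* Sketch only: medboot and med are hypothetical names. */
proc means data=medboot n mean std;
  var med;   /* STD across replicates = bootstrap standard error of the median */
run;
```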

30 Bootstrap standard error CI

From this estimate of the standard error, we can construct a 95% interval using 98.3 ± 1.96(.08619) = (98.13, 98.47), which is similar to the percentile-based interval but slightly wider. In my experience I have mostly seen the percentile-based interval used for bootstrap CIs.

31 Hypothesis Testing

You can also use a bootstrapping framework to do hypothesis testing. Suppose you want to test the difference in two medians for two populations. The null hypothesis is that the two populations have the same median, so $H_0: \eta_1 = \eta_2$, while the alternative is $H_A: \eta_1 \neq \eta_2$. Using the bootstrap, we can estimate the sample medians, $m_1$ and $m_2$, and their standard errors for each group. The standard error for the difference is the square root of the sum of the squared standard errors:

\[ se(m_1 - m_2) = \sqrt{se(m_1)^2 + se(m_2)^2} \]

Using bootstrap estimates of $se(m_1)$ and $se(m_2)$, this can be used to test whether the difference $m_1 - m_2$ is significantly different from 0.
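Under the additional assumption (not stated in the slides) that the difference in sample medians is approximately normally distributed and that the two samples are independent, the comparison can be written as a z-type statistic, rejecting $H_0$ at the 5% level when $|z| > 1.96$:

```latex
\[
z = \frac{m_1 - m_2}{\sqrt{se(m_1)^2 + se(m_2)^2}}
\]
```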

32 Bootstrapping in R

Perhaps not surprisingly, bootstrapping is a little easier in R, largely because of a function called sample(), which allows you to sample with or without replacement. Suppose my temperatures are in temperature. Then

    B <- 1000                    # number of bootstrap replicates
    bootmedian <- numeric(B)     # storage for the bootstrap medians
    for (b in 1:B) {
      bootmedian[b] <- median(sample(temperature, replace = TRUE))
    }
    bootmedian <- sort(bootmedian)
    ci <- c(bootmedian[26], bootmedian[975])   # middle 950 values
    print(ci)

This is sufficient to generate your bootstrapped medians and the percentile interval. Of course you can also do sd(bootmedian) to get the bootstrap standard error.

33 Why do we use the sample mean instead of the sample median?

Suppose we are sampling from a symmetric distribution. Why do we use the sample mean instead of the sample median? Both are unbiased estimators of the center of the distribution.

34 Why do we use the sample mean instead of the sample median?

A basic answer is that for most distributions, the sample median is more variable than the sample mean, so for finite sample sizes, the mean is more precise. Since the variance of the sample median is difficult to determine, this could be investigated by simulation, either by simulating many samples from the same distribution and computing the standard deviations of the sample means and sample medians, or by using the bootstrap estimates of the standard errors if you are working with one sample.

For the temperature data, it actually doesn't make much difference. A normal-based 95% confidence interval for the mean temperature is (98.12, 98.38) (based on PROC TTEST), and the mean temperature is a bit lower than the median, being about 98.25 (the midpoint of that interval) instead of 98.3.

35 What if there were no PROC SURVEYSELECT?

PROC SURVEYSELECT was introduced in SAS version 6, and hasn't always been around. What would you do if it weren't available? Bootstrapping was invented in the 1970s, long before PROC SURVEYSELECT, and generally, statistical methods will be invented before there is a tidy SAS procedure for them. This is part of why being able to program can be important.

In the past, people used macros to accomplish bootstrapping. How would you do this? First think about how you would generate one bootstrap replicate data set.

36 Bootstrap by hand

Here is one way to create a single bootstrap dataset. To create many, you could loop over this code with a macro.
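The slide's code is not in the transcript, so the following is only one possible version and not necessarily the author's. It draws n random row numbers with replacement and reads those rows directly using the POINT= option of the SET statement. The dataset names temps and boot1 and the seed are hypothetical.

```sas
/* Sketch only: temps, boot1 and the seed value are hypothetical. */
data boot1(drop=i);
  call streaminit(2014);
  do i = 1 to n;                          /* n = number of rows in temps */
    pick = ceil(n * rand("uniform"));     /* random integer from 1 to n */
    set temps point=pick nobs=n;          /* read that row directly */
    output;
  end;
  stop;                                   /* required when using POINT= */
run;
```

Because POINT= reads rows by number, the DATA step never reaches an end of file on its own, which is why the STOP statement is needed.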

37 Bootstrapping with a macro

In most cases, it is more efficient to create a giant dataset with all of your bootstrap replicates together, then summarize using a CLASS statement in PROC MEANS or some other PROC. However, if you had 1 million observations and wanted 1000 bootstrap replicates, this would create a dataset with 1 billion observations.

With the macro approach, you can extract the information you need from each bootstrap data set (median, quantile, etc.), then save that to a dataset, and rewrite your bootstrap data set, so that you never use more space than 1 million observations at a time. There is no need to save every bootstrap dataset. Sometimes there is a tradeoff between speed and memory.
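The macro approach might be sketched as follows. This is an illustration under stated assumptions, not the macro from the slides: the macro name bootmedian, its parameters, the dataset names boot1, onemed and allmeds, and the choice of seeding each replicate with seed plus the replicate number are all hypothetical.

```sas
/* Sketch only: all macro, parameter and dataset names are hypothetical. */
%macro bootmedian(data=, var=, reps=1000, seed=2014);
  /* remove results left over from an earlier run, if any */
  proc datasets library=work nolist nowarn;
    delete allmeds;
  quit;

  %do r = 1 %to &reps;
    /* one bootstrap replicate, overwritten on every pass */
    data boot1(drop=i);
      call streaminit(%eval(&seed + &r));
      do i = 1 to n;
        pick = ceil(n * rand("uniform"));
        set &data point=pick nobs=n;
        output;
      end;
      stop;
    run;

    /* keep only the statistic of interest from this replicate */
    proc means data=boot1 noprint;
      var &var;
      output out=onemed(keep=med) median=med;
    run;

    /* accumulate one row per replicate */
    proc append base=allmeds data=onemed;
    run;
  %end;
%mend bootmedian;

%bootmedian(data=temps, var=temperature);
```

Only boot1 (one replicate) and allmeds (one row per replicate) exist at any time, which is the memory saving described above; the price is that the PROCs are launched once per replicate, which is slower than a single BY-group run.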

38 Bootstrapping and outliers

One interesting thing to think about is what happens when you use bootstrapping and there is an outlier in your data. Ordinarily, we want our inferences to be good for the population we are sampling from, and not sensitive to the idiosyncrasies of the sample we happened to collect. In other words, if we collect a new sample from the same population, we'd like our inferences to be stable.

Bootstrapping simulates this idea of getting a slightly different sample from the same population. Some of the same values will be repeated, and some of them will be left out. This leads back to the question about outliers. Sometimes an outlier will be in your bootstrap replicate, sometimes it won't. If an outlier is seriously affecting your inferences, this can show up as your bootstrap inferences not being very stable.

39 Bootstrapping and outliers

What is the probability that an outlier is in one particular bootstrap replicate? Let's say you have a sample of size 5. The probability that the outlier is NOT chosen is (4/5)(4/5)(4/5)(4/5)(4/5), or $(4/5)^5 \approx 0.328$. What if the sample size is n?

\[ \left(\frac{n-1}{n}\right)^n = \left(1 - \frac{1}{n}\right)^n \]

40 Bootstrapping and outliers

What is this value for large n?

\[ \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = e^{-1} \]

So for large n, the probability that the outlier IS in the dataset is approximately $1 - e^{-1} \approx 0.632$, a little less than two-thirds.
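To see how quickly the probability approaches its limit, the formula can be evaluated for a few sample sizes. The sample sizes and the names inprob, p_out, p_in and limit below are chosen only for illustration.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
data inprob;
  do n = 5, 10, 50, 100, 1000;
    p_out = (1 - 1/n)**n;    /* outlier never selected in n draws */
    p_in  = 1 - p_out;       /* outlier selected at least once */
    limit = 1 - exp(-1);     /* large-n limit, for comparison */
    output;
  end;
run;

proc print data=inprob;
run;
```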

41 Parametric bootstrapping

The type of bootstrapping we've been doing is often called nonparametric bootstrapping, whereas parametric bootstrapping involves simulating samples from a known distribution and calculating simulated test statistics from these samples. This is also a useful procedure. Often the parametric bootstrap is based on simulating from a distribution that is estimated from the data.

Suppose you believe that your data is normally distributed, and you want to know what the distribution of the sample coefficient of variation should be for the population you sampled from. You could use the nonparametric bootstrap as we did for the median, or you could assume that your data is normal, use the mean and variance estimated from your data, and draw samples from a normal distribution to simulate the distribution of the coefficient of variation from this normal population with the same sample size as in your data.

42 Parametric bootstrapping the coefficient of variation

First, we'll just look at computing the coefficient of variation in SAS for the temperature data. There are several ways to do this. We could use macro variables to store the mean and standard deviation using the output of PROC MEANS, or we can create a dataset that computes them. I'll use the second approach first, which will remind us how to use a RETAIN statement, but the macro variable approach is just as good.

43 Computing the coefficient of variation

44 Computing the coefficient of variation

Note that the dataset cov just has one variable.

45 Computing the coefficient of variation

If we want to do a parametric bootstrap assuming that temperatures are normally distributed with the same mean and standard deviation as we've observed in our data, then it would help to have the mean and standard deviation stored as macro variables so that we can use those values whenever we want, or we could just hard code them.

46 Computing the coefficient of variation

To use a macro variable, we use CALL SYMPUT. We used this in the notes for week 11, but that was probably easy to forget. (I forgot the syntax myself and had to look it up again...) First, we'll just modify the previous code to store the mean and standard deviation as macro variables.
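The modified code from the slide is not in the transcript. A minimal sketch of storing summary statistics with CALL SYMPUT could look like this; the dataset temps, the variable temperature, the intermediate names stats, m and s, and the macro variable names mean, sd and cv are all hypothetical, and this version uses PROC MEANS output rather than the RETAIN-based code mentioned earlier.

```sas
/* Sketch only: all dataset, variable and macro variable names are hypothetical. */
proc means data=temps noprint;
  var temperature;
  output out=stats mean=m std=s;
run;

data _null_;
  set stats;
  /* CALL SYMPUT stores text, so the numbers are converted with PUT */
  call symput("mean", strip(put(m, best12.)));
  call symput("sd",   strip(put(s, best12.)));
  call symput("cv",   strip(put(s / m, best12.)));
run;

/* write the stored values to the log as a check */
%put mean=&mean sd=&sd cv=&cv;
```

The BEST12. format limits the stored text to 12 characters; a wider format could be used if more digits are wanted.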

47 Computing the coefficient of variation

Note that the dataset cov just has one variable.

48 Simulating the coefficient of variation

Next we want to simulate normally distributed samples of size 130 (the same size as our original data) from a normal distribution with the mean and standard deviation estimated from the data. Since we're using the actual values generated internally in SAS, we'll be using as many digits of precision as SAS uses. If you hard coded the values yourself, you'd have to decide how many digits of precision you wanted to use for your parameters.

49 Simulating the coefficient of variation

Now that we have the mean and standard deviation saved, we can simulate from a normal distribution with those parameters. The point of doing this is to get an idea of how variable the sample coefficient of variation is when you have samples of size 130 from a normal with the same mean and variance as your data.

If you simulate data from a normal distribution with this mean and standard deviation, and the coefficient of variation is never close to your sample coefficient of variation, this suggests that a normal distribution is not a good fit to your data. The parametric bootstrap is sometimes used as a kind of goodness-of-fit test.
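A sketch of this simulation, assuming macro variables named mean and sd already hold the sample mean and standard deviation (hypothetical names), might look like the following. The number of simulated samples (1000), the seed, and the dataset and variable names are also assumptions.

```sas
/* Sketch only: &mean, &sd and all dataset names are hypothetical. */
data simnorm;
  call streaminit(2014);
  do iter = 1 to 1000;                   /* number of simulated samples */
    do i = 1 to 130;                     /* same size as the original data */
      x = rand("normal", &mean, &sd);
      output;
    end;
  end;
run;

/* mean and standard deviation of each simulated sample */
proc means data=simnorm noprint;
  by iter;
  var x;
  output out=cvsim mean=xbar std=s;
run;

/* coefficient of variation of each simulated sample */
data cvsim;
  set cvsim;
  cv = s / xbar;
run;
```

Running PROC MEANS on the variable cv in cvsim then gives the standard deviation of the simulated coefficients of variation, which is the parametric bootstrap standard error.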

50 Simulating the coefficient of variation

51 Simulating the coefficient of variation

52 Simulating the coefficient of variation

53 Simulating the coefficient of variation

Where does our sample coefficient of variation lie on this plot? It is very close to the center, so it is not a surprising value for the sample c.o.v. for this normal distribution. This is probably not the most powerful test for whether or not the data is normal, but sometimes this kind of procedure suggests that the data are not consistent with a particular distribution. In any case, your main interest might lie in the c.o.v. rather than the normality of the data.

We can highlight some things in the plot to make it more interesting. For example, we could add a point to the plot illustrating our sample c.o.v. and maybe a central 95% interval (a percentile interval) for the distribution of simulated coefficient of variation values. This is not really a confidence interval, but tells you where you expect the sample coefficient of variation to lie most of the time from this normal distribution.

54 Simulating the coefficient of variation

Here I just draw a refline at the sample c.o.v. Note that SAS Studio recognizes when I start to type a user-defined macro variable.
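The plotting code is not in the transcript. One way to draw such a reference line is the REFLINE statement of PROC SGPLOT, sketched below under the assumptions that the simulated values are in a dataset cvsim with a variable cv and that the sample c.o.v. is stored in a macro variable named cv (all hypothetical names).

```sas
/* Sketch only: cvsim, cv and the macro variable &cv are hypothetical. */
proc sgplot data=cvsim;
  histogram cv;                                   /* simulated c.o.v. values */
  refline &cv / axis=x lineattrs=(color=red);     /* sample c.o.v. */
run;
```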

55 Simulating the coefficient of variation

In this case, the sample c.o.v. is near the center of the distribution of simulated c.o.v. values. This is not surprising since the original distribution was reasonably close to normal. In other cases, sample statistics are not necessarily near the average of their simulated distributions if the original distribution is different from the assumed distribution.

56 Simulating the coefficient of variation

In addition to seeing the distribution of the sample statistic, we can also get the standard error of the sample statistic, which is the standard deviation of $s/\bar{x}$ when n = 130. (A standard error is a standard deviation of a sample statistic.) Here we just run PROC MEANS yet again.

57 Parametric bootstrap estimate of the standard error

The estimate of the standard error of the coefficient of variation is ...

58 Simulation versus theory

In some cases, theory can tell us what the standard error for some tricky statistic is. Sometimes we want standard errors for statistics that estimate odds, $p/(1-p)$, odds ratios, $\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$, precision, $1/\sigma^2$, or other functions of parameters. If theory can give you a good answer, then this is often preferable to doing a simulation. However, theoretical expressions for these things often involve mathematical (if not numerical) approximations, so a simulation, while approximate, might be just as good.

The main disadvantage of simulation is often computation time and the fact that you need to do separate simulations for different parameter values. If I want the standard error for the c.o.v., I need to do separate simulations for different choices of n, $\mu$ and $\sigma$ (assuming a normal distribution). If I have a function that gives me the standard error, then I can quickly examine the effect of one or more parameters on the standard error.

59 How much is bootstrapping used?

60 How is bootstrapping used in phylogenetics?

Bootstrapping is used in phylogenetics primarily to help quantify uncertainty about maximum likelihood estimates. The idea is that bootstrap replicates are made of DNA sequences, and the best tree is constructed from each of these bootstrap replicates. Then the proportion of trees that have certain features in common is reported. This application is a bit different from the median example, because there is a discrete parameter being inferred.

61 Non-parametric bootstrapping in phylogenetics

62 Simulating the likelihood ratio statistic using parametric bootstrapping

(Huelsenbeck and Bull, Systematic Biology, 1996) In this paper, the distribution of the likelihood ratio statistic δ = 2Λ is simulated under $H_0$ (in a case where it is not asymptotically $\chi^2$) and compared to the observed likelihood ratio statistic.

63 Parametric versus nonparametric bootstrapping

In my experience, or in my area, nonparametric bootstrapping is used much more than parametric bootstrapping, although simulation from known distributions is used extensively without being called bootstrapping. We expect parametric bootstrapping to give more precise answers if we really know something about the distribution that the data comes from, and it is also useful in cases where we are testing a specific hypothesis.


More information

Confidence Intervals. Dennis Sun Data 301

Confidence Intervals. Dennis Sun Data 301 Dennis Sun Data 301 Statistical Inference probability Population / Box Sample / Data statistics The goal of statistics is to infer the unknown population from the sample. We ve already seen one mode of

More information

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses

More information

Lecture 3: Chapter 3

Lecture 3: Chapter 3 Lecture 3: Chapter 3 C C Moxley UAB Mathematics 12 September 16 3.2 Measurements of Center Statistics involves describing data sets and inferring things about them. The first step in understanding a set

More information

Week 4: Describing data and estimation

Week 4: Describing data and estimation Week 4: Describing data and estimation Goals Investigate sampling error; see that larger samples have less sampling error. Visualize confidence intervals. Calculate basic summary statistics using R. Calculate

More information

Earthquake data in geonet.org.nz

Earthquake data in geonet.org.nz Earthquake data in geonet.org.nz There is are large gaps in the 2012 and 2013 data, so let s not use it. Instead we ll use a previous year. Go to http://http://quakesearch.geonet.org.nz/ At the screen,

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information
