Macros and ODS. SAS Programming November 6, / 89


1 Macros and ODS The first part of these slides overlaps with last week a fair bit, but it doesn't hurt to review, as this code might be a little harder to follow.

2 ODS in SAS Studio We're now ready to try combining macros with ODS. Here is an example of a simulation study that we can do with what we have. Suppose X1, ..., X10 are i.i.d. (independent and identically distributed) exponential random variables with rate 1. If you perform a t-test at the α = 0.05 level for whether or not the mean is 1, what is the type 1 error? If X1, ..., X10 were normal with mean 1 and standard deviation σ, then you would expect to reject H0 5% of the time when H0 is true. In this case, H0 is true (the mean is 1), but the assumption of normality in the t-test is violated, and this might affect the type 1 error rate.

3 ODS in SAS Studio First we'll do one data set and run PROC TTEST once with ODS TRACE ON to figure out which table we want to save.
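The slide's code isn't reproduced in this transcript, so here is a minimal sketch of the trace step it describes; the data set name sim and the h0=1 option are assumptions about the setup above.

```sas
ods trace on;             /* table names are written to the log */
proc ttest data=sim h0=1; /* test whether the mean is 1 */
   var x;
run;
ods trace off;
```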

4 ODS in SAS Studio It looks like we want the third table, which was called TTests.

5 ODS in SAS Studio If you look at the data set in the Work folder, the name of the variable with the p-value is again Probt, although when PROC TTEST runs, it labels the variable as Pr > |t|.

6 ODS in a macro Here's a start: I run PROC TTEST 3 times on 3 generated data sets. This creates 3 small data sets with p-values.
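Since the macro itself is not in the transcript, here is one sketch of how it might look; the macro name, data set names, and ranexp(0) (clock seed) are all assumptions, not the slide's actual code.

```sas
%macro onesim(i);
   data sim&i;                 /* 10 exponential(1) observations */
      do j = 1 to 10;
         x = ranexp(0);        /* seed 0 = use the system clock */
         output;
      end;
   run;
   ods output TTests=p&i;      /* save the TTests table as data set p&i */
   proc ttest data=sim&i h0=1;
      var x;
   run;
%mend onesim;

%onesim(1) %onesim(2) %onesim(3)
```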

7 ODS in a macro: using concatenation to put all p-values in one data set
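A sketch of the concatenation step (the data set names p1-p3 and pvalues are assumptions carried over from the previous slide's sketch): listing several data sets on one SET statement stacks them, which is concatenation rather than a merge.

```sas
data pvalues;
   set p1 p2 p3;   /* SET with several data sets concatenates (stacks) them */
   keep Probt;     /* keep only the p-value variable */
run;
```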

8 ODS dataset of p-values You should now be able to use your dataset of p-values to analyze your p-values. Particular questions of interest would be (1) how many p-values are below 0.05 (this is the type I error rate), and (2) the distribution of the p-values.

9 ODS dataset of p-values In my case, I had trouble doing anything directly with the dataset pvalues. Nothing seemed to print in my output (for example, from PROC MEANS). So, in my case, I just output my dataset pvalues to an external file and then read it in again using a new SAS program. This is slightly inelegant, but it means I can start from scratch in case any of my ODS settings changed things and caused problems. This approach could also be useful if I wanted to generate p-values in SAS and analyze or plot them in another program like R, or if I just want to save those p-values for later reference.
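One simple way to do the round trip described above (a sketch; the file name pvalues.txt comes from the slides, but the variable names are assumptions):

```sas
/* write the p-values to a text file */
data _null_;
   set pvalues;
   file "pvalues.txt";
   put Probt;
run;

/* in a fresh program, read them back in */
data pvals;
   infile "pvalues.txt";
   input p;
run;
```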

10 ODS dataset of p-values

11 ODS dataset of p-values Here I analyze the output of one SAS program using a second SAS program. pvalues2.txt is a cleaned-up version of pvalues.txt that removes header information and so forth.

12 ODS dataset of p-values Note that the MEANS procedure calculated the mean of the 0s and 1s and got 0.084. This means there were 84 (out of 1000) observations that had a p-value less than 0.05. How did it get the standard deviation? Note that each observation is a Bernoulli trial, and the standard deviation of a Bernoulli trial is sqrt(p(1-p)), so we would estimate this to be sqrt(.084(1-.084)) ≈ 0.277. Why is this (slightly) different from the standard deviation reported by PROC MEANS? Also, is there evidence of an inflated type I error? Is 0.084 significantly higher than α = 0.05?

13 Statistical inference and simulations Sometimes we find ourselves using statistical inference just to interpret our own simulations, rather than to interpret data. Some scientists have the attitude that if a phenomenon is real, then you shouldn't need statistics to see it in the data. I don't share this point of view for data, because a lot of data is too complicated to interpret by eye, but I do sort of feel this way about simulations. If you're not sure whether 0.084 is significantly higher than 0.05 (meaning there really is inflated type 1 error), you could either get a confidence interval around 0.084, or you could just do a larger simulation so that you could be really sure without having to construct confidence intervals. In this case, the confidence interval does exclude 0.05, so there is evidence to think that the type 1 error rate is somewhat inflated due to violation of the assumption of normally distributed samples.

14 What about the distribution of p-values? The p-value is a random variable. It is a function of your data, much like the sample mean and sample variance, so it is a sample statistic. What should the distribution of the p-value be under the null hypothesis? If you use α = 0.05, then 5% of the time the p-value should be below α. More generally, for any α, P(p-value < α) = α. The p-value therefore has the same CDF as a uniform random variable, so p-values should be uniformly distributed for appropriate statistical tests when the null hypothesis and all assumptions are true. This holds for tests based on continuous test statistics. For discrete problems, it might not be possible to have P(p-value < α) = α.

15 The distribution of p-values Here is a more technical explanation of why p-values are uniformly distributed for continuous test statistics when the null is true, for the hypotheses H0: µ = µ0 vs. HA: µ > µ0 (i.e., I'll just consider a one-sided test). For this one-sided test, the p-value is P(T > t), where T is the test statistic and t is its observed value. Writing F for the CDF of T under H0,

1 - F(t) = P(T > t) = P(F(T) > F(t)) = 1 - P(F(T) <= F(t)), and P(F(T) <= F(t)) = F(t).

Because 0 <= F(t) <= 1 and P(F(T) <= u) = u, F(T) has a uniform(0,1) distribution (since it has the same CDF). If U is uniform(0,1), then so is 1 - U, so 1 - F(T) is also uniform(0,1). But note that 1 - F(t) = P(T > t), which is the p-value.

16 ODS dataset of p-values Here is the distribution of the p-values represented by a histogram. Typically, uniform distributions have flatter-looking histograms with 1000 observations, so the p-values here do not look uniformly distributed. Again, this would be clearer if we did more than just 1000 simulations. [Histogram of the p-values]

17 A different way to simulate this problem Instead of simulating 1000 data sets of 10 observations, I could have simulated all of the data at once and indexed the 1000 sets of 10 observations (similar to what I did for the Central Limit Theorem example). In this case, I would want to use PROC TTEST separately on each of the 1000 experiments. Moral: there's more than one way to do things.
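A sketch of the indexed generation (the macro variable names and ranexp(0) are assumptions; the idea is just that i labels the experiment and j the observation within it):

```sas
%let iter = 1000;            /* number of simulated experiments */
%let n    = 10;              /* observations per experiment */

data sim;
   do i = 1 to &iter;        /* i indexes the experiment */
      do j = 1 to &n;        /* j indexes the observation within experiment i */
         x = ranexp(0);      /* exponential with rate 1, clock seed */
         output;
      end;
   end;
run;
```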

18 PROC TTEST using a BY statement
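The BY-statement run the slide shows might look like this sketch (data set names are assumptions; the data are already sorted by i because they were generated in that order, which is what BY requires):

```sas
ods output TTests=pvalues;   /* stack all 1000 TTests tables in one data set */
proc ttest data=sim h0=1;
   by i;                     /* one t-test per simulated experiment */
   var x;
run;
```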

19 PROC TTEST using a BY statement

20 How to do this in R? It's a little easier in R. Here's how I would do it:

x <- rexp(10000)
x <- matrix(x, ncol=1000)
pvalue <- 1:1000
for (j in 1:1000) {
  pvalue[j] <- t.test(x[, j], mu=1)$p.value
}
sum(pvalue <= .05)/1000  # this is the type I error rate
hist(pvalue)

21 Another example is a test for homogeneity of variances. Textbooks often warn that Bartlett's test can be used to test for equality of variances, but that it is extremely sensitive to the assumption of normality, even though many procedures, such as t-tests and ANOVA, are reasonably robust to violations of normality. It is instructive to do a simulation to find out just how sensitive Bartlett's test is to the assumption of normality. In this example, we'll again create samples from two independent exponential distributions with rate λ = 1, so that both have equal variances 1/λ² = 1. This time, we'll let the sample size vary over n = 10, 20, 30, ..., 100 and see how the test does with increasing sample sizes. Again we'll look at the type I error rate for the test. For t-tests, we expect that as the sample size increases, the Central Limit Theorem tells us that the sample mean becomes increasingly close to normally distributed, so we expect the type I error rate to improve (get closer to α) as n increases. The Central Limit Theorem doesn't apply to S², the estimate of the variance, so the effect of increasing n isn't as clear here.

22 Testing type 1 error for Bartlett's test There are different ways to test for homogeneity of variance in SAS, depending on the procedure that you are using. To get a statistic for Bartlett's test, you can use PROC GLM, which handles the two-sample case as a special case, although PROC GLM is much more general. PROC GLM can also be used for ANOVA, including unbalanced designs (PROC ANOVA is for balanced designs), MANOVA (multivariate ANOVA, with multiple response variables as well as multiple independent variables), polynomial regression, random effects models, repeated measures, etc.
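In PROC GLM, Bartlett's test is requested on the MEANS statement; a minimal sketch for a narrow data set with variables group and x (the data set name is an assumption):

```sas
proc glm data=sim;
   class group;
   model x = group;
   means group / hovtest=bartlett;   /* requests Bartlett's test of equal variances */
run;
quit;   /* PROC GLM is interactive, so end it explicitly */
```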

23 Testing type 1 error for Bartlett's test First, we'll generate example data, keeping in mind that we want to generalize our parameters.

24 Testing type 1 error for Bartlett's test Note that PROC GLM wants the data in the narrow style, NOT two columns with one for group A and one for group B. The data doesn't have to be sorted by group.

25 Testing type 1 error for Bartlett's test Note that we could have generated the data in a wide format, or with group A generated first followed by group B. To generate a wide format, we could have done this with only one output statement and different variables for the two groups.

data sim;
   do i=1 to &iter;
      do j=1 to &n;
         x = ranexp(2014*&n + &iter);
         y = ranexp(2013*&n + &iter);
         output;
      end;
   end;
   keep x y i;
run;

26 Testing type 1 error for Bartlett's test To generate the As first and then the Bs, we could instead have done this with an extra do loop, which still generates a narrow data set.

data sim;
   /* generate two exponential samples for each iteration i */
   do i=1 to &iter;
      do j=1 to &n;
         x = ranexp(2014*&n + &iter);
         group = "A";
         output;
      end;
      do j=1 to &n;
         x = ranexp(2013*&n + &iter);
         group = "B";
         output;
      end;
   end; /* i is the iteration */
run;

27 Testing type 1 error for Bartlett's test Back to the original data. We'll look at the output from PROC GLM and PROC TTEST to compare them.

28 Testing type 1 error for Bartlett's test Here's the output from PROC GLM.

29 Testing type 1 error for Bartlett's test

30 Testing type 1 error for Bartlett's test By default, PROC TTEST does a test for equal variances using the folded F-test, whereas PROC GLM does not. Note that the p-value from PROC GLM matches the p-value from PROC TTEST when equal variances are assumed.

31 Testing type 1 error for Bartlett's test Turn the trace on to figure out how to save the right table.

32 Testing type 1 error for Bartlett's test Look in the log file for the table name.

33 Testing type 1 error for Bartlett's test

34 Testing type 1 error for Bartlett's test Now we can extend to more iterations and use BY to get them all in one data set.

35 Testing type 1 error for Bartlett's test

36 Testing type 1 error for Bartlett's test We can scale this up to as many iterations as we want. Then we want to keep track of the number of p-values below 0.05 to get the type 1 error rate.
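Counting rejections can be done with an indicator variable and PROC MEANS; this is a sketch, and both the saved table name bart and the p-value variable name are placeholders — check the actual names in the data set the ODS trace pointed you to.

```sas
data rejections;
   set bart;                    /* assumed name of the saved Bartlett table */
   /* 'pvalue' is a placeholder: use the actual p-value variable name */
   reject = (pvalue < 0.05);    /* 1 if H0 rejected, 0 otherwise */
run;

proc means data=rejections mean;
   var reject;                  /* mean of the 0/1s = type 1 error rate */
run;
```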

37 Testing type 1 error for Bartlett's test

38 Testing type 1 error for Bartlett's test Now we want to repeat this same idea but for different sample sizes n. Of course, we could just repeat the code over and over again, changing the value of n. Or we can loop over different values of n. This creates a 3-level loop instead of 2 levels.
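The 3-level loop might be sketched like this (names and ranexp(0) seeds are assumptions; the outer loop over n is the new level):

```sas
%let iter = 1000;

data sim;
   do n = 10 to 100 by 10;      /* new outer loop over sample sizes */
      do i = 1 to &iter;        /* iteration */
         do j = 1 to n;         /* observation within iteration */
            x = ranexp(0); group = "A"; output;
            x = ranexp(0); group = "B"; output;
         end;
      end;
   end;
run;
```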

39 Testing type 1 error for Bartlett's test

40 Testing type 1 error for Bartlett's test

41 DO loops versus Macros Instead of having an additional DO loop, I could have created a macro, say %macro bartlett(n,iter), and then run the macro multiple times for different values of n: %bartlett(10,1000) %bartlett(20,1000) %bartlett(30,1000) and so on. If your data step is getting too complicated, then this might be reasonable. Also, if you want different combinations of parameters, the macro approach is more flexible. For example, if you want 1 million iterations for n = 10 but only 1000 iterations for n = 100 due to time constraints, the macro approach handles this easily.
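A skeleton of that macro (a sketch only; the body would contain the data step and PROC GLM steps from the earlier slides):

```sas
%macro bartlett(n, iter);
   data sim;
      do i = 1 to &iter;
         do j = 1 to &n;
            x = ranexp(0); group = "A"; output;
            x = ranexp(0); group = "B"; output;
         end;
      end;
   run;
   /* ...then PROC GLM with BY i, save the Bartlett table, tabulate rejections... */
%mend bartlett;

%bartlett(10, 1000000)   /* many iterations for small n */
%bartlett(100, 1000)     /* fewer iterations for large n */
```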

42 Suppressing output? Unfortunately, these simulations create a lot of output as they are. For most procedures, if you want to create an output data set without generating printed output, you can use the NOPRINT option on the procedure statement while still creating an output data set, such as with OUTPUT OUT= in PROC MEANS. Unfortunately, the NOPRINT option means that nothing is produced for ODS to use. As a result, when using ODS, you end up with lots of output. I'm not sure of a good way around this, but it is pretty annoying, and all the extra I/O slows SAS down. You can reduce the output by selecting only what you will need to save:
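One way to select only the needed table, as the slide suggests (a sketch; the data set names are assumptions carried over from earlier slides):

```sas
ods select TTests;           /* display only the TTests table ... */
ods output TTests=pvalues;   /* ... while also saving it to a data set */
proc ttest data=sim h0=1;
   by i;
   var x;
run;
ods select all;              /* restore normal output afterwards */
```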

43 Testing type 1 error for Bartlett's test

44 Results

45 Results: interpretation These results are based on 10,000 iterations. Since we are essentially generating Bernoulli random variables (reject or don't reject H0), we can think of the mean of these variables as the proportion of rejections. This should be α = 0.05, but it is much higher, with 23% for n = 10, 27% for n = 20, and 29% for n = 50. For 10,000 iterations, a confidence interval for these proportions has a margin of error of roughly 2 sqrt((.25)(.75)/10000) ≈ 0.0087. A handy rule of thumb is that for a binomial, the 95% confidence interval has margin of error less than or equal to 1/sqrt(n) = 1/100 for n = 10,000. The point is that 29% appears to be significantly larger than 27%, so the type 1 error rate is increasing as the sample size increases.

46 The moral of the story The moral of the story is that increasing your sample size doesn't always improve your inferences. In this case, the method is sufficiently non-robust that increasing the sample size makes it perform worse. So when textbooks say that Bartlett's test isn't very good for testing equality of variances, they really mean it, although they rarely explain why it is so bad. So what exactly is Bartlett's test testing? The usual description is that it tests H0: σ1² = σ2², assuming that the two samples are from normally distributed populations. However, considering that the test is likely to reject H0 when σ1² = σ2² but the data are not normal, you could instead think of the null hypothesis as H0: X1, ..., Xn iid N(µ1, σ²) and Y1, ..., Ym iid N(µ2, σ²), i.e., the normality is part of what is being tested. In this case, a rejection of H0 could mean either that the data are not normal or that the variances are unequal.

47 Statistical inconsistency A related issue regarding increasing sample sizes is statistical inconsistency. An estimator θ̂n of a parameter θ (which might be an ordered tuple of parameters) is said to be statistically consistent if for any ε > 0 and for any θ ∈ Θ (the parameter space),

lim_{n→∞} P(|θ̂n − θ| > ε) = 0,

where n is the sample size. In other words, the estimator gets close to the actual parameter with high probability. You can have estimators that don't have this property, so that increasing the sample size doesn't increase the probability of your estimate being close to the true value. If you work with a discrete parameter space, then statistical consistency requires that the probability of making the correct inference for the parameter approaches 1.

48 Statistical power So far, we've focused on type 1 error. What about power? Power is nearly identical from the point of view of simulation. In this case, you simulate from values for which H0 is false, and again count the number of times the null is rejected. Here, the more frequently H0 is rejected, the better the method (assuming it has good type 1 error as well). As an example, we'll consider the power for testing H0: µ1 = µ2 with a t-test when X1, ..., Xn are iid N(0,1) and Y1, ..., Yn are iid N(1,1). In this case the variances are equal, but the means are different.
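A sketch of this power simulation in the style of the earlier ones (names, rannor(0), and n = 20 per group are assumptions):

```sas
data sim;
   do i = 1 to 1000;                /* 1000 simulated experiments */
      do j = 1 to 20;               /* 20 observations per group */
         x = rannor(0);     group = "A"; output;   /* N(0,1) */
         x = 1 + rannor(0); group = "B"; output;   /* N(1,1) */
      end;
   end;
run;

ods output TTests=power;            /* save all the TTests tables */
proc ttest data=sim;
   by i;
   class group;                     /* two-sample t-test */
   var x;
run;
```

The proportion of p-values below 0.05 is now an estimate of power rather than of type 1 error.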

49 Statistical Power

50 Statistical Power

51 Statistical Power Power of the t-test for rejecting H0 when the X's are i.i.d. N(0,1) and the Y's are i.i.d. N(1,1).

52 Uses of Power Analyses Why is it useful to study power? The main reasons for studying power for a particular problem are:
- sample size determination for study design
- determining the effect size detectable for a given sample size
- choosing between different methods
- investigating robustness of methods to model violation

53 Power: sample size determination A power analysis can be useful for determining the sample size you need to have a good chance of rejecting H0 when H0 is false. This is useful when initially trying to decide what sort of sample size to aim for when designing a study, and can be useful in grant applications. In the previous t-test example, if we believe that treatment 2 results in values an average of 1 point higher than treatment 1, then we can estimate that we'd need a sample size of roughly 20 people per group to have about 80% power to detect a difference. If you wanted a better than 80% chance of being able to reject H0, you'd want larger samples.

54 Power: sample size determination Often for grant proposals, you might try to justify your budget based on an estimated effect size (i.e., µ1 − µ2) from preliminary data. The idea is that if you think the effect size might be some particular value, then you want the sample size to be large enough to have a reasonable chance (80% is often used) of rejecting the null hypothesis. Since larger samples require more money, this can be used to justify how much money you need to request for your study. Similarly, if you don't justify your proposed sample size, a reviewer might complain that your study is likely to be underpowered, meaning it is unlikely to detect anything even if there is a difference (i.e., if a new drug is more effective).

55 Power: sample size determination Sample sizes for studies aren't usually completely under the researcher's control, but they are analyzed as though they are fixed parameters rather than random variables. If you recruit people to be in a study, for example using flyers around campus, the hospital, etc., then you might have historical data to predict what a typical sample size would be based on how long and how widely you advertise. Study designers can therefore often indirectly control the sample size.

56 Power: sample size determination Random sample sizes might be worth considering, however. For the t-test example, you might have better power to reject the null hypothesis if your sample sizes are equal for the two groups than if they are unequal. For example, suppose you are recruiting to test whether a drug reduces headaches, and you recruit both men and women. Suppose you suspect that the drug is more effective for men than for women. If you recruit people for the study, you might not be in direct control of how many men versus women volunteer to be in the study. Suppose 55 women volunteer and 45 men volunteer. You could make the sample sizes equal by randomly dropping the data from 10 of the women, but this would be throwing away information. It is better to use information from all 100 study participants, although you might have less power with 45 men versus 55 women than with 50 of each sex.

57 Power: sample size determination On the other hand, if your study is collecting expensive information, such as an MRI for each participant, you might decide to accept the first n women volunteers and the first n men volunteers. A power analysis could help you decide whether it is important to have a balanced design or not.

58 Power: effect of unbalanced designs How could we simulate the effect of unbalanced versus balanced designs? Assuming we knew that there was a fixed number of participants (say n = 100), we could compare the effect of a particular unbalanced design (for example, 45 versus 55) with the balanced design (50 per group). We could also let the number of men versus women in each iteration of a simulation be a binomial random variable, so that the degree of imbalance is random.

59 Power: determining effect size In addition to graphing power as a function of sample size, it is common to plot power as a function of effect size for a fixed sample size. Ultimately, power depends on three variables: α, n, and the effect size, such as µ1 − µ2 in the two-sample t-test example. We usually fix two of these variables and plot power as a function of the third. The t-test example is easy to modify to plot power as a function of effect size for a given sample size (say, n = 20).
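The modification is just one more loop over the effect size; a sketch with assumed names and an assumed grid of effect sizes:

```sas
data sim;
   do delta = 0 to 2 by 0.25;        /* effect size mu2 - mu1 */
      do i = 1 to 1000;              /* iterations per effect size */
         do j = 1 to 20;             /* fixed n = 20 per group */
            x = rannor(0);         group = "A"; output;
            x = delta + rannor(0); group = "B"; output;
         end;
      end;
   end;
run;
```

The power curve is then the rejection proportion plotted against delta.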

60 Power: determining effect size

61 Power: determining effect size

62 Power: determining effect size

63 Power: plotting both sample size and effect size

64 Power: plotting both sample size and effect size

65 Power: plotting both sample size and effect size

66 Power: determining effect size

67 Power: determining effect size Note that the data set sim that has all of my simulated data has 840,000 observations. SAS is still reasonably fast, and the log file gives information about how long it took.

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA
NOTE: The SAS System used:
      real time seconds
      cpu time 9.43 seconds

We could make the plots smoother by incrementing the effect size by a smaller value (say .01), although this would generate 50 times as many observations. When simulations get this big, you have to start planning them: how long will they take (instead of 30 s, will it take 25 minutes? 25 days?), how much memory will they use, and so on, even though this is a very simple simulation.

68 Length of simulations The log file also breaks down how long each procedure took. Much of the time was actually due to generating the PDF file with ODS. From the log file:

NOTE: The data set WORK.SIM has observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time 0.19 seconds
      cpu time 0.18 seconds
NOTE: The data set WORK.PVALUES has observations and 9 variables.
NOTE: The PROCEDURE TTEST printed pages.
NOTE: PROCEDURE TTEST used (Total process time):
      real time 9.12 seconds
      cpu time 8.97 seconds
...
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time seconds
      cpu time 0.19 seconds

69 Length of simulations When designing simulations, there are usually tradeoffs. For example, suppose I don't want my simulation to take any longer than it already has. If I want smoother curves, I could double the number of effect sizes used, but then, to keep the simulation the same length of time, I might have to use fewer iterations (say 500 instead of 1000). This would increase the number of data points at the expense of possibly making the curves more jittery, or even not monotonically increasing. There will usually be a tradeoff between the number of iterations and the number of parameter values you can try in your simulation.

70 Length of simulations for R If you want to time R doing simulations, the easiest way is to run R in batch mode. In Linux or Mac OS X, you can go to a terminal and, at the shell prompt, type

time R CMD BATCH myprogram.r

and it will give a similar printout of real time versus cpu time for your R run.

71 Power: determining effect size [Plot: power as a function of mu1 − mu2 for n = 10, 20, 30]

72 Power: determining effect size

73 Power: determining effect size

74 Power: tradeoff between number of parameters and number of iterations (500 vs 100 iterations) [Side-by-side plots: power as a function of mu1 − mu2 for n = 10, 20, 30]

75 Using Power to select methods As mentioned before, power analyses are useful for determining which method is preferable when there are multiple methods available to analyze data. As an example, consider the two-sample t-test again when we have exponential data. Suppose we wish to test H0: µ = 2 when λ = 1, so that the null hypothesis is false. Since the assumptions of the test are violated, researchers might prefer a nonparametric test.

76 Using Power to select methods As an alternative, you can use a permutation test or another nonparametric test. Here we might wish to see which method is most powerful. If you can live with the inflated type 1 error of the t-test (or adjust for it by using a smaller α-level), then you might prefer it if it is more powerful. A number of nonparametric procedures are implemented in PROC NPAR1WAY, as well as PROC MULTTEST. In addition, there are macros floating around the web that can do permutation tests without using these procedures.

77 Using power to select methods Here we'll try PROC NPAR1WAY and just one nonparametric method, the Wilcoxon rank-sum test (also called the Mann-Whitney test). The idea is to pool all of the data and then rank the observations. Then calculate the sum of the ranks for group A versus group B. The two sums should be approximately equal, with greater differences in the rank sums being evidence that the mean for one group is larger than the mean for the other group.
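A minimal sketch of the NPAR1WAY call (the data set name and the BY statement are assumptions, following the same BY-group pattern as the t-test simulations):

```sas
proc npar1way data=sim wilcoxon;   /* WILCOXON requests the rank-sum test */
   by i;                           /* one test per simulated data set */
   class group;
   var x;
run;
```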

78 Using power to select methods Note that there are many other methods we could have selected, such as a median test or a permutation test. This is just an illustration, and we are not necessarily finding the most powerful method.

79 Power: comparing methods

80 Power: comparing methods

81 Power: comparing methods

82 Power: comparing methods

83 Power: comparing methods For these parameter values (exponentials with means 1 and 2), the t-test was more powerful than the Wilcoxon test at all sample sizes. The Wikipedia article on the Mann-Whitney test says: "It [the Wilcoxon or Mann-Whitney test] has greater efficiency than the t-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the t-test on normal distributions." Given our limited simulation, we have some reason to be a little skeptical of this claim. Still, we only tried one combination of parameters. It is possible that for other parameters or other distributions, the t-test is less powerful. Also, the t-test has inflated type 1 error, so the comparison might be a little unfair. We could re-run the experiment using α = .01 for the t-test and α = .05 for the Wilcoxon to make sure that both had controlled type 1 error rates.

84 Power: comparing methods Here's an example from an empirical paper.

85 Power: comparing methods

86 Speed: comparing methods For large analyses, speed and/or memory might be an issue when choosing between methods and/or algorithms. This paper compared different methods within SAS based on speed for doing permutation tests.

87 Use of macros for simulations The author of the previous paper provides an appendix with lengthy macros to use as more efficient replacements for SAS procedures such as PROC NPAR1WAY and PROC MULTTEST, which in his experience could crash or fail to terminate in a reasonable time. In addition to developing your own macros, a common use of macros is to use macros written by someone else that have not been incorporated into the SAS language. You might just copy and paste the macro into your code, possibly with some modification, and you can use the macro even if you cannot understand it. Popular macros might eventually get replaced by new PROCs or new functionality within SAS. This is sort of the SAS alternative to user-contributed packages in R.

88 From Macro to PROC An example of the evolution from macros to PROCs is bootstrapping. For several years, to perform bootstrapping, SAS users relied on macros, often written by others. In bootstrapping, you sample your data (or the rows of your data set) with replacement and get a new dataset with the same sample size but with some of the values repeated and others omitted. For example, some observations from the original data will appear more than once in a bootstrap replicate, while others will not appear at all.

89 From Macro to PROC Basically, to generate the bootstrap data set, you generate n random numbers from 1 to n, with replacement, and extract those values from your data. This used to be done with macros, but now can be done with PROC SURVEYSELECT. If you search the web for bootstrapping, you still might run into one of those old macros. Newer methods might still be implemented using macros. A webpage from 2012 has a macro for bootstrap bagging, a method of averaging results from multiple classification algorithms. There are also macros for searching the web to download movie reviews or extract data from social media. Try searching on "SAS macro 2013" for interesting examples.
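A sketch of the PROC SURVEYSELECT approach to generating bootstrap replicates (the data set names mydata and boot are placeholders):

```sas
proc surveyselect data=mydata out=boot
   method=urs     /* unrestricted random sampling = sampling with replacement */
   samprate=1     /* each replicate has the same size as the original data */
   reps=1000      /* number of bootstrap replicates, indexed by Replicate */
   outhits;       /* write one row per selection, so duplicates appear as rows */
run;
```

The output data set boot can then be analyzed with BY Replicate, just like the BY i simulations earlier in these slides.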

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Section 2.3: Simple Linear Regression: Predictions and Inference

Section 2.3: Simple Linear Regression: Predictions and Inference Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple

More information

The Power and Sample Size Application

The Power and Sample Size Application Chapter 72 The Power and Sample Size Application Contents Overview: PSS Application.................................. 6148 SAS Power and Sample Size............................... 6148 Getting Started:

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

A Quick Introduction to R

A Quick Introduction to R Math 4501 Fall 2012 A Quick Introduction to R The point of these few pages is to give you a quick introduction to the possible uses of the free software R in statistical analysis. I will only expect you

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Mark Johnson Macquarie University Sydney, Australia February 27, 2017 1 / 38 Outline Introduction Hypothesis tests and confidence intervals Classical hypothesis tests

More information

Nonparametric and Simulation-Based Tests. STAT OSU, Spring 2019 Dalpiaz

Nonparametric and Simulation-Based Tests. STAT OSU, Spring 2019 Dalpiaz Nonparametric and Simulation-Based Tests STAT 3202 @ OSU, Spring 2019 Dalpiaz 1 What is Parametric Testing? 2 Warmup #1, Two Sample Test for p 1 p 2 Ohio Issue 1, the Drug and Criminal Justice Policies

More information

Chapters 5-6: Statistical Inference Methods

Chapters 5-6: Statistical Inference Methods Chapters 5-6: Statistical Inference Methods Chapter 5: Estimation (of population parameters) Ex. Based on GSS data, we re 95% confident that the population mean of the variable LONELY (no. of days in past

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Cluster Randomization Create Cluster Means Dataset

Cluster Randomization Create Cluster Means Dataset Chapter 270 Cluster Randomization Create Cluster Means Dataset Introduction A cluster randomization trial occurs when whole groups or clusters of individuals are treated together. Examples of such clusters

More information

Pair-Wise Multiple Comparisons (Simulation)

Pair-Wise Multiple Comparisons (Simulation) Chapter 580 Pair-Wise Multiple Comparisons (Simulation) Introduction This procedure uses simulation analyze the power and significance level of three pair-wise multiple-comparison procedures: Tukey-Kramer,

More information

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set. Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17 01.433/33 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/2/1.1 Introduction In this lecture we ll talk about a useful abstraction, priority queues, which are

More information

R-Square Coeff Var Root MSE y Mean

R-Square Coeff Var Root MSE y Mean STAT:50 Applied Statistics II Exam - Practice 00 possible points. Consider a -factor study where each of the factors has 3 levels. The factors are Diet (,,3) and Drug (A,B,C) and there are n = 3 observations

More information

Multiple Comparisons of Treatments vs. a Control (Simulation)

Multiple Comparisons of Treatments vs. a Control (Simulation) Chapter 585 Multiple Comparisons of Treatments vs. a Control (Simulation) Introduction This procedure uses simulation to analyze the power and significance level of two multiple-comparison procedures that

More information

SAS/STAT 13.1 User s Guide. The Power and Sample Size Application

SAS/STAT 13.1 User s Guide. The Power and Sample Size Application SAS/STAT 13.1 User s Guide The Power and Sample Size Application This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as

More information

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research Liping Huang, Center for Home Care Policy and Research, Visiting Nurse Service of New York, NY, NY ABSTRACT The

More information

10.4 Linear interpolation method Newton s method

10.4 Linear interpolation method Newton s method 10.4 Linear interpolation method The next best thing one can do is the linear interpolation method, also known as the double false position method. This method works similarly to the bisection method by

More information

Week 4: Simple Linear Regression II

Week 4: Simple Linear Regression II Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties

More information

Slides 11: Verification and Validation Models

Slides 11: Verification and Validation Models Slides 11: Verification and Validation Models Purpose and Overview The goal of the validation process is: To produce a model that represents true behaviour closely enough for decision making purposes.

More information

Assignment 5.5. Nothing here to hand in

Assignment 5.5. Nothing here to hand in Assignment 5.5 Nothing here to hand in Load the tidyverse before we start: library(tidyverse) ## Loading tidyverse: ggplot2 ## Loading tidyverse: tibble ## Loading tidyverse: tidyr ## Loading tidyverse:

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test).

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). Suppose you have a one way design, and want to do an ANOVA, but discover that your data are seriously not normal? Just

More information

EXST SAS Lab Lab #6: More DATA STEP tasks

EXST SAS Lab Lab #6: More DATA STEP tasks EXST SAS Lab Lab #6: More DATA STEP tasks Objectives 1. Working from an current folder 2. Naming the HTML output data file 3. Dealing with multiple observations on an input line 4. Creating two SAS work

More information

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017 CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

More information

appstats6.notebook September 27, 2016

appstats6.notebook September 27, 2016 Chapter 6 The Standard Deviation as a Ruler and the Normal Model Objectives: 1.Students will calculate and interpret z scores. 2.Students will compare/contrast values from different distributions using

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017 CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.

More information

General Factorial Models

General Factorial Models In Chapter 8 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 34 It is possible to have many factors in a factorial experiment. In DDD we saw an example of a 3-factor study with ball size, height, and surface

More information

Difference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn

Difference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn Difference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn Problem Write and test a Scheme program to compute how many days are spanned by two given days. The program will include a procedure

More information

The Multi Stage Gibbs Sampling: Data Augmentation Dutch Example

The Multi Stage Gibbs Sampling: Data Augmentation Dutch Example The Multi Stage Gibbs Sampling: Data Augmentation Dutch Example Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 8 1 Example: Data augmentation / Auxiliary variables A commonly-used

More information

Sections 4.3 and 4.4

Sections 4.3 and 4.4 Sections 4.3 and 4.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 32 4.3 Areas under normal densities Every

More information

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time Today Lecture 4: We examine clustering in a little more detail; we went over it a somewhat quickly last time The CAD data will return and give us an opportunity to work with curves (!) We then examine

More information

Reliable programming

Reliable programming Reliable programming How to write programs that work Think about reliability during design and implementation Test systematically When things break, fix them correctly Make sure everything stays fixed

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Week 6, Week 7 and Week 8 Analyses of Variance

Week 6, Week 7 and Week 8 Analyses of Variance Week 6, Week 7 and Week 8 Analyses of Variance Robyn Crook - 2008 In the next few weeks we will look at analyses of variance. This is an information-heavy handout so take your time reading it, and don

More information

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.

More information

Lab 5 - Risk Analysis, Robustness, and Power

Lab 5 - Risk Analysis, Robustness, and Power Type equation here.biology 458 Biometry Lab 5 - Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors

More information

Earthquake data in geonet.org.nz

Earthquake data in geonet.org.nz Earthquake data in geonet.org.nz There is are large gaps in the 2012 and 2013 data, so let s not use it. Instead we ll use a previous year. Go to http://http://quakesearch.geonet.org.nz/ At the screen,

More information

Bootstrap confidence intervals Class 24, Jeremy Orloff and Jonathan Bloom

Bootstrap confidence intervals Class 24, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Bootstrap confidence intervals Class 24, 18.05 Jeremy Orloff and Jonathan Bloom 1. Be able to construct and sample from the empirical distribution of data. 2. Be able to explain the bootstrap

More information

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change Chapter01.fm Page 1 Monday, August 23, 2004 1:52 PM Part I The Mechanics of Change The Mechanics of Change Chapter01.fm Page 2 Monday, August 23, 2004 1:52 PM Chapter01.fm Page 3 Monday, August 23, 2004

More information

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

General Factorial Models

General Factorial Models In Chapter 8 in Oehlert STAT:5201 Week 9 - Lecture 1 1 / 31 It is possible to have many factors in a factorial experiment. We saw some three-way factorials earlier in the DDD book (HW 1 with 3 factors:

More information

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Predictive Analysis: Evaluation and Experimentation. Heejun Kim Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training

More information

Ch6: The Normal Distribution

Ch6: The Normal Distribution Ch6: The Normal Distribution Introduction Review: A continuous random variable can assume any value between two endpoints. Many continuous random variables have an approximately normal distribution, which

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Intro. Scheme Basics. scm> 5 5. scm>

Intro. Scheme Basics. scm> 5 5. scm> Intro Let s take some time to talk about LISP. It stands for LISt Processing a way of coding using only lists! It sounds pretty radical, and it is. There are lots of cool things to know about LISP; if

More information

Lecture 3: Linear Classification

Lecture 3: Linear Classification Lecture 3: Linear Classification Roger Grosse 1 Introduction Last week, we saw an example of a learning task called regression. There, the goal was to predict a scalar-valued target from a set of features.

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Introductory Guide to SAS:

Introductory Guide to SAS: Introductory Guide to SAS: For UVM Statistics Students By Richard Single Contents 1 Introduction and Preliminaries 2 2 Reading in Data: The DATA Step 2 2.1 The DATA Statement............................................

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

These are notes for the third lecture; if statements and loops.

These are notes for the third lecture; if statements and loops. These are notes for the third lecture; if statements and loops. 1 Yeah, this is going to be the second slide in a lot of lectures. 2 - Dominant language for desktop application development - Most modern

More information

SPSS INSTRUCTION CHAPTER 9

SPSS INSTRUCTION CHAPTER 9 SPSS INSTRUCTION CHAPTER 9 Chapter 9 does no more than introduce the repeated-measures ANOVA, the MANOVA, and the ANCOVA, and discriminant analysis. But, you can likely envision how complicated it can

More information

CS281 Section 3: Practical Optimization

CS281 Section 3: Practical Optimization CS281 Section 3: Practical Optimization David Duvenaud and Dougal Maclaurin Most parameter estimation problems in machine learning cannot be solved in closed form, so we often have to resort to numerical

More information

Statistics: Interpreting Data and Making Predictions. Visual Displays of Data 1/31

Statistics: Interpreting Data and Making Predictions. Visual Displays of Data 1/31 Statistics: Interpreting Data and Making Predictions Visual Displays of Data 1/31 Last Time Last time we discussed central tendency; that is, notions of the middle of data. More specifically we discussed

More information

EXST SAS Lab Lab #8: More data step and t-tests

EXST SAS Lab Lab #8: More data step and t-tests EXST SAS Lab Lab #8: More data step and t-tests Objectives 1. Input a text file in column input 2. Output two data files from a single input 3. Modify datasets with a KEEP statement or option 4. Prepare

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences 1 RefresheR Figure 1.1: Soy ice cream flavor preferences 2 The Shape of Data Figure 2.1: Frequency distribution of number of carburetors in mtcars dataset Figure 2.2: Daily temperature measurements from

More information

CSC 2515 Introduction to Machine Learning Assignment 2

CSC 2515 Introduction to Machine Learning Assignment 2 CSC 2515 Introduction to Machine Learning Assignment 2 Zhongtian Qiu(1002274530) Problem 1 See attached scan files for question 1. 2. Neural Network 2.1 Examine the statistics and plots of training error

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

CS 370 The Pseudocode Programming Process D R. M I C H A E L J. R E A L E F A L L

CS 370 The Pseudocode Programming Process D R. M I C H A E L J. R E A L E F A L L CS 370 The Pseudocode Programming Process D R. M I C H A E L J. R E A L E F A L L 2 0 1 5 Introduction At this point, you are ready to beginning programming at a lower level How do you actually write your

More information

Recitation 4: Elimination algorithm, reconstituted graph, triangulation

Recitation 4: Elimination algorithm, reconstituted graph, triangulation Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 4: Elimination algorithm, reconstituted graph, triangulation

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Confidence Intervals. Dennis Sun Data 301

Confidence Intervals. Dennis Sun Data 301 Dennis Sun Data 301 Statistical Inference probability Population / Box Sample / Data statistics The goal of statistics is to infer the unknown population from the sample. We ve already seen one mode of

More information

4.5 The smoothed bootstrap

4.5 The smoothed bootstrap 4.5. THE SMOOTHED BOOTSTRAP 47 F X i X Figure 4.1: Smoothing the empirical distribution function. 4.5 The smoothed bootstrap In the simple nonparametric bootstrap we have assumed that the empirical distribution

More information

Data 8 Final Review #1

Data 8 Final Review #1 Data 8 Final Review #1 Topics we ll cover: Visualizations Arrays and Table Manipulations Programming constructs (functions, for loops, conditional statements) Chance, Simulation, Sampling and Distributions

More information

36-402/608 HW #1 Solutions 1/21/2010

36-402/608 HW #1 Solutions 1/21/2010 36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together

More information

2

2 1 2 3 4 5 All resources: how fast, how many? If all the CPUs are pegged, that s as fast as you can go. CPUs have followed Moore s law, the rest of the system hasn t. Not everything can be made threaded,

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information

How Rust views tradeoffs. Steve Klabnik

How Rust views tradeoffs. Steve Klabnik How Rust views tradeoffs Steve Klabnik 03.04.2019 What is a tradeoff? Bending the Curve Overview Design is about values Case Studies BDFL vs Design By Committee Stability Without Stagnation Acceptable

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users BIOSTATS 640 Spring 2018 Review of Introductory Biostatistics STATA solutions Page 1 of 13 Key Comments begin with an * Commands are in bold black I edited the output so that it appears here in blue Unit

More information

UNIT 1A EXPLORING UNIVARIATE DATA

UNIT 1A EXPLORING UNIVARIATE DATA A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM

Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM Image Manipulation in MATLAB Due Monday, July 17 at 5:00 PM 1 Instructions Labs may be done in groups of 2 or 3 (i.e., not alone). You may use any programming language you wish but MATLAB is highly suggested.

More information

Bootstrapping Methods

Bootstrapping Methods Bootstrapping Methods example of a Monte Carlo method these are one Monte Carlo statistical method some Bayesian statistical methods are Monte Carlo we can also simulate models using Monte Carlo methods

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

More Summer Program t-shirts

More Summer Program t-shirts ICPSR Blalock Lectures, 2003 Bootstrap Resampling Robert Stine Lecture 2 Exploring the Bootstrap Questions from Lecture 1 Review of ideas, notes from Lecture 1 - sample-to-sample variation - resampling

More information

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

9.2 Types of Errors in Hypothesis testing

9.2 Types of Errors in Hypothesis testing 9.2 Types of Errors in Hypothesis testing 1 Mistakes we could make As I mentioned, when we take a sample we won t be 100% sure of something because we do not take a census (we only look at information

More information

BECOME A LOAD TESTING ROCK STAR

BECOME A LOAD TESTING ROCK STAR 3 EASY STEPS TO BECOME A LOAD TESTING ROCK STAR Replicate real life conditions to improve application quality Telerik An Introduction Software load testing is generally understood to consist of exercising

More information

COSC 311: ALGORITHMS HW1: SORTING

COSC 311: ALGORITHMS HW1: SORTING COSC 311: ALGORITHMS HW1: SORTIG Solutions 1) Theoretical predictions. Solution: On randomly ordered data, we expect the following ordering: Heapsort = Mergesort = Quicksort (deterministic or randomized)

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Partitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning

Partitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning

More information

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann Forrest W. Young & Carla M. Bann THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA CB 3270 DAVIE HALL, CHAPEL HILL N.C., USA 27599-3270 VISUAL STATISTICS PROJECT WWW.VISUALSTATS.ORG

More information