
Macros and ODS
SAS Programming, November 6, 2014

The first part of these slides overlaps with last week a fair bit, but it doesn't hurt to review, as this code might be a little harder to follow.

ODS in SAS Studio

We're now ready to try combining macros with ODS. Here is an example of a simulation study that we can do with what we have. Suppose X1, ..., X10 are i.i.d. (independent and identically distributed) exponential random variables with rate 1. If you perform a t-test at the α = 0.05 level for whether or not the mean is 1, what is the type 1 error rate? If X1, ..., X10 were normal with mean 1 and standard deviation σ, then you would expect to reject H0 5% of the time when H0 is true. In this case, H0 is true (the mean is 1), but the t-test's assumption of normality is violated, and this might affect the type 1 error rate.

ODS in SAS Studio

First we'll do one data set and run PROC TTEST once with ODS TRACE ON to figure out which table we want to save.

ODS in SAS Studio

It looks like we want the third table, which was called TTests.

ODS in SAS Studio

If you look at the data set in the Work folder, the name of the variable with the p-value is again Probt, although when PROC TTEST runs, it labels the variable Pr > |t|.
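A minimal sketch of this step (the data set name and seed are my own illustrative choices, not the exact code from the slides): generate one sample of 10 exponentials, turn the trace on, and save the TTests table with ODS OUTPUT.

data one;
  do j = 1 to 10;
    x = ranexp(20141106);  /* exponential with rate 1; seed is arbitrary */
    output;
  end;
run;

ods trace on;              /* prints the name of each output table to the log */
proc ttest data=one h0=1;  /* test H0: mean = 1 */
  var x;
  ods output TTests=t1;    /* save the TTests table; Probt holds the p-value */
run;
ods trace off;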

ODS in a macro

Here's a start: I run PROC TTEST 3 times on 3 generated data sets. This creates 3 small data sets with p-values.

ODS in a macro: using concatenation

We then concatenate the small data sets to put all the p-values in one data set.
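A sketch of how the macro might be organized, assuming the names d&k and t&k for the generated data sets and saved tables (illustrative, not the slide's exact code):

%macro ttests(m);
  %do k = 1 %to &m;
    data d&k;                      /* one simulated sample of 10 exponentials */
      do j = 1 to 10;
        x = ranexp(%eval(20141106 + &k));  /* different seed per data set */
        output;
      end;
    run;
    proc ttest data=d&k h0=1;      /* test H0: mean = 1 */
      var x;
      ods output TTests=t&k;       /* save the p-value table */
    run;
  %end;
  data pvalues;                    /* concatenate all the small tables */
    set %do k = 1 %to &m; t&k %end; ;
    keep Probt;
  run;
%mend;

%ttests(3)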

ODS dataset of p-values

You should now be able to use your dataset of p-values to analyze them. Particular questions of interest would be (1) how many p-values are below 0.05 (this gives the type I error rate), and (2) the distribution of the p-values.
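A sketch of one way to answer both questions, assuming the concatenated data set pvalues from above (PROC SGPLOT is just one of several procedures that could draw the histogram):

data pcheck;
  set pvalues;
  reject = (Probt < 0.05);   /* indicator: 1 if the test rejects at alpha = 0.05 */
run;

proc means data=pcheck mean std;
  var reject;                /* the mean is the estimated type I error rate */
run;

proc sgplot data=pvalues;
  histogram Probt;           /* the distribution of the p-values */
run;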

ODS dataset of p-values

In my case, I had trouble doing anything directly with the dataset pvalues. Nothing seemed to print in my output (for example, from PROC MEANS). So I just wrote my dataset pvalues to an external file and then read it back in using a new SAS program. This is slightly inelegant, but it means I can start from scratch in case any of my ODS settings changed things and caused problems.

This approach could also be useful if I wanted to generate p-values in SAS and analyze or plot them in another program like R, or if I just want to save those p-values for later reference.


ODS dataset of p-values

Here I analyze the output of one SAS program using a second SAS program. pvalues2.txt is a cleaned-up version of pvalues.txt with the header information and so forth removed.

ODS dataset of p-values

Note that the MEANS procedure calculated the mean of the 0s and 1s and got 0.084. This means there were 84 (out of 1000) observations with a p-value less than 0.05. How did it get the standard deviation? Each observation is a Bernoulli trial, and the standard deviation of a Bernoulli trial is √(p(1 − p)), so we would estimate this as √(.084 × (1 − .084)) = .2773878. Why is this (slightly) different from the standard deviation reported by PROC MEANS? (Hint: PROC MEANS divides by n − 1 rather than n.) Also, is there evidence of an inflated type I error? Is 0.084 significantly higher than α = 0.05?

Statistical inference and simulations

Sometimes we find ourselves using statistical inference just to interpret our own simulations, rather than to interpret data. Some scientists have the attitude that if a phenomenon is real, then you shouldn't need statistics to see it in the data. I don't share this point of view, because a lot of data is too complicated to interpret by eye, but I do sort of feel this way about simulations. If you're not sure whether 0.084 is significantly higher than 0.05 (meaning there really is inflated type 1 error), you could either get a confidence interval around 0.084, or you could just do a larger simulation so that you could be really sure without having to construct confidence intervals. In this case, the confidence interval does exclude 0.05, so there is evidence that the type 1 error rate is somewhat inflated by the violation of the assumption of normally distributed samples.

What about the distribution of p-values?

The p-value is a random variable. It is a function of your data, much like X̄ and s², so it is a sample statistic. What should the distribution of the p-value be under the null hypothesis? If you use α = 0.05, then 5% of the time the p-value should fall below α. More generally, for any α, P(p-value < α) = α. The p-value therefore has the same CDF as a uniform random variable. So p-values should be uniformly distributed for appropriate statistical tests when the null hypothesis and all assumptions are true. This holds for tests based on continuous test statistics. For discrete problems, it might not be possible to have P(p-value < α) = α.

The distribution of p-values

Here is a more technical explanation of why p-values are uniformly distributed for continuous test statistics when the null is true, for the hypotheses H0: µ = µ0 versus HA: µ > µ0 (i.e., I'll just consider a one-sided test). For this one-sided test, the p-value is P(T ≥ t), where T is the test statistic, t is its observed value, and F is the CDF of T under H0:

1 − F(t) = P(T > t) = P(F(T) > F(t)) = 1 − P(F(T) ≤ F(t))
⇒ P(F(T) ≤ F(t)) = F(t)

Because 0 ≤ F(t) ≤ 1, this means that F(T) has a uniform(0,1) distribution (since it has the same CDF). If U is uniform(0,1), then so is 1 − U, so 1 − F(T) is also uniform(0,1). But note that 1 − F(t) = P(T ≥ t), which is the p-value.

ODS dataset of p-values

Here is the distribution of the p-values represented by a histogram. Typically uniform distributions have flatter-looking histograms with 1000 observations, so the p-values here do not look uniformly distributed. Again, this would be clearer if we did more than just 1000 simulations.

[Figure: histogram of the p-values.]

A different way to simulate this problem

Instead of simulating 1000 data sets of 10 observations, I could have simulated all of the data at once and indexed the 1000 sets of 10 observations (similar to what I did for the Central Limit Theorem example). In this case, I would want to run PROC TTEST separately on each of the 1000 experiments. Moral: there's more than one way to do things.

PROC TTEST using a BY statement
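A minimal sketch of the BY-statement approach, assuming an index variable i for the experiment (the names sim and pvalues are illustrative):

data sim;
  do i = 1 to 1000;        /* experiment index */
    do j = 1 to 10;
      x = ranexp(20141106);
      output;
    end;
  end;
run;

proc ttest data=sim h0=1;  /* one t-test per experiment */
  by i;
  var x;
  ods output TTests=pvalues;
run;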


How to do this in R?

It's a little easier in R. Here's how I would do it:

x <- rexp(10000)             # 1000 samples of size 10, drawn all at once
x <- matrix(x, ncol = 1000)  # one sample of 10 per column
pvalue <- numeric(1000)
for (j in 1:1000) {
  pvalue[j] <- t.test(x[, j], mu = 1)$p.value
}
sum(pvalue <= .05) / 1000    # this is the type I error rate
hist(pvalue)

Another example is a test for homogeneity of variances. Textbooks often warn that while Bartlett's test can be used to test for equality of variances, it is extremely sensitive to the assumption of normality, even though many procedures, such as t-tests and ANOVA, are reasonably robust to violations of normality. It is instructive to do a simulation to find out just how sensitive Bartlett's test is to the assumption of normality.

In this example, we'll again create samples from two independent exponential distributions with rate λ = 1, so that both have equal variances of 1/λ² = 1. This time, we'll let the sample size vary over n = 10, 20, 30, ..., 100 and see how the test does with increasing sample sizes. Again we'll look at the type I error rate for the test. For t-tests, the Central Limit Theorem tells us that as the sample size increases, the distribution of X̄ₙ becomes increasingly close to normal, so we expect the type I error rate to improve (get closer to α) as n increases. The Central Limit Theorem doesn't apply to S², the estimate of the variance, so the effect of increasing n isn't as clear here.

Testing type 1 error for Bartlett's test

There are different ways to test for homogeneity of variance in SAS, depending on the procedure that you are using. To get a statistic for Bartlett's test, you can use PROC GLM, which handles the two-sample case as a special case, although PROC GLM is much more general. PROC GLM can also be used for ANOVA, including with unbalanced designs (PROC ANOVA is for balanced designs), MANOVA (multivariate ANOVA, with multiple response variables as well as multiple independent variables), polynomial regression, random effects models, repeated measures, etc.

Testing type 1 error for Bartlett's test

First, we'll generate example data, keeping in mind that we want to generalize our parameters.
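A sketch of what the generation step might look like, assuming macro variables &n and &iter for the group size and the number of simulated data sets (the seed and data set name are arbitrary):

%let n = 10;       /* observations per group */
%let iter = 1000;  /* number of simulated data sets */

data sim;
  do i = 1 to &iter;                        /* i indexes the simulated data set */
    do j = 1 to &n;
      group = "A"; x = ranexp(20141106); output;
      group = "B"; x = ranexp(20141106); output;
    end;
  end;
  keep i group x;
run;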

Testing type 1 error for Bartlett's test

Note that PROC GLM wants the data in the narrow style, NOT two columns with one for group A and one for group B. The data don't have to be sorted by group.
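For the test itself, a minimal sketch (HOVTEST=BARTLETT on the MEANS statement is the standard way to request Bartlett's test in PROC GLM):

proc glm data=sim;
  class group;
  model x = group;
  means group / hovtest=bartlett;  /* Bartlett's test for homogeneity of variance */
run; quit;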

Testing type 1 error for Bartlett's test

Note that we could have generated the data in a wide format, or with group A generated first followed by group B. To generate a wide format, we could have used only one OUTPUT statement and different variables for the two groups:

data sim;
  do i = 1 to &iter;
    do j = 1 to &n;
      x = ranexp(2014*&n + &iter);
      y = ranexp(2013*&n + &iter);
      output;
    end;
  end;
  keep x y i;
run;

Testing type 1 error for Bartlett's test

To generate the As first and then the Bs, we could instead have used an extra DO loop, which still generates a narrow data set:

data sim;
  /* generate two exponential samples for each iteration i */
  do i = 1 to &iter;
    do j = 1 to &n;
      x = ranexp(2014*&n + &iter);
      group = "A";
      output;
    end;
    do j = 1 to &n;
      x = ranexp(2013*&n + &iter);
      group = "B";
      output;
    end;
  end;  /* i is the iteration */
run;

Testing type 1 error for Bartlett's test

Back to the original data. We'll look at the output from PROC GLM and PROC TTEST and compare them.

Here's the output from PROC GLM. [Output shown on slides.]

Testing type 1 error for Bartlett's test

By default, PROC TTEST performs a test for equal variances using the folded F-test, whereas PROC GLM does not. Note that the p-value from PROC GLM matches the p-value from PROC TTEST when equal variances are assumed.

Testing type 1 error for Bartlett's test

Put the trace on to figure out how to save the right table, then look in the log file for the table name. [Log output shown on slides.]

Testing type 1 error for Bartlett's test

Now we can extend to more iterations and use BY to get all the results in one data set.
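A sketch of this step, with the caveat that the ODS table name for Bartlett's test (Bartlett below) is an assumption you should confirm with ODS TRACE in your SAS version:

proc glm data=sim;
  by i;                                /* one analysis per simulated data set */
  class group;
  model x = group;
  means group / hovtest=bartlett;
  ods output Bartlett=bartlett_pvals;  /* collect every iteration's test in one data set */
run; quit;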


Testing type 1 error for Bartlett's test

We can scale this up to as many iterations as we want. Then we want to count the number of p-values below 0.05 to get the type 1 error rate.


Testing type 1 error for Bartlett's test

Now we want to repeat this same idea for different sample sizes n. Of course, we could just repeat the code over and over again, changing the value of n. Or we can loop over different values of n, which turns the 2-level loop into a 3-level loop.
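A sketch of the 3-level version (names and seed again illustrative; the data come out already sorted by n and i, which is what the BY statement needs):

data sim;
  do n = 10 to 100 by 10;    /* outer loop over sample sizes */
    do i = 1 to &iter;
      do j = 1 to n;
        x = ranexp(20141106); group = "A"; output;
        x = ranexp(20141106); group = "B"; output;
      end;
    end;
  end;
run;

proc glm data=sim;
  by n i;
  class group;
  model x = group;
  means group / hovtest=bartlett;
  ods output Bartlett=bartlett_pvals;
run; quit;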


DO loops versus Macros

Instead of adding another DO loop, I could have created a macro, say %macro bartlett(n,iter), and then run the macro multiple times for different values of n:

%bartlett(10,1000)
%bartlett(20,1000)
%bartlett(30,1000)

And so on. If your data step is getting too complicated, this might be reasonable. The macro approach is also more flexible if you want different combinations of parameters. For example, if you want 1 million iterations for n = 10 but only 1000 iterations for n = 100 due to time constraints, the macro approach handles that easily.
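A sketch of what such a macro might look like, wrapping the earlier data step and PROC GLM (the suffix on the output data set name is my own convention, so that each call keeps its results separate):

%macro bartlett(n, iter);
  data sim;
    do i = 1 to &iter;
      do j = 1 to &n;
        group = "A"; x = ranexp(20141106); output;
        group = "B"; x = ranexp(20141106); output;
      end;
    end;
  run;
  proc glm data=sim;
    by i;
    class group;
    model x = group;
    means group / hovtest=bartlett;
    ods output Bartlett=bart_n&n;  /* e.g., bart_n10, bart_n20, ... */
  run; quit;
%mend;

%bartlett(10, 1000)
%bartlett(20, 1000)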

Suppressing output?

Unfortunately, these simulations create a lot of output as they stand. For most procedures, you can create an output data set without generating printed output, for example by using the NOPRINT option on the procedure statement while still requesting an output data set (such as OUTPUT OUT= in PROC MEANS). Unfortunately, the NOPRINT option means that nothing is produced for ODS to capture, so when using ODS OUTPUT you end up with lots of displayed output. I'm not sure of a good way around this, but it is pretty annoying, and all the I/O slows SAS down. You can reduce the output by selecting only what you will need to save:
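A sketch of that selection, again assuming the Bartlett table name; I believe ODS EXCLUDE ALL (noted in the comment) suppresses the display entirely while ODS OUTPUT data sets are still created, but treat that as something to verify in your SAS version:

ods select Bartlett;   /* display only the table we plan to save */
                       /* (ods exclude all; would suppress the display entirely) */
proc glm data=sim;
  by n i;
  class group;
  model x = group;
  means group / hovtest=bartlett;
  ods output Bartlett=bartlett_pvals;
run; quit;
ods select all;        /* restore normal output */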

Results

[Table of type 1 error rates by sample size shown on slide.]

Results: interpretation

These results are based on 10,000 iterations. Since we are essentially generating Bernoulli random variables (reject or don't reject H0), we can think of the mean of these indicators as the proportion of rejections. This should be α = 0.05, but it is higher: 23% for n = 10, 27% for n = 20, and 29% for n = 50. For 10,000 iterations, a confidence interval for such a proportion has a margin of error of roughly 2√((.25)(.75)/10000) ≈ 0.009. A handy rule of thumb is that for a binomial, the 95% confidence interval has a margin of error of at most 1/√n, which is 1/100 for n = 10,000. The point is that 29% appears to be significantly larger than 27%, so the type 1 error rate is increasing as the sample size increases.

The moral of the story

The moral of the story is that increasing your sample size doesn't always improve your inferences. In this case, the method is sufficiently non-robust that increasing the sample size makes it perform worse. So when textbooks say that Bartlett's test isn't very good for testing equality of variances, they really mean it, although they rarely explain why it is so bad.

So what exactly is Bartlett's test testing? The usual description is that it tests H0: σ₁² = σ₂², assuming that the two samples come from normally distributed populations. However, considering that the test is likely to reject H0 when σ₁² = σ₂² but the data are not normal, you could instead think of the null hypothesis as

H0: X1, ..., Xn i.i.d. N(µ1, σ²) and Y1, ..., Ym i.i.d. N(µ2, σ²)

i.e., the normality is part of what is being tested. In this case, a rejection of H0 could mean either that the data are not normal or that the variances are unequal.

Statistical inconsistency

A related issue regarding increasing sample sizes is statistical inconsistency. An estimator θ̂ₙ for a parameter θ (which might be an ordered tuple of parameters) is said to be statistically consistent if for any ε > 0 and for any θ ∈ Θ (the parameter space),

lim(n→∞) P(|θ̂ₙ − θ| > ε) = 0

where n is the sample size. In other words, the estimator gets close to the actual parameter with high probability. You can have estimators that don't have this property, so that increasing the sample size doesn't increase the probability of your estimate being close to the true value. If you work with a discrete parameter space, then statistical consistency requires that the probability of making the correct inference for the parameter approaches 1.

Statistical power

So far, we've focused on type 1 error. How about power? Power is nearly identical from the point of view of simulation. In this case, you simulate from values for which H0 is false, and again count the number of times the null is rejected. Here, the more frequently H0 is rejected, the better the method (assuming it has good type 1 error as well). As an example, we'll consider the power for testing H0: µ1 = µ2 in a t-test when X1, ..., Xn are i.i.d. N(0,1) and Y1, ..., Yn are i.i.d. N(1,1). In this case the variances are equal, but the means are different.
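A sketch of the power simulation, assuming n = 20 per group and using the newer RAND function (the data set names are illustrative). The TTests table from a two-sample PROC TTEST has one row per method, so we keep only the pooled (equal-variance) row before counting rejections:

data powersim;
  call streaminit(20141106);
  do i = 1 to 1000;
    do j = 1 to 20;
      x = rand('normal', 0, 1); group = "A"; output;
      x = rand('normal', 1, 1); group = "B"; output;
    end;
  end;
run;

proc ttest data=powersim;
  by i;
  class group;
  var x;
  ods output TTests=pvals;
run;

data power;
  set pvals;
  where method = "Pooled";     /* keep the equal-variance row */
  reject = (Probt < 0.05);
run;

proc means data=power mean;    /* the mean of reject estimates the power */
  var reject;
run;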


Statistical Power

[Figure: power of the t-test for rejecting H0 when the X's are i.i.d. N(0,1) and the Y's are i.i.d. N(1,1).]

Uses of Power Analyses

Why is it useful to study power? The main reasons for studying power for a particular problem are:
- sample size determination for study design
- determining the effect size detectable for a given sample size
- choosing between different methods
- investigating the robustness of methods to model violations

Power: sample size determination

A power analysis can be useful for determining the sample size you need to have a good chance of rejecting H0 when H0 is false. This is useful when initially deciding what sample size to aim for in designing a study, and can be useful in grant applications. In the previous t-test example, if we believe that treatment 2 results in values an average of 1 point higher than treatment 1, then we can estimate that we'd need roughly 20 people per group to have about 80% power to detect a difference. If you wanted better than an 80% chance of being able to reject H0, you'd want larger samples.

Power: sample size determination

For grant proposals, you might justify your budget based on an effect size (i.e., µ1 − µ2) estimated from preliminary data. The idea is that if you think the effect size might be some particular value, then you want the sample size to be large enough to have a reasonable chance (80% is often used) of rejecting the null hypothesis. Since larger samples require more money, this can be used to justify how much money to request for your study. Conversely, if you don't justify your proposed sample size, a reviewer might complain that your study is likely to be underpowered, meaning it is unlikely to detect a difference even if there is one (i.e., if a new drug really is more effective).

Power: sample size determination

Sample sizes for studies aren't usually completely under the researcher's control, but they are analyzed as though they are fixed parameters rather than random variables. If you recruit people to be in a study, for example using flyers around campus, the hospital, etc., then you might have historical data to predict what a typical sample size would be based on how long and how widely you advertise. Study designers can therefore often indirectly control the sample size.

Power: sample size determination

Random sample sizes might be worth considering, however. For the t-test example, you might have better power to reject the null hypothesis if your sample sizes are equal for the two groups than if they are unequal. For example, suppose you are recruiting to test whether a drug reduces headaches, and you recruit both men and women, suspecting that the drug is more effective for men than for women. If you recruit volunteers, you are not in direct control of how many men versus women enter the study. Suppose 55 women and 45 men volunteer. You could make the sample sizes equal by randomly dropping the data from 10 of the women, but this would be throwing away information. It is better to use information from all 100 study participants, although you might have less power with 45 men and 55 women than with 50 of each sex.

Power: sample size determination

On the other hand, if your study collects expensive information, such as an MRI for each participant, you might decide to accept only the first n women and the first n men who volunteer. A power analysis could help you decide whether it is important to have a balanced design or not.

Power: effect of unbalanced designs

How could we simulate the effect of unbalanced versus balanced designs? Assuming a fixed total number of participants (say n = 100), we could compare a particular unbalanced design (for example, 45 versus 55) with the balanced design (50 per group). We could also let the number of men versus women in each iteration of the simulation be a binomial random variable, so that the degree of imbalance is random.
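A sketch of the random-imbalance version, assuming 100 participants per iteration and a 50/50 sex ratio (the RAND('BINOMIAL', p, n) call draws the group A size):

data unbal;
  call streaminit(20141106);
  do i = 1 to 1000;
    m = rand('binomial', 0.5, 100);   /* random size of group A */
    do j = 1 to m;
      group = "A"; x = rand('normal', 0, 1); output;
    end;
    do j = 1 to 100 - m;
      group = "B"; x = rand('normal', 1, 1); output;
    end;
  end;
run;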

Power: determining effect size

In addition to graphing power as a function of sample size, it is common to plot power as a function of the effect size for a fixed sample size. Ultimately, power depends on three quantities: α, n, and the effect size, such as µ1 − µ2 in the two-sample t-test example. We usually fix two of these and plot power as a function of the third. The t-test example is easy to modify to plot power as a function of the effect size for a given sample size (say, n = 20).
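A sketch of the modification: add an outer loop over the effect size (delta below is my own name for µ2 − µ1), then run PROC TTEST with BY delta i as before:

data essim;
  call streaminit(20141106);
  do delta = 0 to 3 by 0.25;   /* effect size mu2 - mu1 */
    do i = 1 to 1000;
      do j = 1 to 20;
        x = rand('normal', 0, 1);     group = "A"; output;
        x = rand('normal', delta, 1); group = "B"; output;
      end;
    end;
  end;
run;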

Power: determining effect size

[Code and power-versus-effect-size figure shown on slides.]

Power: plotting both sample size and effect size

[Code and figures shown on slides.]


Power: determining effect size

Note that the data set sim holding all of my simulated data has 840,000 observations. SAS is still reasonably fast, and the log file reports how long everything took:

NOTE: The SAS System used:
      real time    22.28 seconds
      cpu time     9.43 seconds

We could make the plots smoother by incrementing the effect size by a smaller value (say .01), although this would generate 50 times as many observations. When simulations get this big, you have to start planning them: how long will they take (instead of 30 seconds, will it be 25 minutes? 25 days?), how much memory will they use, and so on, even though this is a very simple simulation.

Length of simulations

The log file also breaks down how long each procedure took. Much of the time was actually spent generating the PDF file with ODS. From the log file:

NOTE: The data set WORK.SIM has 840000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time    0.19 seconds
      cpu time     0.18 seconds
NOTE: The data set WORK.PVALUES has 42000 observations and 9 variables.
NOTE: The PROCEDURE TTEST printed pages 1-21000.
NOTE: PROCEDURE TTEST used (Total process time):
      real time    9.12 seconds
      cpu time     8.97 seconds
...
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time    12.44 seconds
      cpu time     0.19 seconds

Length of simulations

When designing simulations, there are usually tradeoffs. For example, suppose I don't want my simulation to take any longer than it already does. If I want smoother curves, I could double the number of effect sizes, but then, to keep the running time the same, I might have to use fewer iterations (say 500 instead of 1000). This would increase the number of data points at the expense of possibly making each curve more jittery, or even not monotonically increasing. There will usually be a tradeoff between the number of iterations and the number of parameter values you can try in your simulation.

Length of simulations for R

If you want to time R doing simulations, the easiest way is to run R in batch mode. In Linux or Mac OS X, go to a terminal and type at the shell prompt:

time R CMD BATCH myprogram.r

This gives a similar printout of real time versus CPU time for your R run.

Power: determining effect size

[Figure: power versus effect size (mu1 − mu2, from 0 to 3) for n = 10, 20, and 30.]


Power: tradeoff between number of parameters and number of iterations (500 vs 100 iterations)

[Figure: two panels of power versus effect size (mu1 − mu2) for n = 10, 20, and 30, one panel per iteration count.]

Using Power to select methods

As mentioned before, power analyses are useful for determining which method is preferable when there are multiple methods available to analyze the data. As an example, consider the two-sample t-test again when we have exponential data. Suppose we wish to test H0: µ = 2 when λ = 1, so that the null hypothesis is false. Since the assumptions of the test are violated, researchers might prefer a nonparametric test.

Using Power to select methods

As an alternative, you can use a permutation test or some other nonparametric test. Here we might wish to see which method is most powerful. If you can live with the inflated type 1 error of the t-test (or adjust for it by using a smaller α-level), then you might prefer it if it is more powerful. A number of nonparametric procedures are implemented in PROC NPAR1WAY, as well as in PROC MULTTEST. In addition, there are macros floating around the web that can do permutation tests without using these procedures.

Using power to select methods

Here we'll try PROC NPAR1WAY and just one nonparametric method, the Wilcoxon rank-sum test (also called the Mann-Whitney test). The idea is to pool all of the data and rank them, then calculate the sum of the ranks for group A versus group B. The two sums should be approximately equal under the null, with greater differences in the rank sums giving evidence that the mean of one group is larger than the mean of the other.
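A sketch of the Wilcoxon comparison on the simulated data; the ODS table name WilcoxonTest is what I would expect PROC NPAR1WAY to produce, but confirm it with ODS TRACE:

proc npar1way data=sim wilcoxon;   /* request the Wilcoxon rank-sum test */
  by i;
  class group;
  var x;
  ods output WilcoxonTest=wilcoxon_pvals;
run;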

Using power to select methods

Note that there are many other methods we could have chosen, such as a median test or a permutation test. This is just for illustration; we are not necessarily finding the most powerful method.

Power: comparing methods

[Code and power curves for the t-test versus the Wilcoxon test shown on slides.]

Power: comparing methods

For these parameter values (exponentials with means of 1 and 2), the t-test was more powerful than the Wilcoxon test at all sample sizes. The Wikipedia article on the Mann-Whitney test says: "It [the Wilcoxon or Mann-Whitney test] has greater efficiency than the t-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the t-test on normal distributions." Given our limited simulation, we have some reason to be a little skeptical of this claim. Still, we only tried one combination of parameters; it is possible that for other parameters or other distributions, the t-test is less powerful. Also, the t-test has inflated type 1 error here, so the comparison might be a little unfair. We could re-run the experiment using α = .01 for the t-test and α = .05 for the Wilcoxon to make sure that both had controlled type 1 error rates.

Power: comparing methods

Here's an example from an empirical paper. [Figures from the paper shown on slides.]

Speed: comparing methods

For large analyses, speed and/or memory might also drive the choice between methods and algorithms. This paper compared different methods within SAS based on speed for doing permutation tests.

Use of macros for simulations

The author of the previous paper provides an appendix with lengthy macros intended as more efficient replacements for SAS procedures such as PROC NPAR1WAY and PROC MULTTEST, which, based on his data, could crash or fail to terminate in a reasonable time. In addition to developing your own macros, a common use of macros is to use macros written by someone else that have not been incorporated into the SAS language. You might just copy and paste such a macro into your code, possibly with some modification, and you can use the macro even if you cannot understand it. Popular macros may eventually be replaced by new PROCs or new functionality within SAS. This is sort of the SAS alternative to user-contributed packages in R.

From Macro to PROC

An example of an evolution from macros to PROCs is bootstrapping. For several years, SAS users relied on macros, often written by others, to perform bootstrapping. In bootstrapping, you sample your data (or the rows of your data set) with replacement and get a new data set with the same sample size, but with some of the values repeated and others omitted. For example, if your data are

-3 -2 0 1 2 5 6 9

then bootstrap replicate data sets might be

-2 -2 1 5 6 9 9 9
-3 0 1 1 2 5 5 6

etc.

From Macro to PROC

Basically, to generate the bootstrap data set, you generate n random integers from 1 to n, with replacement, and extract those observations from your data. This used to be done with macros, but now it can be done with PROC SURVEYSELECT. If you search the web for bootstrapping, you still might run into one of those old macros. Newer methods might still be implemented as macros. A webpage from 2012 has a macro for bootstrap bagging, a method of averaging results from multiple classification algorithms: http://statcompute.wordpress.com/2012/07/14/a-sas-macro-for-bootstrap-aggregating-bagging/ There are also macros for searching the web to download movie reviews or extract data from social media. Try searching on "SAS macro 2013" for interesting examples.
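A sketch of the PROC SURVEYSELECT approach, assuming an input data set named mydata (the name and seed are illustrative; the options shown are the standard ones for with-replacement resampling):

proc surveyselect data=mydata out=boot
    seed=20141106
    method=urs     /* unrestricted random sampling: with replacement */
    samprate=1     /* each replicate has the original sample size */
    outhits        /* one row per draw, so repeats appear as repeated rows */
    reps=1000;     /* 1000 bootstrap replicates, indexed by the Replicate variable */
run;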