Hypothesis Testing Using Randomization Distributions T.Scofield 10/03/2016


Randomization Distributions in Two-Proportion Settings

By calling our setting a two-proportion one, I mean that the data frame has two binary categorical variables: the one that delineates which of two groups a subject comes from serves as the explanatory variable, and the other, the response variable, also has just two outcomes. In the cocaine addiction data, we have an explanatory variable, treatment, which has three levels: Desipramine, Lithium, and Placebo. We cut that back to two by ignoring one set of patients, perhaps those receiving Desipramine, thereby giving us just two groups to consider. The response variable is "relapsed or not?", which has just two values, yes or no. We focus on the relapsers.

Natural hypotheses for a study to see if Lithium helps to decrease the chance of relapse are

$$H_0: p_L - p_P = 0, \qquad H_a: p_L - p_P < 0.$$

The sample proportions among the lithium and placebo groups are $\hat{p}_L = 18/24$ and $\hat{p}_P = 20/24$, giving us test statistic

$$\hat{p}_L - \hat{p}_P = \frac{18}{24} - \frac{20}{24} = -\frac{2}{24} \approx -0.0833.$$

Like the study about tapping fingers under the influence of caffeine, this study is an experiment, where the treatment (Lithium or Placebo) was randomly assigned to patients. When we generate a randomization distribution, we want to be faithful to this process, even as we take the null hypothesis into account. The mental image of dropping slips of paper into two bags, one bag containing the 48 relapse results (38 "yes" and 10 "no") and the other containing the 48 treatments (24 "Lithium" and 24 "Placebo"), and randomly assigning the latter to the former as we select our randomization sample, achieves both goals.

Generating a randomization distribution, however, is trickier in RStudio for this situation than in earlier scenarios, primarily because of the work we must do to prepare data for randomization samples.
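The two-bags picture can be sketched directly in base R, without any packages. This is only an illustration under stated assumptions: the vectors `relapse` and `drug`, the helper `diff_in_props`, and the seed are made up for the sketch, and the counts are the ones quoted above (18 of 24 Lithium patients and 20 of 24 Placebo patients relapsed).

```r
# Base-R sketch of the "two bags of slips" randomization (no mosaic needed).
# All names here are hypothetical, chosen only for this illustration.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 48 relapse outcomes: Lithium 18 yes / 6 no, Placebo 20 yes / 4 no
relapse <- c(rep("yes", 18), rep("no", 6), rep("yes", 20), rep("no", 4))
drug    <- c(rep("Lithium", 24), rep("Placebo", 24))

# Difference in relapse proportions, Lithium minus Placebo
diff_in_props <- function(relapse, drug) {
  mean(relapse[drug == "Lithium"] == "yes") -
    mean(relapse[drug == "Placebo"] == "yes")
}

observed <- diff_in_props(relapse, drug)  # 18/24 - 20/24

# One randomization statistic: deal the 48 treatment slips out at random
one_shuffle <- diff_in_props(relapse, sample(drug))

# Many randomization statistics, then the proportion in the left tail
# (a tiny tolerance guards against floating-point ties)
many    <- replicate(5000, diff_in_props(relapse, sample(drug)))
p_value <- mean(many <= observed + 1e-12)
```

Shuffling `drug` while leaving `relapse` fixed keeps the 38 "yes" and 10 "no" outcomes intact and reassigns only the treatment labels, which is what the null hypothesis says is harmless.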
You may well prefer to use StatKey, the software meant to accompany the textbook, over RStudio for cases involving two proportions. I will, however, provide details in RStudio for your perusal. The main difficulty, as indicated above, is preparing data. Here are two approaches.

Approach 1: Recreate the data from scratch

We have done this sort of thing once before, back in Section 2.1. Perhaps you recall the commands.

    part1 <- do(6) * data.frame(Drug="Lithium", Relapse="no")
    part2 <- do(18) * data.frame(Drug="Lithium", Relapse="yes")
    part3 <- do(4) * data.frame(Drug="Placebo", Relapse="no")
    part4 <- do(20) * data.frame(Drug="Placebo", Relapse="yes")
    addicttreatments <- rbind(part1, part2, part3, part4)

Approach 2: Filtering the supplied data frame

It turns out we don't actually need to recreate the data, as it has been supplied to us as part of the Lock5withR package in a data frame called CocaineTreatment. But working with it is not as straightforward as it would at first seem, because this data frame contains all the patients, including those who received the drug called Desipramine. We can select the desired subset by leaving out these subjects:

    myfiltereddata <- subset(CocaineTreatment, Drug != "Desipramine")

However, there seems to be a lingering memory that there were three levels for the Drug variable. You see this, for instance, when you produce a frequency table on Drug:

    tally(~Drug, myfiltereddata)
    ## Drug
    ## Desipramine     Lithium     Placebo
    ##           0          24          24

While the count of Desipramine patients is 0, we would prefer that our filtered data frame not know Desipramine is part of this study. One way to make it forget is to combine the removal of Desipramine patients with the droplevels() command.

    myfiltereddata <- droplevels(subset(CocaineTreatment, Drug != "Desipramine"))
    tally(~Drug, myfiltereddata)
    ## Drug
    ## Lithium Placebo
    ##      24      24

Now our Drug variable truly has just two levels in the myfiltereddata data frame.

Once data has been prepared...

If you carried out the commands above, you now have two data frames, addicttreatments and myfiltereddata, which can be used for our analysis. Either will work, but I will use myfiltereddata.

    head(myfiltereddata)
    ##       Drug Relapse
    ## 25 Lithium      no
    ## 26 Lithium     yes
    ## 27 Lithium     yes
    ## 28 Lithium     yes
    ## 29 Lithium     yes
    ## 30 Lithium      no

We obtain our test statistic from the sample itself:

    diff(prop(Relapse~Drug, data=myfiltereddata))
    ##  no.placebo
    ## -0.08333333

(The command reports the difference in proportions of "no" responses, Placebo minus Lithium, which equals the difference in relapse proportions, Lithium minus Placebo.) As when dealing with the difference of two means (see the example using data from CaffeineTaps in a prior handout), our null hypothesis dictates that the drug received (Lithium vs. Placebo) is not actually a factor, and we should generate many randomization statistics by shuffling values of the explanatory variable. One randomization statistic is obtained with the command

    diff(prop(Relapse~shuffle(Drug), data=myfiltereddata))
    ## no.placebo

and this may be repeated many times to obtain a randomization distribution:

    manydiffs <- do(5000) * diff(prop(Relapse~shuffle(Drug), data=myfiltereddata))
    head(manydiffs)
    ## no.placebo

The column, containing 5000 randomization statistics, has been given the curious name no.placebo. We may view a histogram and mark the region corresponding to our P-value:

    histogram(~no.placebo, data=manydiffs, groups = no.placebo <= -0.0833, width=.1)

    nrow(subset(manydiffs, no.placebo <= -0.0833)) / 5000
    ## [1]

This P-value, here approximately 0.36, represents the probability, in a world where Lithium does not help deter relapse into cocaine addiction, of obtaining a sample with a test statistic (difference in sample proportions) of -0.0833 or less. This P-value is not statistically significant under any of the usual significance levels α = 0.1, 0.05 or 0.01. In fact, such sample statistics would arise about 36% of the time, which makes our sample statistic appear consistent with the null hypothesis. We fail to reject the null hypothesis.

Example: Hypothesis Test for Positive Correlation (NFL Malevolence)

The hypotheses (explained in the text, Section 4.4):

$$H_0: \rho = 0, \qquad H_a: \rho > 0.$$

The test statistic:

    cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)
    ## [1]

Generation of many randomization statistics:

    manycors <- do(5000) * cor(ZPenYds ~ shuffle(NFL_Malevolence), data=MalevolentUniformsNFL)
    head(manycors)
    ## cor

Storing the observed correlation under a name lets us mark the region at least as extreme as our test statistic:

    obscor <- cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)
    histogram(~cor, data=manycors, groups = cor >= obscor)

The P-value:

    nrow(subset(manycors, cor >= obscor)) / 5000
    ## [1]

In the case where the significance level is α = 0.05, this result is statistically significant, and we would reject the null hypothesis in favor of the alternative, concluding that there is a positive correlation.

Example: Is the mean body temperature really 98.6?

The hypotheses:

$$H_0: \mu = 98.6, \qquad H_a: \mu \neq 98.6.$$

The test statistic:

    mean(~BodyTemp, data=BodyTemp50)
    ## [1] 98.26

The natural thing would be to simulate the bootstrap distribution for $\bar{x}$, as when we constructed a confidence interval for the population mean µ:

    manymeans = do(5000) * mean(~BodyTemp, data=resample(BodyTemp50))
    head(manymeans)
    ## mean

    histogram(~mean, data=manymeans)

But this cannot be a proper simulation of the null distribution, as it is not centered at the right place. Its center appears to be about 98.26, the value of our point estimate $\bar{x}$, rather than the hypothesized (population) mean of 98.6. This is what happens whenever we bootstrap a mean: the bootstrap distribution is centered near the sample mean. Our randomization statistics should not be the same as bootstrap statistics here, but need to be modified so that they are centered on the proposed mean 98.6. The modification can simply be that we add to each of our sample means the difference between the intended center (98.6) and where they were centered above (at the sample mean $\bar{x}$ = 98.26); that is, we should add 98.6 - 98.26 = 0.34:

    manymeans = do(5000) * (mean(~BodyTemp, data=resample(BodyTemp50)) + 0.34)
    names(manymeans)
    ## [1] "result"

    histogram(~result, data=manymeans, groups = abs(result-98.6)>=0.34)

We see this modified test statistic has a randomization distribution centered where it ought to be if serving as the null distribution. We have attempted to shade those regions in both tails corresponding to randomization statistics at least as extreme as ours, though there are very few. We obtain the approximate P-value by calculating the area in one tail and doubling it:

    nrow(subset(manymeans, result <= 98.26)) * 2 / 5000
    ## [1]

Given this small P-value, we reject the null hypothesis and conclude that the actual (population) mean body temperature is something other than 98.6.

Example 4.34: A New Wrinkle on Finger Tapping and Caffeine

This example has already been done adequately. Since it was a controlled, randomized experiment in which one treatment, either caffeine or placebo, was assigned randomly to each subject, we obtained our randomization distribution in a manner that also randomly assigned treatment values while adhering to the null hypothesis that treatment doesn't matter. We obtained one randomization statistic with the command

    diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))

and an entire distribution of such statistics by repeating this command often.

Example 4.34 challenges us to imagine different ways of studying the question: Does caffeine increase tapping rates? Surely there are other approaches besides a controlled randomized experiment. The Locks have us consider two different studies one might undertake.

1. An observational study: Instead of assigning treatments, we find subjects who have already self-selected their own treatments, some having had caffeine (probably as part of a daily routine, drinking coffee in the morning), and others who have not. Subjects from both groups have their tap rates measured, and results of both variables are again recorded.

2. A matched pairs study: This time, subjects undergo both treatments, having their tap rates measured under each. The order of the treatments is assigned randomly, so that some receive caffeine first, while for others it is the placebo first. Each subject is, then, the source of two numbers, the caffeine tap rate and the placebo tap rate. Our effective data for each subject, however, would be the difference: (caffeine tap rate) - (placebo tap rate).

In each of these scenarios, the change in the manner in which data is collected calls for a change in the manner in which randomization statistics are produced. The easier of these two alternate study paradigms to handle in RStudio is the matched pairs case, which we discuss next. We will not delve into the observational study case; suffice it to say that our treatment should be something like the approach suggested by Hunter Pham (see earlier course notes), but modified so that the null hypothesis is respected.

So, imagine that we have gathered a random sample of 10 people for a matched pairs study on whether caffeine causes higher tapping rates. We randomly select 5 to undergo the caffeine treatment first followed by placebo, while the other 5 will receive placebo first and then caffeine. (For a blind study, which is preferred, subjects still do not know which treatment they receive first.) Displayed below are some pretend data from a matched pairs experiment. This data frame, matchedpairscafftaps, is not part of any package you can load. Commands that generate it are given below.

    set.seed(50)
    matchedpairscafftaps = data.frame(placebo=round(runif(10,234,255), 1),
                                      caffeine=round(runif(10,241,258), 1),
                                      first=sample(c(rep("C",5),rep("P",5))))
    matchedpairscafftaps$obsdiff = matchedpairscafftaps$caffeine - matchedpairscafftaps$placebo

The resulting data set is displayed here.

    matchedpairscafftaps
    ##    placebo caffeine first obsdiff
    ## 1                       P    -1.3
    ## 2                       C     2.4
    ## 3                       P    13.7
    ## 4                       P    -7.8
    ## 5                       P     0.9
    ## 6                       C    17.6
    ## 7                       C     6.5
    ## 8                       C    -0.4
    ## 9                       C     7.4
    ## 10                      P     7.6

The null and alternative hypotheses, which should be settled before data has been collected, are these:

$$H_0: \mu_{\text{Diff}} = 0, \qquad H_a: \mu_{\text{Diff}} > 0.$$

From our data, we obtain the sample mean of observed differences in the usual way.

    mean(~obsdiff, data=matchedpairscafftaps)
    ## [1] 4.66

This is our test statistic. In generating randomization statistics, we want to use the data we have, but adhere to the null hypothesis, which implies that caffeine should not dictate which tap rate, the one under caffeine or the one under placebo, is larger. This would mean that the sign of each difference is random, coming out positive or negative the way flips of a coin come out heads or tails. The command

    sample(c(-1,1), 10, replace=TRUE)
    ## [1]

acts like 10 coin flips, except that it produces -1 and 1 rather than H and T. We simulate one randomization statistic with the command

    mean(~ obsdiff*sample(c(-1,1), 10, replace=TRUE), data=matchedpairscafftaps)
    ## [1] 3.64

and obtain a randomization distribution by repeating it multiple times:

    manympmeans = do(3000) * mean(~obsdiff*sample(c(-1,1),10,replace=TRUE), data=matchedpairscafftaps)
    head(manympmeans)
    ## mean

    histogram(~mean, manympmeans, groups = mean >= 4.66)

Our approximate P-value is

    nrow(subset(manympmeans, mean>=4.66)) / 3000
    ## [1]
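The sign-flipping idea can also be tried without mosaic. What follows is a minimal base-R sketch, not the handout's own method: the helper `flip_once`, the variable names, and the seed are assumptions made for illustration, while the ten differences are copied from the obsdiff column of the data displayed above.

```r
# Base-R sketch of the sign-flip randomization for matched pairs.
# Names (flip_once, many, p_value) are hypothetical, chosen for this sketch.
set.seed(2)  # arbitrary seed, only for reproducibility

# Observed differences (caffeine minus placebo) from the obsdiff column
obsdiff  <- c(-1.3, 2.4, 13.7, -7.8, 0.9, 17.6, 6.5, -0.4, 7.4, 7.6)
observed <- mean(obsdiff)  # the test statistic, 4.66

# One randomization statistic: give each difference a random sign,
# as if a coin flip decided which treatment produced the larger tap rate
flip_once <- function(d) {
  mean(d * sample(c(-1, 1), length(d), replace = TRUE))
}

# Randomization distribution and the right-tail proportion
many    <- replicate(3000, flip_once(obsdiff))
p_value <- mean(many >= observed)
```

Because only the signs change, every randomization statistic is built from the same ten magnitudes, so the distribution is centered near 0, as the null hypothesis requires.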
