Introduction to Hypothesis Testing T.Scofield 10/03/2016

Hypothesis Testing: the steps

1. Identify the research question, along with relevant variables.
2. Formulate hypotheses (null and alternative) appropriate to the research question.
3. Obtain a random sample, and a corresponding sample statistic (called the test statistic).
4. Determine the null distribution, i.e., the sampling distribution of the test statistic in a world where the null hypothesis is true.
5. Determine the P-value, i.e., the likelihood that your test statistic, or something even more extreme, occurs under the null hypothesis.

An Example: Do Dogs Resemble Their Owners?

In the text, we are told how 25 photos of dog owners were placed in front of panelists, each along with photos of two dogs, one of which was the person's pet. The panelist was asked to say which dog looked more like the person. In 16 out of 25 tries, the dog who was chosen was, in fact, the owner's pet.

1. A research question we could write for this scenario is: Do people tend to choose pets that look like them? The variable used here is "whether the dog that looks more like the owner is, in fact, the owner's dog," a categorical variable.

2. Let p stand for the proportion, among all people who own dogs, whose dog could be identified correctly based on similarity in appearance to the owner. One presumes that, if owners do not tend to choose dogs that look more like themselves, this identification process would be like coin flips, with correct results 50% of the time. So, we have hypotheses:

    H_0: p = 0.5,    H_a: p > 0.5

3. A sample has already been obtained, incorporating 25 dog owners. The sample statistic could be either the count of correct identifications (16), or the proportion of correct identifications (16/25 = 0.64).

4. If we use the sample proportion (p̂ = 0.64) as our test statistic, the null distribution is the sampling distribution of p̂-values in a world where the null hypothesis (p = 0.5) is true. In other words, we should draw many samples of size 25 where the probability of a successful identification is 50%. Here is one way to do this.

    manyphats = do(3000) * prop(sample(c("CorrectID","IncorrectID"), size=25, replace=TRUE))
    head(manyphats)
    histogram(~CorrectID, data=manyphats, width=0.04)
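The same simulation can be sketched without the mosaic helpers do() and prop(). What follows is a minimal base-R sketch rather than the handout's own method; the names n_trials, n_sims, one_phat, and null_phats, as well as the seed, are assumptions made for illustration.

```r
# Base-R sketch of the simulated null distribution for the dog-owner example.
set.seed(1)        # arbitrary seed, used only so the run is reproducible
n_trials <- 25     # tries in the sample
n_sims   <- 3000   # number of simulated samples

# One simulated sample proportion in a world where H_0 (p = 0.5) is true:
# draw 25 equally likely outcomes and record the fraction that are "CorrectID".
one_phat <- function() {
  picks <- sample(c("CorrectID", "IncorrectID"), size = n_trials, replace = TRUE)
  mean(picks == "CorrectID")
}

# Repeat the draw many times to approximate the null distribution
null_phats <- replicate(n_sims, one_phat())
head(null_phats)
```

Calling hist(null_phats) afterward should give a picture comparable to the mosaic histogram, centered near 0.5.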

[Figure: density histogram of CorrectID (the simulated null distribution)]

From this view, we see that the value p̂ = 0.64, along with values higher than that, are somewhat on the rarer side. To see just how rare, we might do this:

    nrow(subset(manyphats, CorrectID >= 0.64)) / 3000

The result is a number we call the P-value. It tells us how often a result at least as extreme as ours occurs in a world where the null hypothesis is true. When this P-value is small enough, we say our result is statistically significant (akin to saying the case has been proved beyond reasonable doubt), and reject the null hypothesis in favor of the alternative. Our P-value is probably not small enough to reach this conclusion. Instead, we would probably fail to reject the null hypothesis. More on this later.

The same null distribution using rflip()

The complex command used to produce the null distribution was this:

    manyphats = do(3000) * prop(sample(c("CorrectID","IncorrectID"), size=25, replace=TRUE))

There is, as is often the case with RStudio, more than one way to achieve this same purpose. One way involves use of the rflip() command, which simulates tosses of a coin.

    rflip(25)   # gives a sequence of 25 simulated coin tosses

    Flipping 25 coins [ Prob(Heads) = 0.5 ] ...
    T T T T H H T T H T T T H T T T H H H H T T H H H
    Number of Heads: 11 [Proportion Heads: 0.44]

In tandem with the do() command, one can achieve the null distribution with less typing:

    manyruns = do(3000) * rflip(25)
    head(manyruns)
      n heads tails prop

This simpler command produces a data frame with more columns: the heads column contains the number of heads out of 25, while the prop column contains the corresponding sample proportions p̂. You could view a histogram of this (simulated) null distribution in the usual way:

    histogram(~prop, data=manyruns, width=.04)

The same null distribution using rbinom()

An even simpler-looking command that produces a simulation of the null distribution is this one:

    manyphats = rbinom(5000, size=25, prob=.5) / 25
    head(manyphats)

Note that manyphats is a vector, not a data frame. (It does not have any columns.) If you compare the run time of this command, you will note that it is much, much faster than the others, because it does not employ the do() command. To view the null distribution using this method, type

    histogram(~manyphats, breaks=(0:25)/25, v=.6, groups=manyphats >.6)

I've added a couple of switches to this command which are probably unfamiliar to you, but which help to depict the P-value as the area (shaded differently) in the right tail.

Another Example: Caffeine Taps (hypotheses concerning the difference of population means)

The data is found in the data frame CaffeineTaps, and the question concerns whether caffeine ingestion influences the count of taps in some time frame for subjects in the study. We have hypotheses

    H_0: µ_C − µ_N = 0,    H_a: µ_C − µ_N > 0.

From the sample, we have

    mean(Taps ~ Caffeine, data=CaffeineTaps)

yielding a test statistic x̄_C − x̄_N of 3.5, which we compute using a diff() command surrounding the previous one:

    diff(mean(Taps ~ Caffeine, data=CaffeineTaps))
    Yes
    3.5

The way proposed in the text for producing randomization statistics was to shuffle the assignment of Caffeine, consistent with the idea (proposed in the null hypothesis) that caffeine has no influence:

    diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))

Doing this once produces one instance of a difference x̄_C − x̄_N one might see in a world where the null hypothesis is true. Doing it many times allows us to simulate the distribution of (x̄_C − x̄_N)-values (what we take to be the approximate null distribution).
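To make the shuffling idea concrete without mosaic, here is a minimal base-R sketch of a single shuffled difference in means. The taps and group vectors are made-up illustrative values, not the CaffeineTaps data, and the helper diff_means is a hypothetical name introduced only for this sketch.

```r
# Base-R sketch of one randomization (shuffle) statistic.
# NOTE: taps and group are made-up illustrative values, NOT the CaffeineTaps data.
set.seed(2)   # arbitrary seed, used only so the run is reproducible
taps  <- c(246, 248, 250, 252, 248, 242, 245, 244, 248, 247)
group <- rep(c("Yes", "No"), each = 5)

# Difference in group means, Yes minus No
diff_means <- function(y, g) mean(y[g == "Yes"]) - mean(y[g == "No"])

# Statistic for the labels as recorded
observed <- diff_means(taps, group)

# Statistic after randomly permuting the labels, which mimics a world
# where the group label has no connection to the tap count
shuffled <- diff_means(taps, sample(group))

c(observed = observed, shuffled = shuffled)
```

Repeating the shuffle many times, as the next command does with do(), builds up the approximate null distribution.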

    manydiffs = do(5000) * diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))
    head(manydiffs)
    histogram(~Yes, data=manydiffs, breaks=seq(-.5,.5,.), groups=Yes>=3.5)

[Figure: histogram of Yes (percent of total), the randomization distribution of differences in means]

To obtain an approximate P-value, we need to figure out how frequently values as extreme as (or more extreme than) ours of 3.5 occur. One method:

    nrow(subset(manydiffs, Yes>=3.5)) / 5000
    [1] 0.00

Here, we divided by 5000 because we seek the proportion of all our tries that produced results of 3.5 or larger. Another command, slightly shorter, that achieves the same calculation is

    sum(manydiffs$Yes >= 3.5) / 5000
    [1] 0.00

Very small P-values such as this are strong evidence for rejecting the null hypothesis in favor of the alternative. Thus, we conclude that µ_C − µ_N > 0, or that the mean tap rate for recipients of caffeine is higher than that for nonrecipients.

Hunter's question (skip if not interested)

You asked whether one would obtain the same distribution if we used bootstrapping to extract bootstrap samples from each group (the caffeine group and the non-caffeine one) and then subtracted their means. More specifically, you asked if the distribution of these differences might not look the same as the randomization distribution we obtained above. I answered no, and below we give a demonstration of this. First, to compute a bootstrapped mean from individuals who received caffeine, we might use this command:

    mean(~Taps, data=resample(subset(CaffeineTaps, Caffeine=="Yes")))

    [1] 7.3

Of course, we also want a bootstrapped mean from participants who did not receive caffeine, and then we want to subtract the resulting means:

    mean(~Taps, data=resample(subset(CaffeineTaps, Caffeine=="Yes"))) -
      mean(~Taps, data=resample(subset(CaffeineTaps, Caffeine=="No")))

This (long) command (there may be ways to shorten it) produces one difference of bootstrapped means. We would require many, so we would iterate the command many times:

    newmanydiffs = do(5000) * (mean(~Taps, data=resample(subset(CaffeineTaps, Caffeine=="Yes"))) -
      mean(~Taps, data=resample(subset(CaffeineTaps, Caffeine=="No"))))
    head(newmanydiffs)
    histogram(~result, data=newmanydiffs)

[Figure: density histogram of result, the bootstrap distribution of differences in means]

If you compare this histogram with the one above, it is not so much the shape or spread that differs as the center, which differs very much. The randomization distribution is centered near 0, the value specified by the null hypothesis, whereas the bootstrap distribution is centered near the observed difference of 3.5.
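The contrast in centers can be sketched in base R without mosaic or the CaffeineTaps data frame. This is only an illustration under stated assumptions: the taps and group vectors are made-up values, and the names rand_diffs and boot_diffs are hypothetical.

```r
# Sketch: randomization vs. bootstrap distributions of a difference in means.
# NOTE: taps and group are made-up illustrative values, NOT the CaffeineTaps data.
set.seed(3)   # arbitrary seed, used only so the run is reproducible
taps  <- c(246, 248, 250, 252, 248, 242, 245, 244, 248, 247)
group <- rep(c("Yes", "No"), each = 5)
yes <- taps[group == "Yes"]
no  <- taps[group == "No"]
observed <- mean(yes) - mean(no)

# Randomization: shuffle the group labels, which enforces "no difference"
rand_diffs <- replicate(5000, {
  g <- sample(group)
  mean(taps[g == "Yes"]) - mean(taps[g == "No"])
})

# Bootstrap: resample with replacement within each group, which keeps
# whatever difference the two samples actually show
boot_diffs <- replicate(5000,
  mean(sample(yes, replace = TRUE)) - mean(sample(no, replace = TRUE)))

# Compare the centers of the two simulated distributions
c(observed = observed,
  randomization_center = mean(rand_diffs),
  bootstrap_center = mean(boot_diffs))
```

With these made-up values, the randomization center should land near 0 and the bootstrap center near the observed difference, which mirrors the contrast between the two histograms. That is why the randomization distribution, and not the bootstrap one, serves as the null distribution for the test.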

Hypothesis Testing Using Randomization Distributions T.Scofield 10/03/2016

Randomization Distributions in Two-Proportion Settings

By calling our setting a two-proportion one, I mean that the data frame