BIOS: 4120 Lab 11 Answers April 3-4, 2018

In today's lab we will briefly revisit Fisher's Exact Test, discuss confidence intervals for odds ratios, and review for quiz 3.

Note: The material in the first part of this lab will not be on quiz 3.

Remember that Fisher's Exact Test can be performed using the fisher.test() function. In lab last week, using the lister dataset, we saw that this function gives you not only the p-value but also the odds ratio and a 95% CI for the odds ratio.

lister <- read.delim("http://myweb.uiowa.edu/pbreheny/data/lister.txt")
lister.table <- table(lister)
print(lister.table)

##          Outcome
## Group     Died Survived
##   Control   16       19
##   Sterile    6       34

fisher.test(lister.table)

##
##  Fisher's Exact Test for Count Data
##
## data:  lister.table
## p-value = 0.005018
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.437621 17.166416
## sample estimates:
## odds ratio
##   4.666849

Note that an odds ratio is calculated by hand using the formula

    (a/b) / (c/d) = ad/bc,

where a, b, c, and d are defined by their location in the following table:

##          Success Failure
## Option 1       a       b
## Option 2       c       d

(The estimate printed by fisher.test() is a conditional maximum-likelihood estimate, so it can differ slightly from this simple cross-product ratio.)

The easiest way to think of this odds ratio is to say that the odds of Success are 100*(OR - 1) percent higher if we do Option 1 than if we do Option 2. Be careful when calculating the OR that you interpret what you actually calculated. For instance, if I were to switch my Success and Failure column names, I would be calculating the OR for Failure given Option 1 instead of Success.
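
As a quick sanity check (this snippet is ours, not part of the original lab), the cross-product odds ratio can be computed directly from the table printed above; it comes out near, but not identical to, the conditional MLE that fisher.test() reports.

# Sample (cross-product) odds ratio of dying, Control vs. Sterile,
# indexing by the dimnames shown in the printed table
OR.hand <- (lister.table["Control", "Died"] * lister.table["Sterile", "Survived"]) /
  (lister.table["Control", "Survived"] * lister.table["Sterile", "Died"])
print(OR.hand)   # roughly 4.77, versus 4.67 from fisher.test()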

CI for OR

Calculating a confidence interval for an odds ratio is slightly more complicated than what we've done so far because it's on a different scale. Here are the steps:

1. Find the odds ratio: OR = ad/bc
2. Find the log odds ratio: log(OR)
3. Find the error term: SE_logOR = sqrt(1/a + 1/b + 1/c + 1/d)
4. Calculate the CI on this scale: CI_logOR = log(OR) ± z_(α/2) * SE_logOR
5. Calculate the CI on the original scale: CI = exp(CI_logOR)

To change things up using the same lister dataset, let's find a CI for the OR for survival given the non-sterile procedure. This means that now our OR is going to be bc/ad.

##          Outcome
## Group     Died Survived
##   Control   16       19
##   Sterile    6       34

##          Outcome
## Group     Died Survived
##   Control    a        b
##   Sterile    c        d

1. Find the OR.

OR <- (19*6)/(16*34)
print(OR)

## [1] 0.2095588

2. Find the log(OR).

logOR <- log(OR)
print(logOR)

## [1] -1.562751

3. Find the error term.

SElogOR <- sqrt((1/19)+(1/6)+(1/16)+(1/34))
print(SElogOR)

## [1] 0.557862

4. Calculate the CI on this scale.

logCI <- logOR + qnorm(c(.025,.975)) * SElogOR
print(logCI)

## [1] -2.6561402 -0.4693614

5. Calculate the CI on the original scale.

CI <- exp(logCI)
print(CI)

## [1] 0.07021873 0.62540154
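
If you find yourself doing this often, the five steps can be wrapped in a small helper function. This is just a sketch of our own (or.ci is not a built-in or package function), with the cells labeled the same way as in the table above.

# Wald-style CI for the odds ratio (a*d)/(b*c), where a, b, c, d are the
# four cell counts of a 2x2 table and level is the confidence level.
or.ci <- function(a, b, c, d, level = 0.95) {
  or    <- (a * d) / (b * c)             # step 1
  logor <- log(or)                       # step 2
  se    <- sqrt(1/a + 1/b + 1/c + 1/d)   # step 3
  z     <- qnorm(1 - (1 - level)/2)
  lower <- exp(logor - z * se)           # steps 4 and 5
  upper <- exp(logor + z * se)
  list(OR = or, lower = lower, upper = upper)
}

# Reproduces the hand calculation above (OR for survival given the
# non-sterile procedure, i.e. bc/ad, obtained by swapping cell roles):
or.ci(a = 19, b = 16, c = 34, d = 6)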

Interpretation: We can say with 95% confidence that the true odds ratio for survival with the non-sterile procedure lies in (0.07, 0.625), indicating significantly reduced odds of survival with the non-sterile procedure compared to the sterile procedure. (Significant because the CI does not include the value 1.) Note that this is the same as the relative odds of dying with the sterile surgery (calculated in the lecture notes). This happens because the OR is symmetric.

Quiz 3 Review

Firstly, a flowchart for when to use each distribution:

[Figure 1: Flowchart for choosing among the distributions]

General approach to statistics problems:

Step 1: Figure out and list the given information. Write out the values of n, x̄, s, etc. for everything given in the prompt.

Step 2: Figure out what the problem is asking. Write the null and alternative hypotheses (if applicable). What is the population we're looking at? What kinds of distributions can be used with the information given? Which is most appropriate in this situation? (See the chart.) Do we want a test or an interval? Most importantly, what is the question being asked?

If defining hypotheses: The null case is the one based on no change, difference, or improvement. The alternative is what we want to show or what we are looking to prove. Remember that the hypotheses describe and apply to a population parameter.

Step 3: Write out the equation for the test statistic or interval. In this class, most test statistics will be similar to the form

    z = (X̂ - X_0) / SE_X̂,

and most confidence intervals will look similar to this:

    X̂ - z * SE_X̂ < X < X̂ + z * SE_X̂,

where:

X is the parameter of interest
X̂ is the estimate of the parameter
X_0 is the value of the parameter under the null hypothesis
SE_X̂ is the standard error of the estimate
z is the critical value for the interval

Step 4: Plug in the values. Calculate the statistic or interval using the given values. Find the p-value (if applicable). This means that if we're conducting a test, we compare the test statistic to the distribution we picked in step 2 and find the probability of being as or more extreme than the value we observed.

Step 5: Interpret in the context of the problem. This might very well be the most important step. Our interpretation depends on the study, so it varies. Below are some examples.

Situation: Paired study looking at the effect of oat bran consumption compared to corn flakes, where the calculated p-value was 0.005.
Interpretation: There is strong evidence that eating oat bran as opposed to corn flakes leads to lower cholesterol levels.

Situation: Weights of U.S. males (mean believed to be 172.2 pounds), where the observed mean was 180 and the calculated p-value was 0.09.
Interpretation: This study provides borderline evidence to suggest that the true mean weight of males in the United States is greater than 172.2 pounds.

Modifiers based on p-value:

p        Evidence against null
> .1     not significant
< .1     borderline
< .05    moderate
< .01    strong
< .001   overwhelming

Other Notes:

Effect size: Recall the Nexium/Prilosec example. If we have a large enough n, we can find statistical significance for the smallest true difference in means. Here, the difference, while real, was only 3%. That's effectively REALLY close to being the same. For practical purposes, this difference doesn't matter, but the p-value doesn't tell us that the actual difference is small, just that it is significant.
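
One way to see the effect-size point in R: the made-up healing proportions below differ by 3 percentage points in both calls (these numbers are illustrative only, not the actual Nexium/Prilosec data), but only the larger study declares the difference significant.

# Same 3-percentage-point difference, modest vs. very large samples
prop.test(x = c(90, 87), n = c(100, 100))$p.value          # well above 0.05
prop.test(x = c(9000, 8700), n = c(10000, 10000))$p.value  # far below 0.001

The estimated difference is identical in the two calls; only the precision changes, which is why a small p-value by itself says nothing about whether the difference is practically important.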

Paired data: Paired data can be analyzed using the binomial distribution (proportion of patients that saw any improvement) or using continuous methods (using 1-sample methods on the set of differences between the two measurements).

Practice Problems

1. We are interested in testing whether a certain at-risk population for diabetes has a daily sugar intake equal to that of the general population, which is 77 grams/day. A sample of size 37 was taken from this at-risk population, and we obtained a sample mean of 80 grams and a sample standard deviation of 11 grams. Perform a hypothesis test of whether this population has a significantly different mean sugar intake from 77 grams.

2. The distribution of LDL cholesterol levels in a certain population is approximately normal with mean 90 mg/dl and standard deviation 8 mg/dl.

(a) What is the probability an individual will have an LDL cholesterol level above 95 mg/dl?

(b) Suppose we have a sample of 10 people from this population. What is the probability of exactly 3 of them being above 95 mg/dl?

(c) Take the sample of size 10, as in part (b). What is the probability that the sample mean will be above 95 mg/dl?

(d) Suppose we take 5 samples of size 10 from the population. What is the probability that at least one of the sample means will be greater than 95 mg/dl?

3. In the following scenarios, identify what will happen to the power of a hypothesis test:

(a) We increase the sample size.

(b) The standard deviation of the sample is larger than what we expected.

(c) Our effect size moves from 5 units to 10 units.

Solutions

Problem 1

mu <- 77
xbar <- 80
s <- 11
n <- 37

# Use a t-test for the sample
t <- (xbar - mu) / (s / sqrt(n))
2 * pt(abs(t), n - 1, lower.tail = FALSE)

## [1] 0.1058177

We do not have significant evidence to conclude that the at-risk population's mean sugar intake is different from the general population's mean sugar intake (p = 0.11).

Problem 2

## Part 2a
mu <- 90
sigma <- 8
(p <- pnorm(95, mu, sigma, lower.tail = FALSE))

## [1] 0.2659855

## Part 2b
dbinom(3, 10, p)

## [1] 0.2592314

## Part 2c
z <- (95 - 90) / (sigma / sqrt(10))
(p <- pnorm(z, lower.tail = FALSE))

## [1] 0.02405341

## Part 2d
1 - dbinom(0, 5, p)

## [1] 0.1146189

Problem 3

Part A: Power increases.
Part B: Power decreases.
Part C: Power increases.
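
Optionally (this check is ours, not part of the original solutions), parts 2c and 2d can be verified by simulation and by the complement rule; the seed and number of replicates below are arbitrary.

set.seed(1)
reps <- 100000

# Part 2c by simulation: P(mean of a sample of 10 exceeds 95)
xbars <- replicate(reps, mean(rnorm(10, mean = 90, sd = 8)))
mean(xbars > 95)                        # close to 0.024

# Part 2d by the complement rule: 1 - P(no sample mean exceeds 95)
p <- pnorm(95, 90, 8/sqrt(10), lower.tail = FALSE)
1 - (1 - p)^5                           # same as 1 - dbinom(0, 5, p), about 0.115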