Hypothesis Testing Using Randomization Distributions T.Scofield 10/03/2016

Randomization Distributions in Two-Proportion Settings

By calling our setting a two-proportion one, I mean that the data frame has two binary categorical variables: the explanatory variable, which records which of two groups a subject comes from, and the response variable, which also has just two outcomes. In the cocaine addiction data, the explanatory variable, treatment, has three levels: Desipramine, Lithium, and Placebo. We cut that back to two by ignoring one set of patients, here those receiving Desipramine, which leaves just two groups to consider. The response variable is "relapsed or not?", which has just two values, yes or no. We focus on the relapsers.

Natural hypotheses for a study to see if Lithium helps to decrease the chance of relapse are

    $H_0: p_L - p_P = 0$,    $H_a: p_L - p_P < 0$.

The sample proportions among the lithium and placebo groups are $\hat{p}_L = 18/24$ and $\hat{p}_P = 20/24$, giving us the test statistic

    $\hat{p}_L - \hat{p}_P = 18/24 - 20/24 = -2/24 \approx -0.0833$.

Like the study about tapping fingers under the influence of caffeine, this study is an experiment, where the treatment (Lithium or Placebo) was randomly assigned to patients. When we generate a randomization distribution, we want to be faithful to this process, even as we take the null hypothesis into account. The mental image of dropping slips of paper into two bags, one containing the 48 relapse results (38 "yes" and 10 "no") and the other containing the 48 treatments (24 "Lithium" and 24 "Placebo"), and randomly pairing each treatment with a result as we select our randomization sample, achieves both goals.

Generating a randomization distribution, however, is trickier in RStudio for this situation than in earlier scenarios, primarily because of the work we must do to prepare data for randomization samples.
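As a rough illustration of the slips-of-paper image, the sketch below draws one randomization sample in base R, without the mosaic package. The vector names (relapse, treatment, shuffled, rand_stat) and the seed are hypothetical choices made for illustration and are not part of the original handout; the counts (38 yes, 10 no, 24 patients per group) come from the text above.

```r
# One randomization sample via the "two bags" image (base R only).
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# Bag 1: the 48 observed relapse results (38 yes, 10 no).
relapse <- c(rep("yes", 38), rep("no", 10))

# Bag 2: the 48 treatment labels (24 Lithium, 24 Placebo).
treatment <- c(rep("Lithium", 24), rep("Placebo", 24))

# Randomly pair treatments with results, as the null hypothesis allows.
shuffled <- sample(treatment)

# Difference in relapse proportions for this one shuffle.
p_L <- mean(relapse[shuffled == "Lithium"] == "yes")
p_P <- mean(relapse[shuffled == "Placebo"] == "yes")
rand_stat <- p_L - p_P
rand_stat
```

Repeating the shuffle and the difference calculation many times would build up a randomization distribution; the group sizes and the total number of relapsers stay fixed on every repetition, which is how the sketch stays faithful both to the random assignment and to the null hypothesis.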
You may well prefer to use StatKey, the software meant to accompany the textbook, over RStudio for cases involving two proportions. I will, however, provide details in RStudio for your perusal. The main difficulty, as indicated above, is preparing the data. Here are two approaches.

Approach 1: Recreate the data from scratch

We have done this sort of thing once before, back in Section 2.1. Perhaps you recall the commands.

    part1 <- do(6) * data.frame(Drug="Lithium", Relapse="no")
    part2 <- do(18) * data.frame(Drug="Lithium", Relapse="yes")
    part3 <- do(4) * data.frame(Drug="Placebo", Relapse="no")
    part4 <- do(20) * data.frame(Drug="Placebo", Relapse="yes")
    addicttreatments <- rbind(part1, part2, part3, part4)

Approach 2: Filtering the supplied data frame

It turns out we don't actually need to recreate the data, as it has been supplied to us as part of the Lock5withR package in a data frame called CocaineTreatment. But working with it is not as straightforward as it would at first seem, because this data frame contains all the patients, including those who received Desipramine. We can select the desired subset by leaving out these subjects:

    myfiltereddata <- subset(CocaineTreatment, Drug != "Desipramine")

However, the Drug variable retains a lingering memory of its three levels. You see this, for instance, when you produce a frequency table on Drug:

    tally(~Drug, data=myfiltereddata)
    ## Drug
    ## Desipramine     Lithium     Placebo
    ##           0          24          24

While the count of Desipramine patients is 0, we would prefer that our filtered data frame not know Desipramine is part of this study. One way to make it forget is to combine the removal of Desipramine patients with the droplevels() command.

    myfiltereddata <- droplevels(subset(CocaineTreatment, Drug != "Desipramine"))
    tally(~Drug, data=myfiltereddata)
    ## Drug
    ## Lithium Placebo
    ##      24      24

Now our Drug variable truly has just two levels in the myfiltereddata data frame.

Once data has been prepared...

If you carried out the commands above, you now have two data frames, addicttreatments and myfiltereddata, which can be used for our analysis. Either will work, but I will use myfiltereddata.

    head(myfiltereddata)
    ##       Drug Relapse
    ## 25 Lithium      no
    ## 26 Lithium     yes
    ## 27 Lithium     yes
    ## 28 Lithium     yes
    ## 29 Lithium     yes
    ## 30 Lithium      no

We obtain our test statistic from the sample itself:

    diff(prop(Relapse ~ Drug, data=myfiltereddata))
    ##  no.Placebo
    ## -0.08333333

As when dealing with the difference of two means (see the example using data from CaffeineTaps in a prior handout), our null hypothesis dictates that the drug received (Lithium vs. Placebo) is not actually a factor, so we should generate many randomization statistics by shuffling values of the explanatory variable. One randomization statistic is obtained with the command

    diff(prop(Relapse ~ shuffle(Drug), data=myfiltereddata))
    ## no.Placebo

and this may be repeated many times to obtain a randomization distribution:

    manydiffs <- do(5000) * diff(prop(Relapse ~ shuffle(Drug), data=myfiltereddata))
    head(manydiffs)
    ## no.Placebo

The column, containing 5000 randomization statistics, has been given the curious name no.Placebo. We may view a histogram and mark the region corresponding to our P-value:

    histogram(~no.Placebo, data=manydiffs, groups = no.Placebo <= -0.0833, width=.1)

    nrow(subset(manydiffs, no.Placebo <= -0.0833)) / 5000
    ## [1]

This P-value, here approximately 0.36, represents the probability, in a world where Lithium does not help deter relapse into cocaine addiction, of obtaining a sample whose test statistic (difference in sample proportions) is at least as extreme as ours, that is, -0.0833 or less. It is not statistically significant at any of the usual significance levels α = 0.1, 0.05 or 0.01. In fact, such sample statistics would arise about 36% of the time, which makes our sample statistic appear consistent with the null hypothesis. We fail to reject the null hypothesis.

Example: Hypothesis Test for Positive Correlation (NFL Malevolence)

The hypotheses (explained in the text, Section 4.4):

    $H_0: \rho = 0$,    $H_a: \rho > 0$.

The test statistic:

    cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)
    ## [1]

Generation of many randomization statistics:

    manycors <- do(5000) * cor(ZPenYds ~ shuffle(NFL_Malevolence), data=MalevolentUniformsNFL)
    head(manycors)
    ## cor

    histogram(~cor, data=manycors, groups=cor>= )

The P-value:

    nrow(subset(manycors, cor>= )) / 5000
    ## [1]

At the significance level α = 0.05, this result is statistically significant, and we would reject the null hypothesis in favor of the alternative, concluding that there is a positive correlation.

Example: Is the mean body temperature really 98.6?

The hypotheses:

    $H_0: \mu = 98.6$,    $H_a: \mu \neq 98.6$.

The test statistic:

    mean(~BodyTemp, data=BodyTemp50)
    ## [1] 98.26

The natural thing would be to simulate the bootstrap distribution for $\bar{x}$, as when we constructed a confidence interval for the population mean $\mu$:

    manymeans = do(5000) * mean(~BodyTemp, data=resample(BodyTemp50))
    head(manymeans)
    ## mean

    histogram(~mean, data=manymeans)

But this cannot be a proper simulation of the null distribution, as it is not centered at the right place. The center appears to be about 98.26, the value of our point estimate $\bar{x}$ (which is what happens whenever we bootstrap a mean), not the hypothesized population mean of 98.6. Our randomization statistics should not be the same as bootstrap statistics here, but need to be modified so that they are centered on the proposed mean 98.6. The modification can simply be to add to each of our sample means the difference between the intended center (98.6) and where they were centered above (the sample mean $\bar{x} = 98.26$); that is, we should add 98.6 - 98.26 = 0.34:

    manymeans = do(5000) * (mean(~BodyTemp, data=resample(BodyTemp50)) + 0.34)
    names(manymeans)
    ## [1] "result"

    histogram(~result, data=manymeans, groups = abs(result-98.6)>=0.34)

We see this modified test statistic has a randomization distribution centered where it ought to be if serving as the null distribution. We have attempted to shade those regions in both tails corresponding to randomization statistics at least as extreme as ours, though there are very few. We obtain the approximate P-value by calculating the area in one tail and doubling it:

    nrow(subset(manymeans, result <= 98.26)) * 2 / 5000
    ## [1]

Given this small P-value, we reject the null hypothesis and conclude that the actual (population) mean body temperature is something other than 98.6.

Example 4.34: A New Wrinkle on Finger Tapping and Caffeine

This example has already been done adequately. Since it was a controlled, randomized experiment in which one treatment, either caffeine or placebo, was assigned randomly to each subject, we obtained our randomization distribution in a manner that also randomly assigned treatment values while adhering to the null hypothesis that treatment doesn't matter. We obtained one randomization statistic with the command

    diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))

and an entire distribution of such statistics by repeating this command many times.

Example 4.34 challenges us to imagine different ways of studying the question: Does caffeine increase tapping rates? Surely there are other approaches besides a controlled randomized experiment. The Locks have us consider two different studies one might undertake.

1. An observational study: Instead of assigning treatments, we find subjects who have already self-selected their own treatments, some having had caffeine (probably as part of a daily routine, drinking coffee in the morning), and others who have not. Subjects from both groups have their tap rates measured, and results of both variables are again recorded.

2. A matched pairs study: This time, subjects undergo both treatments, having their tap rates measured under each. The order of the treatments is assigned randomly, so that some receive caffeine first, while for others it is the placebo first. Each subject is, then, the source of two numbers, the caffeine tap rate and the placebo tap rate. Our effective data for each subject, however, would be the difference: (caffeine tap rate) - (placebo tap rate).
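In symbols, the matched pairs reduction might be written as follows. The notation $c_i$ and $p_i$ for subject $i$'s caffeine and placebo tap rates, and $d_i$ for their difference, is assumed here for illustration and does not appear in the handout.

```latex
d_i = c_i - p_i, \qquad \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i
```

A matched pairs analysis therefore works with the single sample $d_1, \dots, d_n$ and its mean $\bar{d}$, rather than with two separate groups of subjects.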
In each of these scenarios, the change in the manner in which data is collected calls for a change in the manner in which randomization statistics are produced. The easier of these two alternate study paradigms to handle in RStudio is the matched pairs case, which we discuss next. We will not delve into the observational study case; suffice it to say that our treatment should be something like the approach suggested by Hunter Pham (see earlier course notes), but modified so that the null hypothesis is respected.

So, imagine that we have gathered a random sample of 10 people for a matched pairs study on whether caffeine causes higher tapping rates. We randomly select 5 to undergo the caffeine treatment first followed by placebo, while the other 5 will receive placebo first and then caffeine. (For a blind study, which is preferred, subjects do not know which treatment they receive first.)

Displayed below are some pretend data from a matched pairs experiment. This data frame, matchedpairscafftaps, is not part of any package you can load. Commands that generate it are given below.

    set.seed(50)
    matchedpairscafftaps = data.frame(placebo=round(runif(10,234,255), 1),
                                      caffeine=round(runif(10,241,258), 1),
                                      first=sample(c(rep("C",5),rep("P",5))))
    matchedpairscafftaps$obsdiff = matchedpairscafftaps$caffeine - matchedpairscafftaps$placebo

The resulting data set is displayed here.

    matchedpairscafftaps
    ##    placebo caffeine first obsdiff
    ## 1                       P    -1.3
    ## 2                       C     2.4
    ## 3                       P    13.7
    ## 4                       P    -7.8
    ## 5                       P     0.9
    ## 6                       C    17.6
    ## 7                       C     6.5
    ## 8                       C    -0.4
    ## 9                       C     7.4
    ## 10                      P     7.6

The null and alternative hypotheses, which should be understood before data has been collected, are these:

    $H_0: \mu_{Diff} = 0$,    $H_a: \mu_{Diff} > 0$.

From our data, we obtain the sample mean of observed differences in the usual way.

    mean(~obsdiff, data=matchedpairscafftaps)
    ## [1] 4.66

This is our test statistic. In generating randomization statistics, we want to use the data we have, but adhere to the null hypothesis, which implies that caffeine should not dictate which tap rate, the one under caffeine or the one under placebo, is larger. This would mean that the sign of each difference is random, coming out positive or negative the way flips of a coin come out heads or tails. The command

    sample(c(-1,1), 10, replace=TRUE)
    ## [1]

acts like 10 coin flips, except that it produces -1 and 1 rather than H or T. We simulate one randomization statistic by this command

    mean(~ obsdiff*sample(c(-1,1), 10, replace=TRUE), data=matchedpairscafftaps)
    ## [1] 3.64

and obtain a randomization distribution by repeating it multiple times:

    manympmeans = do(3000) * mean(~obsdiff*sample(c(-1,1),10,replace=TRUE), data=matchedpairscafftaps)
    head(manympmeans)
    ## mean

    histogram(~mean, manympmeans, groups = mean >= 4.66)

Our approximate P-value is

    nrow(subset(manympmeans, mean>=4.66)) / 3000
    ## [1]
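Because there are only 10 differences, the sign-flip randomization could also be checked by listing all 2^10 = 1024 sign patterns instead of sampling 3000 of them. The sketch below is a base-R illustration under that assumption and is not part of the original handout; it retypes the obsdiff column from the data display, and the names signs, all_means, and exact_p are hypothetical.

```r
# Exact sign-flip randomization for the matched pairs data (base R only).
# The ten observed differences, copied from the obsdiff column.
obsdiff <- c(-1.3, 2.4, 13.7, -7.8, 0.9, 17.6, 6.5, -0.4, 7.4, 7.6)
obs_mean <- mean(obsdiff)  # the test statistic

# Every possible assignment of -1 / +1 to the ten differences:
# one row per sign pattern, 2^10 = 1024 rows in all.
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(obsdiff))))

# Mean of the sign-flipped differences for each pattern.
all_means <- as.vector(signs %*% obsdiff) / length(obsdiff)

# Proportion of patterns with a mean at least as large as the observed one.
# The small tolerance guards against floating-point ties with obs_mean.
exact_p <- mean(all_means >= obs_mean - 1e-9)
exact_p
```

Since every sign pattern is equally likely under the null hypothesis, this proportion is the value that the simulated P-value approximates; the simulation approach remains the practical choice when the number of pairs is too large to enumerate.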

Introduction to Hypothesis Testing T.Scofield 10/03/2016

Introduction to Hypothesis Testing T.Scofield 10/03/2016 Introduction to Hypothesis Testing T.Scofield 10/03/016 Hypothesis Testing: the steps 1. Identify the research question, along with relevant variables.. Formulate hypotheses (null and alternative) appropriate

More information

9.2 Types of Errors in Hypothesis testing

9.2 Types of Errors in Hypothesis testing 9.2 Types of Errors in Hypothesis testing 1 Mistakes we could make As I mentioned, when we take a sample we won t be 100% sure of something because we do not take a census (we only look at information

More information

Confidence Intervals. Dennis Sun Data 301

Confidence Intervals. Dennis Sun Data 301 Dennis Sun Data 301 Statistical Inference probability Population / Box Sample / Data statistics The goal of statistics is to infer the unknown population from the sample. We ve already seen one mode of

More information

7.2: Chi-Square Test for Association T.Scofield Nov. 17, 2016

7.2: Chi-Square Test for Association T.Scofield Nov. 17, 2016 72: Chi-Square Test for Association TScofield Nov 17, 2016 The goal of this section is to provide means for investigating whether there is an association between two categorical variables Before proceeding,

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 14: Introduction to hypothesis testing (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 10 Hypotheses 2 / 10 Quantifying uncertainty Recall the two key goals of inference:

More information

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using

More information

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Predictive Analysis: Evaluation and Experimentation. Heejun Kim Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training

More information

Data 8 Final Review #1

Data 8 Final Review #1 Data 8 Final Review #1 Topics we ll cover: Visualizations Arrays and Table Manipulations Programming constructs (functions, for loops, conditional statements) Chance, Simulation, Sampling and Distributions

More information

Announcements. Unit 2: Probability and distributions Lecture 1: Probability and conditional probability. Statistics 101. Survey.

Announcements. Unit 2: Probability and distributions Lecture 1: Probability and conditional probability. Statistics 101. Survey. Anuncements Anuncements Unit 2: and distributions Lecture 1: and conditional probability Statistics 101 Mine Çetinkaya-Rundel September 10, 2013 PS1 due Thursday. TA office hours: Christine: Monday 5-7pm

More information

STA215 Inference about comparing two populations

STA215 Inference about comparing two populations STA215 Inference about comparing two populations Al Nosedal. University of Toronto. Summer 2017 June 22, 2017 Two-sample problems The goal of inference is to compare the responses to two treatments or

More information

Minitab Guide for MA330

Minitab Guide for MA330 Minitab Guide for MA330 The purpose of this guide is to show you how to use the Minitab statistical software to carry out the statistical procedures discussed in your textbook. The examples usually are

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Hypothesis Test Exercises from Class, Oct. 12, 2018

Hypothesis Test Exercises from Class, Oct. 12, 2018 Hypothesis Test Exercises from Class, Oct. 12, 218 Question 1: Is there a difference in mean sepal length between virsacolor irises and setosa ones? Worked on by Victoria BienAime and Pearl Park Null Hypothesis:

More information

2) In the formula for the Confidence Interval for the Mean, if the Confidence Coefficient, z(α/2) = 1.65, what is the Confidence Level?

2) In the formula for the Confidence Interval for the Mean, if the Confidence Coefficient, z(α/2) = 1.65, what is the Confidence Level? Pg.431 1)The mean of the sampling distribution of means is equal to the mean of the population. T-F, and why or why not? True. If you were to take every possible sample from the population, and calculate

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. These methods refit a model of interest to samples formed from the training set,

More information

More Summer Program t-shirts

More Summer Program t-shirts ICPSR Blalock Lectures, 2003 Bootstrap Resampling Robert Stine Lecture 2 Exploring the Bootstrap Questions from Lecture 1 Review of ideas, notes from Lecture 1 - sample-to-sample variation - resampling

More information

Introductory Applied Statistics: A Variable Approach TI Manual

Introductory Applied Statistics: A Variable Approach TI Manual Introductory Applied Statistics: A Variable Approach TI Manual John Gabrosek and Paul Stephenson Department of Statistics Grand Valley State University Allendale, MI USA Version 1.1 August 2014 2 Copyright

More information

Chapter 6 Normal Probability Distributions

Chapter 6 Normal Probability Distributions Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART V Credibility: Evaluating what s been learned 10/25/2000 2 Evaluation: the key to success How

More information

Goodness-of-Fit Testing T.Scofield Nov. 16, 2016

Goodness-of-Fit Testing T.Scofield Nov. 16, 2016 Goodness-of-Fit Testing T.Scofield Nov. 16, 2016 We do goodness-of-fit testing with a single categorical variable, to see if the distribution of its sampled values fits a specified probability model. The

More information

Lecture 31 Sections 9.4. Tue, Mar 17, 2009

Lecture 31 Sections 9.4. Tue, Mar 17, 2009 s for s for Lecture 31 Sections 9.4 Hampden-Sydney College Tue, Mar 17, 2009 Outline s for 1 2 3 4 5 6 7 s for Exercise 9.17, page 582. It is believed that 20% of all university faculty would be willing

More information

Chapter 8. Interval Estimation

Chapter 8. Interval Estimation Chapter 8 Interval Estimation We know how to get point estimate, so this chapter is really just about how to get the Introduction Move from generating a single point estimate of a parameter to generating

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Mark Johnson Macquarie University Sydney, Australia February 27, 2017 1 / 38 Outline Introduction Hypothesis tests and confidence intervals Classical hypothesis tests

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Kosuke Imai Princeton University Joint work with Graeme Blair October 29, 2010 Blair and Imai (Princeton) List Experiments NJIT (Mathematics) 1 / 26 Motivation

More information

Condence Intervals about a Single Parameter:

Condence Intervals about a Single Parameter: Chapter 9 Condence Intervals about a Single Parameter: 9.1 About a Population Mean, known Denition 9.1.1 A point estimate of a parameter is the value of a statistic that estimates the value of the parameter.

More information

BIOS: 4120 Lab 11 Answers April 3-4, 2018

BIOS: 4120 Lab 11 Answers April 3-4, 2018 BIOS: 4120 Lab 11 Answers April 3-4, 2018 In today s lab we will briefly revisit Fisher s Exact Test, discuss confidence intervals for odds ratios, and review for quiz 3. Note: The material in the first

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 4, 2016 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015 STAT 113: Lab 9 Colin Reimer Dawson Last revised November 10, 2015 We will do some of the following together. The exercises with a (*) should be done and turned in as part of HW9. Before we start, let

More information

TI-83 Users Guide. to accompany. Statistics: Unlocking the Power of Data by Lock, Lock, Lock, Lock, and Lock

TI-83 Users Guide. to accompany. Statistics: Unlocking the Power of Data by Lock, Lock, Lock, Lock, and Lock TI-83 Users Guide to accompany by Lock, Lock, Lock, Lock, and Lock TI-83 Users Guide- 1 Getting Started Entering Data Use the STAT menu, then select EDIT and hit Enter. Enter data for a single variable

More information

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 5, 2015 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Binary Diagnostic Tests Clustered Samples

Binary Diagnostic Tests Clustered Samples Chapter 538 Binary Diagnostic Tests Clustered Samples Introduction A cluster randomization trial occurs when whole groups or clusters of individuals are treated together. In the twogroup case, each cluster

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Regression Analysis and Linear Regression Models

Regression Analysis and Linear Regression Models Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical

More information

Using Large Data Sets Workbook Version A (MEI)

Using Large Data Sets Workbook Version A (MEI) Using Large Data Sets Workbook Version A (MEI) 1 Index Key Skills Page 3 Becoming familiar with the dataset Page 3 Sorting and filtering the dataset Page 4 Producing a table of summary statistics with

More information

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Probability and Statistics. Copyright Cengage Learning. All rights reserved. Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.6 Descriptive Statistics (Graphical) Copyright Cengage Learning. All rights reserved. Objectives Data in Categories Histograms

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Notes on Simulations in SAS Studio

Notes on Simulations in SAS Studio Notes on Simulations in SAS Studio If you are not careful about simulations in SAS Studio, you can run into problems. In particular, SAS Studio has a limited amount of memory that you can use to write

More information

Minitab on the Math OWL Computers (Windows NT)

Minitab on the Math OWL Computers (Windows NT) STAT 100, Spring 2001 Minitab on the Math OWL Computers (Windows NT) (This is an incomplete revision by Mike Boyle of the Spring 1999 Brief Introduction of Benjamin Kedem) Department of Mathematics, UMCP

More information

Use of Extreme Value Statistics in Modeling Biometric Systems

Use of Extreme Value Statistics in Modeling Biometric Systems Use of Extreme Value Statistics in Modeling Biometric Systems Similarity Scores Two types of matching: Genuine sample Imposter sample Matching scores Enrolled sample 0.95 0.32 Probability Density Decision

More information

The Power and Sample Size Application

The Power and Sample Size Application Chapter 72 The Power and Sample Size Application Contents Overview: PSS Application.................................. 6148 SAS Power and Sample Size............................... 6148 Getting Started:

More information

Statistical Analysis of MRI Data

Statistical Analysis of MRI Data Statistical Analysis of MRI Data Shelby Cummings August 1, 2012 Abstract Every day, numerous people around the country go under medical testing with the use of MRI technology. Developed in the late twentieth

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea
