Hypothesis Testing Using Randomization Distributions T.Scofield 10/03/2016
Randomization Distributions in Two-Proportion Settings

By calling our setting a two-proportion one, I mean that the data frame has two binary categorical variables. The one that indicates which of two groups a subject comes from serves as the explanatory variable, and the other, the response variable, also has just two outcomes.

In the cocaine addiction data, we have an explanatory variable, treatment, which has three levels: Desipramine, Lithium, and Placebo. We cut that back to two by ignoring one set of patients, here those receiving Desipramine, which leaves just two groups to consider. The response variable is "relapsed or not?", which has just two values, yes or no. We focus on the relapsers. Natural hypotheses for a study to see whether Lithium helps decrease the chance of relapse are

$$H_0: p_L - p_P = 0, \qquad H_a: p_L - p_P < 0.$$

The sample proportions among the Lithium and Placebo groups are $\hat{p}_L = 18/24$ and $\hat{p}_P = 20/24$, giving us the test statistic

$$\hat{p}_L - \hat{p}_P = \frac{18}{24} - \frac{20}{24} = -\frac{2}{24} \approx -0.0833.$$

Like the study about tapping fingers under the influence of caffeine, this study is an experiment: the treatment (Lithium or Placebo) was randomly assigned to patients. When we generate a randomization distribution, we want to be faithful to this process while also taking the null hypothesis into account. The mental image of dropping slips of paper into two bags, one bag containing the 48 relapse results (38 "yes" and 10 "no") and the other containing the 48 treatments (24 "Lithium" and 24 "Placebo"), and randomly pairing the treatment slips with the relapse slips as we select each randomization sample, achieves both goals.

Generating a randomization distribution in RStudio is, however, trickier in this situation than in earlier scenarios, primarily because of the work we must do to prepare the data for randomization samples.
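The slips-of-paper picture can also be coded directly in base R, without mosaic: one vector holds the 48 relapse outcomes, another holds the 48 treatment labels, and `sample()` re-deals the labels at random. This is only a sketch built from the counts above (Lithium 18 yes and 6 no, Placebo 20 yes and 4 no); the names `drug`, `relapse`, and `diff_relapse()` are illustrative and not part of these notes or of any package. The last computation is included for comparison: for a 2×2 table, the exact permutation P-value is a hypergeometric tail probability (the one-sided Fisher exact test).

```r
# Counts from the text: Lithium 18 "yes" / 6 "no"; Placebo 20 "yes" / 4 "no"
drug    <- rep(c("Lithium", "Placebo"), each = 24)
relapse <- c(rep(c("yes", "no"), times = c(18, 6)),   # 24 Lithium patients
             rep(c("yes", "no"), times = c(20, 4)))   # 24 Placebo patients

# Illustrative helper: relapse proportion (Lithium) minus relapse proportion (Placebo)
diff_relapse <- function(outcome, group) {
  mean(outcome[group == "Lithium"] == "yes") -
    mean(outcome[group == "Placebo"] == "yes")
}
observed <- diff_relapse(relapse, drug)   # 18/24 - 20/24 = -2/24

# Under H0, re-deal the 48 treatment slips at random to the 48 relapse slips
set.seed(2016)   # arbitrary seed, for reproducibility only
rand_stats <- replicate(5000, diff_relapse(relapse, sample(drug)))

# One-sided P-value for Ha: pL - pP < 0 (tiny tolerance for floating-point ties)
p_value <- mean(rand_stats <= observed + 1e-9)

# Exact permutation P-value: P(at least 6 of the 10 "no" slips land with Lithium)
exact_p <- phyper(5, m = 10, n = 38, k = 24, lower.tail = FALSE)

c(observed = observed, simulated = p_value, exact = exact_p)
```

The exact value is about 0.362, so a simulation with 5000 shuffles should land close to it.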
You may well prefer to use StatKey, the software meant to accompany the textbook, over RStudio for cases involving two proportions. I will, however, provide details in RStudio for your perusal. The main difficulty, as indicated above, is preparing the data. Here are two approaches.

Approach 1: Recreate the data from scratch

We have done this sort of thing once before, back in Section 2.1. Perhaps you recall the commands.

```r
library(mosaic)   # provides do(), tally(), prop(), shuffle()

# One row per patient: 6 + 18 Lithium patients, 4 + 20 Placebo patients
part1 <- do(6)  * data.frame(Drug="Lithium", Relapse="no")
part2 <- do(18) * data.frame(Drug="Lithium", Relapse="yes")
part3 <- do(4)  * data.frame(Drug="Placebo", Relapse="no")
part4 <- do(20) * data.frame(Drug="Placebo", Relapse="yes")
addicttreatments <- rbind(part1, part2, part3, part4)
```

Approach 2: Filtering the supplied data frame

It turns out we don't actually need to recreate the data, as it has been supplied to us as part of the Lock5withR package in a data frame called CocaineTreatment. But working with it is not as straightforward as it would at first seem, because this data frame contains all the patients, including those who received the drug called Desipramine. We can select the desired subset by leaving out these subjects:

```r
library(Lock5withR)   # provides the CocaineTreatment data frame
myfiltereddata <- subset(CocaineTreatment, Drug != "Desipramine")
```

However, there seems to be a lingering memory that there were three levels for the Drug variable. You see this, for instance, when you produce a frequency table on Drug:

```r
tally(~Drug, data=myfiltereddata)
## Drug
## Desipramine     Lithium     Placebo 
##           0          24          24 
```

While the count of Desipramine patients is 0, we would prefer that our filtered data frame not know Desipramine is part of this study. One way to make it forget is to combine the removal of Desipramine patients with the droplevels() command.

```r
myfiltereddata <- droplevels(subset(CocaineTreatment, Drug != "Desipramine"))
tally(~Drug, data=myfiltereddata)
## Drug
## Lithium Placebo 
##      24      24 
```

Now our Drug variable truly has just two levels in the myfiltereddata data frame.

Once data has been prepared...

If you carried out the commands above, you now have two data frames, addicttreatments and myfiltereddata, either of which can be used for our analysis. I will use myfiltereddata.

```r
head(myfiltereddata)
##       Drug Relapse
## 25 Lithium      no
## 26 Lithium     yes
## 27 Lithium     yes
## 28 Lithium     yes
## 29 Lithium     yes
## 30 Lithium      no
```

We obtain our test statistic from the sample itself:

```r
diff(prop(Relapse ~ Drug, data=myfiltereddata))
##  no.Placebo 
## -0.08333333
```

Here prop() reports the proportion of "no" (the first level of Relapse) in each group, and diff() subtracts the Lithium value from the Placebo value. Since $\hat{p}_P(\text{no}) - \hat{p}_L(\text{no}) = \hat{p}_L - \hat{p}_P$, this is the same statistic as before.

As when dealing with the difference of two means (see the example using data from CaffeineTaps in a prior handout), our null hypothesis dictates that the drug received (Lithium vs. Placebo) is not actually a factor, so we generate many randomization statistics by shuffling the values of the explanatory variable. One randomization statistic is obtained with the command

```r
diff(prop(Relapse ~ shuffle(Drug), data=myfiltereddata))
```

and this may be repeated many times to obtain a randomization distribution:
```r
manydiffs <- do(5000) * diff(prop(Relapse ~ shuffle(Drug), data=myfiltereddata))
head(manydiffs)
```

The column containing the 5000 randomization statistics has been given the curious name no.Placebo, inherited from the name diff() gives its result. We may view a histogram and mark the region corresponding to our P-value:

```r
histogram(~no.Placebo, data=manydiffs, groups = no.Placebo <= -0.0833, width=.1)
```

[Figure: histogram of no.Placebo, with randomization statistics at or below -0.0833 shaded]

```r
nrow(subset(manydiffs, no.Placebo <= -0.0833)) / 5000
```

This P-value, here approximately 0.36, represents the probability, in a world where Lithium does not help deter relapse into cocaine addiction, of obtaining a sample with a test statistic (difference in sample proportions) of -0.0833 or less. The result is not statistically significant at any of the usual significance levels α = 0.1, 0.05, or 0.01. In fact, such sample statistics would arise about 36% of the time, which makes our sample statistic appear consistent with the null hypothesis. We fail to reject the null hypothesis.

Example: Hypothesis Test for Positive Correlation (NFL Malevolence)

The hypotheses (explained in the text, Section 4.4):

$$H_0: \rho = 0, \qquad H_a: \rho > 0.$$

The test statistic:

```r
cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)
```

Generation of many randomization statistics:
```r
# Store the observed correlation so it can serve as the cutoff below
obscor <- cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)
manycors <- do(5000) * cor(ZPenYds ~ shuffle(NFL_Malevolence), data=MalevolentUniformsNFL)
head(manycors)
histogram(~cor, data=manycors, groups = cor >= obscor)
```

[Figure: histogram of the 5000 randomization correlations, with values at or above the observed correlation shaded]

The P-value:

```r
nrow(subset(manycors, cor >= obscor)) / 5000
```

If the significance level is α = 0.05, this result is statistically significant, and we would reject the null hypothesis in favor of the alternative, concluding that there is a positive correlation.

Example: Is the mean body temperature really 98.6?

The hypotheses:

$$H_0: \mu = 98.6, \qquad H_a: \mu \neq 98.6.$$

The test statistic:

```r
mean(~BodyTemp, data=BodyTemp50)
```

The natural thing would be to simulate the bootstrap distribution for $\bar{x}$, as when we constructed a confidence interval for the population mean µ:

```r
manymeans = do(5000) * mean(~BodyTemp, data=resample(BodyTemp50))
head(manymeans)
```
```r
histogram(~mean, data=manymeans)
```

[Figure: histogram of the 5000 bootstrap means]

But this cannot be a proper simulation of the null distribution, as it is not centered at the right place. Its center appears to be about 98.26, the value of our point estimate $\bar{x}$, not the hypothesized (population) mean of 98.6; this is what always happens when we bootstrap a mean. Our randomization statistics should not be the same as bootstrap statistics here, but need to be modified so that they are centered on the proposed mean, 98.6. The modification can simply be to add to each of our sample means the difference between the intended center (98.6) and where they were centered above (at the sample mean $\bar{x} = 98.26$); that is, we add $98.6 - 98.26 = 0.34$:

```r
# Shift each bootstrap mean by 0.34 so the distribution is centered at 98.6
manymeans = do(5000) * (mean(~BodyTemp, data=resample(BodyTemp50)) + 0.34)
names(manymeans)
## [1] "result"
histogram(~result, data=manymeans, groups = abs(result - 98.6) >= 0.34)
```

[Figure: histogram of the shifted randomization means (result), with both tails at least 0.34 from 98.6 shaded]

We see that this modified statistic has a randomization distribution centered where it ought to be if it is to serve as the null distribution. We have attempted to shade the regions in both tails corresponding to randomization statistics at least as extreme as ours, though there are very few. We obtain the approximate P-value by calculating the area in one tail and doubling it:
```r
nrow(subset(manymeans, result <= 98.26)) * 2 / 5000
```

Given this small P-value, we reject the null hypothesis and conclude that the actual (population) mean body temperature is something other than 98.6.

Example 4.34: A New Wrinkle on Finger Tapping and Caffeine

This example has already been done adequately. Since it was a controlled, randomized experiment in which one treatment, either caffeine or placebo, was assigned randomly to each subject, we obtained our randomization distribution in a manner that also randomly assigned treatment values while adhering to the null hypothesis that treatment doesn't matter. We obtained one randomization statistic with the command

```r
diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))
```

and an entire distribution of such statistics by repeating this command many times.

Example 4.34 challenges us to imagine different ways of studying the question: Does caffeine increase tapping rates? Surely there are other approaches besides a controlled randomized experiment. The Locks have us consider two different studies one might undertake.

1. An observational study: Instead of assigning treatments, we find subjects who have already self-selected their own treatments, some having had caffeine (probably as part of a daily routine, drinking coffee in the morning) and others not. Subjects from both groups have their tap rates measured, and the values of both variables are again recorded.
2. A matched pairs study: This time, subjects undergo both treatments, having their tap rates measured under each. The order of the treatments is assigned randomly, so that some receive caffeine first, while for others it is the placebo first. Each subject is, then, the source of two numbers, the caffeine tap rate and the placebo tap rate. Our effective data for each subject, however, would be the difference: (caffeine tap rate) - (placebo tap rate).
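Of the designs above, the original experiment and the matched pairs study both rely on random assignment, but they randomize different things, so their re-randomization steps differ. The base-R sketch below contrasts the two under stated assumptions: `relabel_once()` re-deals group labels, as `shuffle()` does for the two-group experiment, while `signflip_once()` gives each within-subject difference a random sign. Both function names and all numbers are hypothetical, made up for illustration; they are not the CaffeineTaps data.

```r
# Hypothetical tap counts for 10 subjects in a two-group experiment
# (made-up numbers, NOT the CaffeineTaps data)
taps  <- c(246, 248, 250, 252, 245, 242, 244, 243, 246, 240)
group <- rep(c("caffeine", "placebo"), each = 5)

# Two-group randomized experiment: under H0 the labels are arbitrary,
# so re-deal them (sample() permutes, preserving the 5/5 group sizes)
relabel_once <- function(values, labels) {   # hypothetical helper
  shuffled <- sample(labels)
  mean(values[shuffled == "caffeine"]) - mean(values[shuffled == "placebo"])
}

# Matched pairs with random treatment order: under H0 each subject's
# (caffeine - placebo) difference is equally likely to have either sign
pair_diffs <- c(3.1, -0.8, 5.2, 1.4, -2.0, 4.7, 0.6, 2.9, -1.1, 3.3)  # hypothetical
signflip_once <- function(d) {                # hypothetical helper
  mean(d * sample(c(-1, 1), length(d), replace = TRUE))
}

set.seed(42)   # arbitrary seed, for reproducibility only
relabel_stats  <- replicate(2000, relabel_once(taps, group))
signflip_stats <- replicate(2000, signflip_once(pair_diffs))

# Both simulated null distributions should be centered near 0
c(relabel = mean(relabel_stats), signflip = mean(signflip_stats))
```

In both cases the observed data stay fixed. What changes with the design is which part of the data the null hypothesis allows us to rearrange: group membership in one case, the sign of each paired difference in the other.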
In each of these scenarios, the change in the manner in which data are collected calls for a change in the manner in which randomization statistics are produced. The easier of these two alternative study designs to handle in RStudio is the matched pairs case, which we discuss next. We will not delve into the observational study case, except to say that our treatment of it should resemble the approach suggested by Hunter Pham (see earlier course notes), modified so that the null hypothesis is respected.

So, imagine that we have gathered a random sample of 10 people for a matched pairs study on whether caffeine causes higher tapping rates. We randomly select 5 to undergo the caffeine treatment first followed by placebo, while the other 5 receive placebo first and then caffeine. (In a blind study, which is preferable, subjects would not know which treatment they receive first.) Below are some pretend data from such a matched pairs experiment. This data frame, matchedpairscafftaps, is not part of any package you can load; the commands that generate it are given here.

```r
set.seed(50)
matchedpairscafftaps = data.frame(
  placebo  = round(runif(10, 234, 255), 1),
  caffeine = round(runif(10, 241, 258), 1),
  first    = sample(c(rep("C", 5), rep("P", 5))))   # C = caffeine first, P = placebo first
# Each subject's observed difference: caffeine tap rate minus placebo tap rate
matchedpairscafftaps$obsdiff = matchedpairscafftaps$caffeine - matchedpairscafftaps$placebo
```

The resulting data set is displayed here.
```r
matchedpairscafftaps
##    placebo caffeine first obsdiff
## 1      ...      ...     P    -1.3
## 2      ...      ...     C     2.4
## 3      ...      ...     P    13.7
## 4      ...      ...     P    -7.8
## 5      ...      ...     P     0.9
## 6      ...      ...     C    17.6
## 7      ...      ...     C     6.5
## 8      ...      ...     C    -0.4
## 9      ...      ...     C     7.4
## 10     ...      ...     P     7.6
```

The null and alternative hypotheses, which should be settled before the data are collected, are

$$H_0: \mu_{Diff} = 0, \qquad H_a: \mu_{Diff} > 0.$$

From our data, we obtain the sample mean of the observed differences in the usual way.

```r
mean(~obsdiff, data=matchedpairscafftaps)
## [1] 4.66
```

This is our test statistic. In generating randomization statistics, we want to use the data we have but adhere to the null hypothesis, which implies that caffeine should not dictate which tap rate, the one under caffeine or the one under placebo, is larger. This means that the sign of each difference is random, coming out positive or negative just as flips of a coin come out heads or tails. The command

```r
sample(c(-1,1), 10, replace=TRUE)
```

acts like 10 coin flips, except that it produces -1 and 1 rather than H and T. We simulate one randomization statistic with the command

```r
mean(~ obsdiff*sample(c(-1,1), 10, replace=TRUE), data=matchedpairscafftaps)
## [1] 3.64
```

and obtain a randomization distribution by repeating it many times:

```r
manympmeans = do(3000) * mean(~obsdiff*sample(c(-1,1), 10, replace=TRUE), data=matchedpairscafftaps)
head(manympmeans)
histogram(~mean, data=manympmeans, groups = mean >= 4.66)
```
[Figure: histogram of the 3000 sign-flip randomization means, with values at or above 4.66 shaded]

Our approximate P-value is

```r
nrow(subset(manympmeans, mean >= 4.66)) / 3000
```
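With only 10 pairs there are just $2^{10} = 1024$ sign patterns, all equally likely under the null hypothesis, so the sign-flip randomization distribution can also be enumerated exactly instead of simulated. The base-R sketch below does this by typing in the ten obsdiff values listed in the table; the variable names are illustrative, and mosaic is not needed.

```r
# The ten observed differences (caffeine - placebo) from the pretend data
obsdiff  <- c(-1.3, 2.4, 13.7, -7.8, 0.9, 17.6, 6.5, -0.4, 7.4, 7.6)
observed <- mean(obsdiff)   # 4.66

# Every possible assignment of signs: a 1024 x 10 matrix of -1/+1 values
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(obsdiff))))

# Mean of the signed differences for each sign pattern
all_means <- as.vector(signs %*% obsdiff) / length(obsdiff)

# One-sided exact P-value for Ha: mu_Diff > 0
# (tiny tolerance so the observed pattern itself counts despite rounding)
n_extreme <- sum(all_means >= observed - 1e-9)
exact_p   <- n_extreme / nrow(signs)
exact_p
```

For these differences, 44 of the 1024 sign patterns give a mean of at least 4.66, so the exact one-sided P-value is 44/1024 ≈ 0.043; the do(3000) estimate above should fall close to this. Enumeration is only practical for small samples, since the number of patterns doubles with each additional pair.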
More informationConfidence Interval of a Proportion
Confidence Interval of a Proportion FPP 20-21 Using the sample to learn about the box Box models and CLT assume we know the contents of the box (the population). In real-world problems, we do not. In random
More informationpredict and Friends: Common Methods for Predictive Models in R , Spring 2015 Handout No. 1, 25 January 2015
predict and Friends: Common Methods for Predictive Models in R 36-402, Spring 2015 Handout No. 1, 25 January 2015 R has lots of functions for working with different sort of predictive models. This handout
More informationMaximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University
Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University While your data tables or spreadsheets may look good to
More informationClassification/Regression Trees and Random Forests
Classification/Regression Trees and Random Forests Fabio G. Cozman - fgcozman@usp.br November 6, 2018 Classification tree Consider binary class variable Y and features X 1,..., X n. Decide Ŷ after a series
More informationCPSC 536N: Randomized Algorithms Term 2. Lecture 5
CPSC 536N: Randomized Algorithms 2011-12 Term 2 Prof. Nick Harvey Lecture 5 University of British Columbia In this lecture we continue to discuss applications of randomized algorithms in computer networking.
More information, etc. Let s work with the last one. We can graph a few points determined by this equation.
1. Lines By a line, we simply mean a straight curve. We will always think of lines relative to the cartesian plane. Consider the equation 2x 3y 4 = 0. We can rewrite it in many different ways : 2x 3y =
More informationBootstrap confidence intervals Class 24, Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bootstrap confidence intervals Class 24, 18.05 Jeremy Orloff and Jonathan Bloom 1. Be able to construct and sample from the empirical distribution of data. 2. Be able to explain the bootstrap
More informationGeorgia Institute of Technology College of Engineering School of Electrical and Computer Engineering
Georgia Institute of Technology College of Engineering School of Electrical and Computer Engineering ECE 8832 Summer 2002 Floorplanning by Simulated Annealing Adam Ringer Todd M c Kenzie Date Submitted:
More informationBivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:
More informationBrief Guide on Using SPSS 10.0
Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new
More informationMultiple Comparisons of Treatments vs. a Control (Simulation)
Chapter 585 Multiple Comparisons of Treatments vs. a Control (Simulation) Introduction This procedure uses simulation to analyze the power and significance level of two multiple-comparison procedures that
More information1 RefresheR. Figure 1.1: Soy ice cream flavor preferences
1 RefresheR Figure 1.1: Soy ice cream flavor preferences 2 The Shape of Data Figure 2.1: Frequency distribution of number of carburetors in mtcars dataset Figure 2.2: Daily temperature measurements from
More informationHow mobile is changing and what publishers need to do about it
How mobile is changing email and what publishers need to do about it BY ADESTRA The mobile channel has produced a culture of information on-demand. We can now view our emails as and when they come through
More informationDual-Frame Sample Sizes (RDD and Cell) for Future Minnesota Health Access Surveys
Dual-Frame Sample Sizes (RDD and Cell) for Future Minnesota Health Access Surveys Steven Pedlow 1, Kanru Xia 1, Michael Davern 1 1 NORC/University of Chicago, 55 E. Monroe Suite 2000, Chicago, IL 60603
More information9. MATHEMATICIANS ARE FOND OF COLLECTIONS
get the complete book: http://wwwonemathematicalcatorg/getfulltextfullbookhtm 9 MATHEMATICIANS ARE FOND OF COLLECTIONS collections Collections are extremely important in life: when we group together objects
More informationEvaluating Machine-Learning Methods. Goals for the lecture
Evaluating Machine-Learning Methods Mark Craven and David Page Computer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from
More informationPair-Wise Multiple Comparisons (Simulation)
Chapter 580 Pair-Wise Multiple Comparisons (Simulation) Introduction This procedure uses simulation analyze the power and significance level of three pair-wise multiple-comparison procedures: Tukey-Kramer,
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More informationMath in MIPS. Subtracting a binary number from another binary number also bears an uncanny resemblance to the way it s done in decimal.
Page < 1 > Math in MIPS Adding and Subtracting Numbers Adding two binary numbers together is very similar to the method used with decimal numbers, except simpler. When you add two binary numbers together,
More information8. MINITAB COMMANDS WEEK-BY-WEEK
8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are
More informationWe have seen that as n increases, the length of our confidence interval decreases, the confidence interval will be more narrow.
{Confidence Intervals for Population Means} Now we will discuss a few loose ends. Before moving into our final discussion of confidence intervals for one population mean, let s review a few important results
More informationSection 4 General Factorial Tutorials
Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One
More informationGraph Contraction. Graph Contraction CSE341T/CSE549T 10/20/2014. Lecture 14
CSE341T/CSE549T 10/20/2014 Lecture 14 Graph Contraction Graph Contraction So far we have mostly talking about standard techniques for solving problems on graphs that were developed in the context of sequential
More information