7.2: Chi-Square Test for Association
T. Scofield
Nov. 17, 2016
The goal of this section is to provide a means for investigating whether there is an association between two categorical variables. Before proceeding, it may be helpful to enter the following commands. They add two new commands, expectedcounts() and chisqstat(), to our list of available ones.

expectedcounts <- function(sTable) {
  # expected count in each cell: (row total) * (column total) / (sample size)
  caseTotal = sum(sTable)
  expCounts = sTable
  for (ii in 1:nrow(sTable)) {
    for (jj in 1:ncol(sTable)) {
      expCounts[ii,jj] = sum(sTable[ii,]) * sum(sTable[,jj]) / caseTotal
    }
  }
  return(expCounts)
}

chisqstat <- function(oTab) {
  # chi-square statistic: sum over all cells of (observed - expected)^2 / expected
  X2 = 0
  eTab <- expectedcounts(oTab)
  for (ii in 1:nrow(oTab)) {
    for (jj in 1:ncol(oTab)) {
      X2 = X2 + (oTab[ii,jj] - eTab[ii,jj])^2 / eTab[ii,jj]
    }
  }
  return(X2)
}

Example: Is there an association between socioeconomic status (SES) and smoking? We will address this question using a hypothesis test. The null and alternative hypotheses are stated this way:

$H_0$: no association exists between SES and smoking status
$H_a$: there is an association between SES and smoking status

One can imagine the data one would collect for a study to address this question. For each person sampled, we would record the values of two categorical variables: SES and smoking status.

case   SES      Smoking Status
1      middle   current
2      low      current
3      high     never
4      middle   former

But perhaps the data come to us not in this raw form, but already summarized as a two-way (contingency) table.
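When the raw, case-by-case data are available, base R's table() builds the two-way table directly. The following is a minimal sketch using only the four cases listed above. The data frame name smokers and its column names SES and Smoking are hypothetical. Later in this section the mosaic command tally() plays the same role.

```r
# Hypothetical raw data: one row per sampled person (the four cases listed above)
smokers <- data.frame(
  SES     = c("middle", "low", "high", "middle"),
  Smoking = c("current", "current", "never", "former")
)

# Cross-tabulate the two categorical variables:
# rows are SES values, columns are smoking statuses (levels sorted alphabetically)
rawtable <- with(smokers, table(SES, Smoking))
print(rawtable)

# Append a "Sum" row and column holding the row and column totals
print(addmargins(rawtable))
```

With the full sample in place of these four rows, the same two commands would produce the complete table of observed counts.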
        current  former  never
high         51      92     68
low          43      28     22
middle       22      21      9

It is convenient to know how to build such a table inside RStudio directly from the numbers (i.e., when no raw data set is available). These commands will do so, giving the table the name smoketable:

smoke <- matrix(c(51,92,68,43,28,22,22,21,9), ncol=3, byrow=TRUE)
colnames(smoke) <- c("current","former","never")
rownames(smoke) <- c("high","low","middle")
smoketable <- as.table(smoke)

If we wish to view the table at this point, we may do so by typing the name in which it is stored:

smoketable

##        current former never
## high        51     92    68
## low         43     28    22
## middle      22     21     9

Because both categorical variables have 3 values, the table itself has 9 cells. We might wish to see row and column totals as well. To do so, wrap the table's name in an addmargins() command:

addmargins(smoketable)

##        current former never Sum
## high        51     92    68 211
## low         43     28    22  93
## middle      22     21     9  52
## Sum        116    141    99 356

As with goodness-of-fit testing, our test statistic will be

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}.$$

The $O_i$ here are the numbers found in the various cells of the two-way table. The $E_i$ are the numbers we would have expected in these cells under the null hypothesis. The expectedcounts() command we built above can help us with the $E_i$. Both of the commands created there have limited use. They both require a two-way table as input, which means they are not appropriate for the goodness-of-fit tests we have previously discussed. In our present context, however, we can compute expected counts for our smoketable:

expectedcounts(smoketable)

Once again, we can wrap this in addmargins() to see row/column totals:

addmargins(expectedcounts(smoketable))

which gives (values rounded here to two decimal places)

        current  former  never  Sum
high      68.75   83.57  58.68  211
low       30.30   36.83  25.86   93
middle    16.94   20.60  14.46   52
Sum      116     141     99     356

Take a moment and find the two-way table containing observed counts above. Note that row and column totals are the same in that table and in this one containing expected counts. Only the contents of the individual cells have been altered. In the table of expected counts, for instance, the proportion of current smokers is the same for each SES status:
- among high SES cases, the proportion is 68.75/211 = 0.326
- among low SES cases, the proportion is 30.30/93 = 0.326
- among middle SES cases, the proportion is 16.94/52 = 0.326

These proportions are consistent whichever way you look at them, either across rows or down columns. As another instance, suppose we focus on the proportion of low SES cases within each smoking status:

- among current smokers, the proportion is 30.30/116 = 0.261
- among former smokers, the proportion is 36.83/141 = 0.261
- among those who never smoked, the proportion is 25.86/99 = 0.261

This is how the table would look in a population perfectly represented by our sample when the two variables are independent (not associated). Software has provided us with the expected counts, but calculating each one by hand is straightforward. The expected table has row and column totals identical to those of the actual data, and each cell is

$$E_i = \frac{(\text{row total})(\text{column total})}{\text{sample size}}.$$

For example, the expected count of current smokers among high SES cases is (211)(116)/356 = 68.75.

The chisqstat() function we defined above can compute the $\chi^2$ test statistic for us. You should still practice computing it by hand and check that you obtain the same value. As with our expectedcounts() command, it requires the two-way table as input:

chisqstat(smoketable)

This returns a value of approximately 18.51.

Suppose all of our expected counts are at least 5, the same rule of thumb the Locks gave us for goodness-of-fit testing. Then we can obtain an approximate P-value from a chi-square distribution with degrees of freedom given by

$$df = [(\text{number of rows}) - 1]\,[(\text{number of columns}) - 1].$$

In this case, our smallest expected count is 14.46, so we choose df = (3 - 1)(3 - 1) = 4. We then compute the approximate P-value, recalling that this is a one-sided, right-tailed test:

1 - pchisq(18.51, df=4)

This gives a P-value of about 0.00098. We would reject the null hypothesis at the 5% level (and also at the 1% level) and conclude there is an association between SES and smoking status.

Example: This one provides a variation on Example 7.10, p. 481 in the text. We have access to the raw data for this example, which is found in the data frame
WaterTaste. For the categorical variables in question, UsuallyDrink and First, we obtain a two-way table of observed frequencies. We also obtain a second table containing the corresponding expected counts:

fulltable <- tally(~UsuallyDrink + First, data=WaterTaste)
fulltable

##             First
## UsuallyDrink Aquafina Fiji SamsChoice Tap
##     Bottled
##     Filtered
##     Tap

expectedcounts(fulltable)

##             First
## UsuallyDrink Aquafina Fiji SamsChoice Tap
##     Bottled
##     Filtered
##     Tap

We see that the rule of thumb requiring all expected counts to be at least 5 is not met. Many authors state a different rule of thumb. Under it, you are safe to use a chi-square distribution to approximate the P-value if no expected count is less than 1 and no more than 20% of expected counts are less than 5. Even this relaxed rule of thumb is not met here. At this stage we have two options:

- We could combine values of a variable. This is the approach taken in the text, where the Filtered and Tap rows are combined into a single category called "Tap/Filtered". The new expected counts after combining these rows are displayed in parentheses in Table 7.23 on p. 482. One of the 8 cells still contains an expected count smaller than 5, but the relaxed rule of thumb is satisfied.
- If we really do not wish to combine categories, we might use randomization to produce an approximate null distribution, i.e., the distribution of $\chi^2$ values from randomization samples. We then obtain a P-value from it. This approach is not overly difficult in this instance, particularly because we have the raw data set. We adopt this approach below.

The command

tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste)

produces a randomization sample. Execute this command and include row/column totals. The simulated counts in the individual cells change, but the row and column totals stay the same:

addmargins(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))

##                      shuffle(First)
## shuffle(UsuallyDrink) Aquafina Fiji SamsChoice Tap Sum
##              Bottled
##              Filtered
##              Tap
##              Sum

We obtain an individual randomization $\chi^2$ statistic by generating a randomization sample and wrapping it inside a call to the chisqstat() function:

chisqstat(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))

We repeat this step many times to generate a randomization distribution. The result is a data frame with a single column, chisqstat, holding the 1000 simulated statistics:

manychisqs = do(1000) * chisqstat(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))
head(manychisqs)

Our test statistic is obtained similarly, but without shuffling:

chisqstat(tally(~UsuallyDrink + First, data=WaterTaste))

This gives a value of about 4.97.
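The shuffle, tabulate, and recompute steps just described can also be sketched in base R, without mosaic. The WaterTaste counts are not reproduced here, so this sketch applies the idea to the SES/smoking counts from the first example. It first expands that two-way table back into one row per person. The helper name chisq_stat and the seed value are illustrative choices, not part of the handout. The sketch shuffles only the smoking column. That alone breaks any association between the two variables and gives the same null distribution as shuffling both.

```r
# Counts from the SES/smoking example (same numbers as smoketable)
smoke <- matrix(c(51, 92, 68,
                  43, 28, 22,
                  22, 21,  9),
                ncol = 3, byrow = TRUE,
                dimnames = list(SES = c("high", "low", "middle"),
                                Smoking = c("current", "former", "never")))

# Expand the table into raw data: one row per person
counts <- as.data.frame(as.table(smoke))   # columns SES, Smoking, Freq
cases  <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("SES", "Smoking")]

# Pearson chi-square statistic, vectorized: E = (row total)(column total) / n
chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

observed <- chisq_stat(table(cases$SES, cases$Smoking))

# Randomization distribution: shuffle smoking status, re-tabulate, recompute
set.seed(2016)  # arbitrary seed, for reproducibility only
null_stats <- replicate(1000,
  chisq_stat(table(cases$SES, sample(cases$Smoking))))

# Approximate P-value: proportion of shuffled statistics at least as large
p_value <- mean(null_stats >= observed)
print(observed)
print(p_value)
```

Because sample() only reorders the smoking column, every shuffled table keeps the original row and column totals. This is the same property noted above for the addmargins() output of a randomization sample.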
We view the corresponding distribution, shading the region to the right of our test statistic, and compute the approximate P-value. The two-way table has 3 rows and 4 columns, so the chi-square distribution that would best approximate the null distribution is the one with df = (3 - 1)(4 - 1) = 6. We overlay this distribution to illustrate how similar (or not) it is to the randomization distribution. The rule of thumb for using a chi-square distribution is not met. So we cannot assume that the randomization distribution and the chi-square density curve are similar enough to justify using the pchisq() command to obtain a P-value.

histogram(~chisqstat, data=manychisqs, groups = chisqstat >= 4.97)
plotDist("chisq", df=6, add=TRUE)

[Figure: density-scale histogram of chisqstat from the randomization samples, with the chi-square density curve for df = 6 overlaid]

nrow(subset(manychisqs, chisqstat >= 4.97)) / 1000

## [1] 0.553

With this high P-value, we fail to reject the null hypothesis.
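For comparison, base R's built-in chisq.test() performs the same test of association in a single call. The sketch below applies it to the SES/smoking counts from the first example. It should reproduce the statistic (about 18.51), df = 4, and the expected counts computed earlier. Setting simulate.p.value = TRUE asks for a Monte Carlo P-value instead. That P-value is computed from random tables having the same row and column totals, which makes it a close relative of the randomization approach used for WaterTaste. The values of B and the seed are arbitrary choices.

```r
# Counts from the SES/smoking example (same numbers as smoketable)
smoke <- matrix(c(51, 92, 68,
                  43, 28, 22,
                  22, 21,  9),
                ncol = 3, byrow = TRUE,
                dimnames = list(SES = c("high", "low", "middle"),
                                Smoking = c("current", "former", "never")))

# Chi-square test of association using the chi-square approximation (df = 4)
res <- chisq.test(smoke)
print(res)
print(round(res$expected, 2))  # expected counts, matching expectedcounts()

# Monte Carlo P-value from simulated tables with the observed margins
set.seed(2016)  # arbitrary seed, for reproducibility only
res_sim <- chisq.test(smoke, simulate.p.value = TRUE, B = 2000)
print(res_sim$p.value)
```

R computes the simulated P-value as (1 + the number of simulated statistics at least as large as the observed one)/(B + 1), so it is never exactly 0. No df is reported in that case, since no chi-square distribution is used.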
More informationDesignDirector Version 1.0(E)
Statistical Design Support System DesignDirector Version 1.0(E) User s Guide NHK Spring Co.,Ltd. Copyright NHK Spring Co.,Ltd. 1999 All Rights Reserved. Copyright DesignDirector is registered trademarks
More informationResearch Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel
Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement
More informationK-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining.
K-means clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 K-means Outline K-means, K-medoids Choosing the number of clusters: Gap test, silhouette plot. Mixture
More informationWINKS SDA Windows KwikStat Statistical Data Analysis and Graphs Getting Started Guide
WINKS SDA Windows KwikStat Statistical Data Analysis and Graphs Getting Started Guide 2011 Version 6A Do these tutorials first This series of tutorials provides a quick start to using WINKS. Feel free
More informationRaw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.
Section 2.1 - Introduction Graphs are commonly used to organize, summarize, and analyze collections of data. Using a graph to visually present a data set makes it easy to comprehend and to describe the
More informationSTAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.
STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,
More informationMinitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.
Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see
More informationMendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator
20 Investigating Monhybrid Crosses Using the Graphing Calculator This activity will use the graphing calculator s random number generator to simulate the production of gametes in a monohybrid cross. The
More information