7.2: Chi-Square Test for Association T.Scofield Nov. 17, 2016

The goal of this section is to provide means for investigating whether there is an association between two categorical variables.

Before proceeding, it may be helpful to enter the following commands, which will add new commands called expectedcounts() and chisqstat() to our list of available ones.

    # Expected cell counts under independence, for a two-way table of observed counts
    expectedcounts <- function(sTable) {
      casetotal = sum(sTable)
      expcounts = sTable
      for (ii in 1:nrow(sTable)) {
        for (jj in 1:ncol(sTable)) {
          # (row total) * (column total) / (sample size)
          expcounts[ii,jj] = sum(sTable[ii,]) * sum(sTable[,jj]) / casetotal
        }
      }
      return (expcounts)
    }

    # Chi-square statistic for a two-way table of observed counts
    chisqstat <- function(oTab) {
      X2 = 0
      etab <- expectedcounts(oTab)
      for (ii in 1:nrow(oTab)) {
        for (jj in 1:ncol(oTab)) {
          X2 = X2 + (oTab[ii,jj] - etab[ii,jj])^2 / etab[ii,jj]
        }
      }
      return (X2)
    }

Example: Is there an association between socioeconomic status (SES) and smoking?

We will address this question using a hypothesis test. The null and alternative hypotheses are stated this way:

    H_0: no association exists between SES and smoking status
    H_a: there is an association between SES and smoking status

One can imagine the data one would collect for a study to address this question. For each person sampled, we would record the values of two categorical variables: SES and smoking status.

    case   SES      Smoking Status
    1      middle   current
    2      low      current
    3      high     never
    4      middle   former

But perhaps the data comes to us not in this raw form, but already summarized as a two-way (contingency) table.
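When the raw case-by-case records are available, the step from raw data to a two-way table can be sketched in base R as follows. This is only a minimal illustration built from the four cases listed above; the names cases and rawTable are hypothetical and are not part of the handout's own commands.

```r
# The four illustrative cases from the text, entered as raw data.
cases <- data.frame(
  SES     = c("middle", "low", "high", "middle"),
  Smoking = c("current", "current", "never", "former")
)

# Cross-tabulate: rows are SES levels, columns are smoking statuses.
rawTable <- table(cases$SES, cases$Smoking)
rawTable

# The same table with row and column totals attached.
addmargins(rawTable)
```

With only four cases most cells are zero; a real study would tabulate every sampled person in the same way.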

In summarized form, the counts for our example are:

              current  former  never
    high           51      92     68
    low            43      28     22
    middle         22      21      9

It is convenient to know steps one can use to build such a table inside RStudio directly from the numbers (i.e., when no raw data set is available). These commands will do so, giving it the name smoketable.

    smoke <- matrix(c(51,92,68,43,28,22,22,21,9), ncol=3, byrow=TRUE)
    colnames(smoke) <- c("current","former","never")
    rownames(smoke) <- c("high","low","middle")
    smoketable <- as.table(smoke)

If we wish to view the table at this point, we may do so by typing the name under which it is stored:

    smoketable

    ##        current former never
    ## high        51     92    68
    ## low         43     28    22
    ## middle      22     21     9

Because both of the categorical variables have 3 values, the table itself has 9 cells. We might wish to see row and column totals as well, which is achieved by wrapping the table's name in an addmargins() command.

    addmargins(smoketable)

    ##        current former never Sum
    ## high        51     92    68 211
    ## low         43     28    22  93
    ## middle      22     21     9  52
    ## Sum        116    141    99 356

As with goodness-of-fit testing, our test statistic will be

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}.$$

The $O_i$ here are the numbers found in the various cells of the two-way table, while the $E_i$ represent the numbers we would have expected in these cells under the null hypothesis. The expectedcounts() command we built above can help us with the $E_i$. Both of the commands created there have limited use. They both require a two-way table as input, which means they are not appropriate for the goodness-of-fit tests we have previously discussed. In our present context, however, we can process expected counts for our smoketable.

    expectedcounts(smoketable)

Once again, we can wrap this in addmargins() to see row/column totals. Rounded to two decimal places, the result is:

    addmargins(expectedcounts(smoketable))

    ##        current former never Sum
    ## high     68.75  83.57 58.68 211
    ## low      30.30  36.83 25.86  93
    ## middle   16.94  20.60 14.46  52
    ## Sum     116.00 141.00 99.00 356

Take a moment and find the two-way table containing observed counts above. Note that row and column totals are unchanged between that table and this one containing expected counts. It is the contents of the individual cells which have been altered. In the table of expected counts, for instance, we have that for each SES status, the proportion of current smokers is the same:

- among high SES cases, the proportion is 68.75/211 = 0.326
- among low SES cases, the proportion is 30.3/93 = 0.326
- among middle SES cases, the proportion is 16.94/52 = 0.326

These sorts of proportions/ratios are consistent whichever way you look at them, either across rows or columns. As another instance, if we focus on proportions of low SES across the various smoking statuses, we see

- among current smokers, the proportion is 30.3/116 = 0.261
- among former smokers, the proportion is 36.83/141 = 0.261
- among those who never smoked, the proportion is 25.86/99 = 0.261

This is how things would be expected to look in a population perfectly represented by our sample when the two variables are independent (not associated).

While software has provided us with the expected counts, the method for calculating each is straightforward, given that each cell has row and column totals which are identical to those of the actual data:

$$E_i = \frac{(\text{row total}) \cdot (\text{column total})}{\text{sample size}}$$

The chisqstat() function we defined above can compute for us the $\chi^2$ test statistic (though you should practice doing this by hand and see that you obtain the same value). As with our expectedcounts() command, it requires, as input, the two-way table.

    chisqstat(smoketable)

The statistic comes out to approximately 18.51.

Assuming our expected counts are all at least 5 (same rule of thumb as the Locks gave us for goodness-of-fit testing), we can obtain an approximate P-value from a chi-square distribution with degrees of freedom given by

$$df = [(\text{number of rows}) - 1] \cdot [(\text{number of columns}) - 1]$$

In this case, our smallest expected count is 14.46, so we choose $df = (3 - 1)(3 - 1) = 4$, and compute the approximate P-value (recalling that this is a 1-sided, right-tailed test):

    1 - pchisq(18.51, df=4)

The result is roughly 0.001. We would reject the null hypothesis at the 5% level (also the 1% level) here, and conclude there is an association between SES and smoking status.

Example: This one provides a variation on Example 7.10, p. 481 in the text.

We have access to the raw data for this example, which is found in the data frame WaterTaste. We obtain a two-way table of observed frequencies, along with one containing the corresponding expected counts, for the categorical variables in question: UsuallyDrink and First.

    # tally() comes from the mosaic package
    fulltable <- tally(~UsuallyDrink + First, data=WaterTaste)
    fulltable

    ##             First
    ## UsuallyDrink Aquafina Fiji SamsChoice Tap
    ##     Bottled
    ##     Filtered
    ##     Tap

    expectedcounts(fulltable)

    ##             First
    ## UsuallyDrink Aquafina Fiji SamsChoice Tap

    ##     Bottled
    ##     Filtered
    ##     Tap

We see the rule of thumb that all expected counts be at least 5 is not met. Many authors state a different rule of thumb, saying you are safe to use a chi-square distribution to approximate the P-value if no expected count is less than 1, and no more than 20% of expected counts are less than 5, but even this relaxed rule of thumb is not met. At this stage we have two options:

- We could combine values of a variable. This is the approach taken in the text, where they have combined the Filtered and Tap rows into a single category they call "Tap/Filtered". The new expected counts after combining these rows are displayed in parentheses in Table 7.23 on p. 482, and while there is still one of the 8 cells that contains an expected count smaller than 5, the relaxed rule of thumb is satisfied.
- If we really do not wish to combine categories, we might use randomization to produce an approximate null distribution (distribution of $\chi^2$ values from randomization samples) and obtain a P-value from it. This approach is not overly difficult in this instance, particularly because we have the raw data set.

We adopt the second approach below. The command

    tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste)

produces a randomization sample. Notice that if we execute this command and include row/column totals, these totals are maintained even though the simulated observed counts found in the individual cells change.

    addmargins(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))

    ##                      shuffle(First)
    ## shuffle(UsuallyDrink) Aquafina Fiji SamsChoice Tap Sum
    ##              Bottled
    ##              Filtered
    ##              Tap
    ##              Sum

We obtain an individual randomization ($\chi^2$) statistic by generating a randomization sample and wrapping that inside a call to the chisqstat() function:

    chisqstat(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))

It is this that we would want to repeat often in order to generate a randomization distribution.

    manychisqs = do(1000) * chisqstat(tally(~shuffle(UsuallyDrink) + shuffle(First), data=WaterTaste))
    head(manychisqs)

The result is a data frame with a single column named chisqstat, one row per randomization sample.

Our test statistic is obtained similarly, but without shuffling:

    chisqstat(tally(~UsuallyDrink + First, data=WaterTaste))

The statistic is approximately 4.97.
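The shuffling idea can also be sketched without the mosaic helpers, using only base R. The sketch below is an illustration under stated assumptions rather than the handout's own method: since the WaterTaste values are not reproduced here, it rebuilds case-level data from the SES/smoking counts of the first example, and the names raw, chisq_of, observed and nullStats are hypothetical. It shuffles only one of the two variables with sample(), which is already enough to break any association while leaving both sets of margins unchanged.

```r
# Rebuild case-level data from the SES/smoking counts of the first example
# (an assumption for illustration; the WaterTaste data are not used here).
smoke <- matrix(c(51, 92, 68, 43, 28, 22, 22, 21, 9), ncol = 3, byrow = TRUE,
                dimnames = list(SES = c("high", "low", "middle"),
                                Smoking = c("current", "former", "never")))
raw <- as.data.frame(as.table(smoke))     # one row per cell, with a Freq column
raw <- raw[rep(seq_len(nrow(raw)), raw$Freq), c("SES", "Smoking")]

# Chi-square statistic for a two-way table of observed counts.
chisq_of <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Statistic for the data as observed (no shuffling).
observed <- chisq_of(table(raw$SES, raw$Smoking))

# Randomization distribution: shuffle one variable, re-tabulate, recompute.
set.seed(1)                               # reproducible shuffles
nullStats <- replicate(1000, chisq_of(table(raw$SES, sample(raw$Smoking))))
```

Here replicate() plays the role of do(1000) *, and table() plays the role of tally().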

We view the randomization distribution stored in manychisqs, shading the region to the right of our test statistic, and compute the approximate P-value. As the two-way table has 3 rows and 4 columns, the chi-square distribution which best approximates the null distribution is the one with $df = (2)(3) = 6$. We overlay this distribution to illustrate how similar (or not) it is to the randomization distribution. Since the rule of thumb for using a chi-square distribution is not met, the implication is that these two (the randomization distribution and the chi-square density curve) are not similar enough to warrant using the pchisq() command to obtain a P-value.

    histogram(~chisqstat, data=manychisqs, groups = chisqstat >= 4.97)
    plotDist("chisq", df=6, add=TRUE)

[Figure: histogram of chisqstat (vertical axis: Density), with the chi-square density curve for df = 6 overlaid]

    nrow(subset(manychisqs, chisqstat >= 4.97)) / 1000

    ## [1] 0.553

With this high P-value, we fail to reject the null hypothesis.
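As a cross-check on the hand-built functions, base R's chisq.test() reports the same quantities for a two-way table. The sketch below applies it to the SES/smoking counts from the first example; the object names result and simResult are hypothetical. Its simulate.p.value option produces a Monte Carlo P-value from random tables with the same row and column totals, which is similar in spirit to the shuffling used above, though it is not claimed to be the identical procedure.

```r
# Observed counts from the SES/smoking example.
smoke <- matrix(c(51, 92, 68, 43, 28, 22, 22, 21, 9), ncol = 3, byrow = TRUE,
                dimnames = list(c("high", "low", "middle"),
                                c("current", "former", "never")))

result <- chisq.test(smoke)   # Pearson chi-square test of association
result$statistic              # chi-square statistic (about 18.51)
result$parameter              # degrees of freedom: (3 - 1) * (3 - 1) = 4
result$expected               # expected counts under independence
result$p.value                # P-value from the chi-square distribution

# Monte Carlo P-value instead of the chi-square approximation.
set.seed(1)
simResult <- chisq.test(smoke, simulate.p.value = TRUE, B = 2000)
simResult$p.value
```

Assuming fulltable has been built as above, the analogous call for the water-tasting data would be chisq.test(fulltable, simulate.p.value = TRUE).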


More information

CH 21 CONSECUTIVE INTEGERS

CH 21 CONSECUTIVE INTEGERS 201 CH 21 CONSECUTIVE INTEGERS Introduction An integer is either a positive whole number, or zero, or a negative whole number; in other words it s the collection of numbers:... 4, 3, 2, 1, 0, 1, 2, 3,

More information

Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida

Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida FINAL REPORT Submitted October 2004 Prepared by: Daniel Gann Geographic Information

More information

Example Lecture 12: The Stiffness Method Prismatic Beams. Consider again the two span beam previously discussed and determine

Example Lecture 12: The Stiffness Method Prismatic Beams. Consider again the two span beam previously discussed and determine Example 1.1 Consider again the two span beam previously discussed and determine The shearing force M1 at end B of member B. The bending moment M at end B of member B. The shearing force M3 at end B of

More information

STATEWIDE WORKLOAD ANALYTIC TOOL

STATEWIDE WORKLOAD ANALYTIC TOOL ADMINISTRATOR S REFERENCE GUIDE & SYSTEM DOCUMENTATION STATEWIDE WORKLOAD ANALYTIC TOOL PREPARED FOR: MINNESOTA DEPARTMENT OF HUMAN SERVICES BY: HORNBY ZELLER ASSOCIATES, INC. TABLE OF CONTENTS INTRODUCTION

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

WINKS SDA 7. Version 7

WINKS SDA 7. Version 7 WINKS SDA 7 Version 7 (For BASIC and PROFESSIONAL Editions of WINKS SDA) PowerPoint Slides for this Guide are svailable at the website Click Instructors. www.texasoft.com TexaSoft, 2015 Do these tutorials

More information

Box-Cox Transformation

Box-Cox Transformation Chapter 190 Box-Cox Transformation Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a single batch of data. It is used to modify the distributional shape of a set

More information

Introduction to Statistics lab 1

Introduction to Statistics lab 1 Introduction to Statistics lab 1 Johan A. Elkink jos.elkink@ucd.ie 10 September 2018 The main purpose of today s class is to get a feel for how to open, access and view data, and to get some familiarity

More information

Stat 5100 Handout #14.a SAS: Logistic Regression

Stat 5100 Handout #14.a SAS: Logistic Regression Stat 5100 Handout #14.a SAS: Logistic Regression Example: (Text Table 14.3) Individuals were randomly sampled within two sectors of a city, and checked for presence of disease (here, spread by mosquitoes).

More information

CSE141 Problem Set #4 Solutions

CSE141 Problem Set #4 Solutions CSE141 Problem Set #4 Solutions March 5, 2002 1 Simple Caches For this first problem, we have a 32 Byte cache with a line length of 8 bytes. This means that we have a total of 4 cache blocks (cache lines)

More information

Navigating in SPSS. C h a p t e r 2 OBJECTIVES

Navigating in SPSS. C h a p t e r 2 OBJECTIVES C h a p t e r 2 Navigating in SPSS 2.1 Introduction and Objectives As with any new software program you may use, it is important that you are able to move around the screen with the mouse and that you

More information

Cognalysis TM Reserving System User Manual

Cognalysis TM Reserving System User Manual Cognalysis TM Reserving System User Manual Return to Table of Contents 1 Table of Contents 1.0 Starting an Analysis 3 1.1 Opening a Data File....3 1.2 Open an Analysis File.9 1.3 Create Triangles.10 2.0

More information

WORKING WITH PIVOT TABLES

WORKING WITH PIVOT TABLES WORKING WITH PIVOT TABLES Introduction Perhaps the most powerful analytical tool that Excel provides is the PivotTable command, with which one can cross-tabulate data stored in Excel lists. A cross-tabulation

More information

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.

More information

More Summer Program t-shirts

More Summer Program t-shirts ICPSR Blalock Lectures, 2003 Bootstrap Resampling Robert Stine Lecture 2 Exploring the Bootstrap Questions from Lecture 1 Review of ideas, notes from Lecture 1 - sample-to-sample variation - resampling

More information

Math 182. Assignment #4: Least Squares

Math 182. Assignment #4: Least Squares Introduction Math 182 Assignment #4: Least Squares In any investigation that involves data collection and analysis, it is often the goal to create a mathematical function that fits the data. That is, a

More information

Two-Stage Least Squares

Two-Stage Least Squares Chapter 316 Two-Stage Least Squares Introduction This procedure calculates the two-stage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes

More information

DesignDirector Version 1.0(E)

DesignDirector Version 1.0(E) Statistical Design Support System DesignDirector Version 1.0(E) User s Guide NHK Spring Co.,Ltd. Copyright NHK Spring Co.,Ltd. 1999 All Rights Reserved. Copyright DesignDirector is registered trademarks

More information

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement

More information

K-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining.

K-means clustering Based in part on slides from textbook, slides of Susan Holmes. December 2, Statistics 202: Data Mining. K-means clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 K-means Outline K-means, K-medoids Choosing the number of clusters: Gap test, silhouette plot. Mixture

More information

WINKS SDA Windows KwikStat Statistical Data Analysis and Graphs Getting Started Guide

WINKS SDA Windows KwikStat Statistical Data Analysis and Graphs Getting Started Guide WINKS SDA Windows KwikStat Statistical Data Analysis and Graphs Getting Started Guide 2011 Version 6A Do these tutorials first This series of tutorials provides a quick start to using WINKS. Feel free

More information

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques. Section 2.1 - Introduction Graphs are commonly used to organize, summarize, and analyze collections of data. Using a graph to visually present a data set makes it easy to comprehend and to describe the

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator 20 Investigating Monhybrid Crosses Using the Graphing Calculator This activity will use the graphing calculator s random number generator to simulate the production of gametes in a monohybrid cross. The

More information