Hypothesis Test Exercises from Class, Oct. 12, 2018
Question 1: Is there a difference in mean sepal length between versicolor irises and setosa ones?
Worked on by Victoria BienAime and Pearl Park

Null Hypothesis: µ_v - µ_s = 0
Alternative Hypothesis: µ_v - µ_s > 0

Only looking at data that excludes the species virginica:

    x <- droplevels(subset(iris, Species != "virginica"))
    head(x)
    ##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    ## 1          5.1         3.5          1.4         0.2  setosa
    ## 2          4.9         3.0          1.4         0.2  setosa
    ## 3          4.7         3.2          1.3         0.2  setosa
    ## 4          4.6         3.1          1.5         0.2  setosa
    ## 5          5.0         3.6          1.4         0.2  setosa
    ## 6          5.4         3.9          1.7         0.4  setosa

Randomization distribution for the difference in mean Sepal Length between versicolor (V) and setosa (S):

    y <- do(1) * (diff(mean(Sepal.Length ~ shuffle(Species), data = x)))
    gf_histogram(~versicolor, data = y, color = "black", fill = "red")

[Histogram of the randomization distribution; x-axis: versicolor]

Observed mean difference:

    diff(mean(Sepal.Length ~ Species, data = x))
    ## versicolor
    ##       0.93

To find the P-value for the observed mean difference, we count the rows (cases) of y whose value is greater than or equal to 0.93:

    nrow(subset(y, versicolor >= 0.93))
    ## [1]

This count is divided by the total number of values (n=1):

    nrow(subset(y, versicolor >= 0.93))/1
    ## [1]

So the null hypothesis is rejected, because the observed difference is off the chart: it lies beyond the range of the randomization distribution.

Question 3: Is there a positive correlation between eruption length and wait time for the Old Faithful geyser?
Worked on by Danish and Michael

Null Hypothesis: There is no correlation between eruption length and wait time. H0: ρ = 0
Alternative Hypothesis: There is a positive correlation between eruption length and wait time. Ha: ρ > 0

    cor(eruptions ~ waiting, data = faithful)
    ## [1] 0.9008112
    x <- do(5) * cor(eruptions ~ shuffle(waiting), data = faithful)
    head(x)
    ## cor
    ##
    ##
    ##
    ##
    ##
    ##
    gf_histogram(~cor, data = x, color = "black")

[Histogram of the randomization distribution; x-axis: cor]

The proportion of randomization correlations at least as large as 0.9 is the count of such rows divided by the number of rows:

    nrow(subset(x, cor >= 0.9)) / 5
    ## [1]
Given that the P-value is smaller than 0.05, obtaining a correlation of 0.9008 would be rare in a world where the null hypothesis is true. So we reject the null hypothesis.

Question 4: Can a nonzero correlation arise even between unrelated data?
Worked on by Thomas Scofield

In introducing this question, I suggested these commands for generating our lists of numbers.

    x <- runif(n=5, min=15, max=45)
    y <- rnorm(n=5, mean=3, sd=5)

In choosing the 5 numbers now found in x, it's as if all numbers between 15 and 45 were equally likely, and 5 were chosen at random; whereas the numbers for y were not equally likely, but 3, and numbers nearby, were most likely to be chosen, with the likelihood falling off as a number becomes farther from 3 (falling off like a normal distribution). Here is a scatter plot of the resulting chosen xy-pairs.

    gf_point(y ~ x)

[Scatter plot of y against x]

The x- and y-coordinates of these plotted points were chosen with no relationship between them. But they will still yield a nonzero sample correlation.

    cor(y ~ x)
    ## [1]

To see if this test statistic is statistically significant, we obtain a randomization distribution and see how often a result as extreme as this one occurs. Our hypotheses are H0: ρ = 0, Ha: ρ ≠ 0. Under this null hypothesis, one is just as likely to see any of the y-values paired with any of the x-values.

    manycors <- do(1) * cor(y ~ shuffle(x))
    head(manycors)
    ## cor
    ##
    ##
    ##
    ##
    ##
    ##

We plot these randomization statistics, shading all that are at least as extreme (on either side of the null value 0) as ours.

    gf_histogram(~cor, data=manycors, color="black", fill=~abs(cor) >=.1855)

[Histogram of the randomization distribution; x-axis: cor; bars shaded by whether abs(cor) >= .1855 is FALSE or TRUE]

Counting occurrences of randomization statistics this extreme, we find the approximate P-value.

    nrow( subset(manycors, cor >=.1855) ) / 1
    ## [1].956

What we have witnessed here, a correlation of .1856, only occurs about 1% of the time when 5 points are chosen with the x- and y-coordinates chosen independently. If we set α = .1 and drew a conclusion, we would reject the null hypothesis and, in this case, would have committed a Type I error, something that happens in 1% of cases with a true null hypothesis and α = .1.

Question 5: Do births occur on weekend days in their proper proportion to a full week?
Worked on by Kaitlyn Westra, Maddie Lenning and Allyson Prichard

H0: p = 2/7
Ha: p ≠ 2/7

First, we'll find out how many births happened throughout 2015.

    sum(~births, data=Births2015)
    ## [1]
    totalbirths <- sum(~births, data=Births2015)
    sunbirths <- sum(~births, data=subset(Births2015, wday=="Sun"))
    satbirths <- sum(~births, data=subset(Births2015, wday=="Sat"))
    (sunbirths+satbirths)/totalbirths
    ## [1]
    teststat <- (sunbirths+satbirths)/totalbirths

That's the proportion of weekend births out of the total births. (I feel like that was done in a roundabout way though. I think there's a better and/or more exact way...) This number is our test statistic.

Here's our randomization distribution:

We tried several more familiar ways to produce randomization distributions for a single proportion. Using rflip() to flip a weighted coin 3 million times in order to produce a single randomization sample proved excessively slow. The idea of sampling with replacement from a bag, implemented below, was about 5 times faster, but still took a long time when repeated just 1 times.

    bag <- c(0,0,0,0,0,1,1)
    manyprobs <- do(1)*(sum(sample(bag, size=totalbirths, replace=TRUE))/totalbirths)

In contrast, a command we haven't previously seen, but one tailored for this very purpose, was lightning fast, even in producing 5 randomization statistics.

    rbinom(5, size=totalbirths, prob=2/7) / totalbirths

A better version, one that creates a data frame called manyprobs with a column called result, is this one:

    manyprobs <- data.frame(result = rbinom(5, size=totalbirths, prob=2/7) / totalbirths)
    gf_histogram(~result, data=manyprobs, color="black")

[Histogram of the randomization distribution; x-axis: result]

    2*nrow(subset(manyprobs, result <= teststat)) / 5
    ## [1]

It looks like our approximate P-value under a 2-sided alternative hypothesis is that number. P-value = .4, so we can reject our H0.

Question 6: Are women left-handed at a different rate than men?
Worked on by Abena Oduro

Loading the required data

The prompt I chose to use was prompt #6, which reads "Are women left-handed at a different rate than men?". To load the data from a comma separated values file, I used the read.csv() command, and saved it as hands. However, the Selfhandedness column in this dataset had some empty values, so I used the
droplevels(subset()) command to remove those and focus specifically on those that had the values R for right-handedness and L for left-handedness. I saved these under handy.

    hands<- read.csv("
    handy <- droplevels(subset(hands, selfhandedness=="L" | selfhandedness=="R"))

Calculating the test statistic

For this prompt, I expected the null hypothesis to be H0: p_D = 0, where p_D is p_m - p_f. This can be interpreted as: the proportion of males that are left-handed (p_m) minus the proportion of females that are left-handed (p_f) in the population is zero. I expected the alternative hypothesis to be Ha: p_D ≠ 0. This can be interpreted as: the proportions of males and females that are left-handed in the population are not equal. For all cases, p_m - p_f can be summarized as p_D.

To calculate the test statistic, p_D, I first used the tally() command to find the numbers of males and females that were right- or left-handed. Then, using the prop() command with success="L", I narrowed down the results to the proportions of males and females who were left-handed only. This made the data specific to the question "Are women left-handed at a different rate than men?" Finally, to calculate x_D, I used the diff() command to find the difference between the two proportions.

    tally(selfhandedness~gender, data=handy)
    ## gender
    ## selfhandedness F M
    ## L
    ## R
    prop(selfhandedness~gender, data=handy, success="L")
    ## prop_L.F prop_L.M
    ##
    diff(prop(selfhandedness~gender, data=handy, success="L"))
    ## prop_L.M
    ##

Creating a Randomization Distribution

To create a randomization distribution, I used the same diff(prop()) command with success="L", but this time, I used shuffle(gender) to tell R to assign random values of gender to different values of selfhandedness. This was done to create a set of values under which the null hypothesis is true. Using the do(5) command, I created 5 samples under the null condition and saved them under lefty.
To view the distribution of these samples (p_D), I used the command gf_histogram to generate a histogram of the values (p) saved in lefty. As expected, it was centered around the null value, 0.

    lefty <- do(5)*diff(prop(selfhandedness~shuffle(gender), data=handy, success="L"))
    gf_histogram(~prop_L.M, data=lefty, color="black", fill="white")
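The shuffling idea in this section can also be sketched in base R, without the mosaic helpers do(), shuffle(), and prop(). This is only an illustration under stated assumptions: the data are simulated stand-ins (the class handedness file is not reproduced here), and the names hand, sex, diff_prop, and n_shuffles are hypothetical.

```r
# A minimal sketch, assuming simulated data: randomization distribution for
# a difference in two proportions, using only base R.
set.seed(1)

# Hypothetical stand-in data: handedness and gender for 200 people.
hand <- sample(c("L", "R"), size = 200, replace = TRUE, prob = c(0.1, 0.9))
sex  <- sample(c("F", "M"), size = 200, replace = TRUE)

# Proportion left-handed among males minus proportion among females.
diff_prop <- function(hand, sex) {
  mean(hand[sex == "M"] == "L") - mean(hand[sex == "F"] == "L")
}

observed <- diff_prop(hand, sex)

# Shuffle the gender labels many times. Each shuffle breaks any link between
# gender and handedness, which is what the null hypothesis asserts.
n_shuffles <- 2000
null_diffs <- replicate(n_shuffles, diff_prop(hand, sample(sex)))

# Two-sided P-value: share of shuffled differences at least as far from 0
# as the observed difference.
p_value <- mean(abs(null_diffs) >= abs(observed))
p_value
```

Comparing absolute values handles both tails in a single step, which is one way to carry out the two-tailed count for a null value of 0.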
[Histogram of the randomization distribution; x-axis: prop_L.M]

Calculating the P-Value

To calculate the P-value of the original test statistic, x_D, I used the nrow(subset()) command to find the number of values in the distribution that were at or above it. Since Ha implies a two-tailed test, I added the number of values in the distribution that were at or below the matching value in the other tail. Then, I divided the whole expression by 5, which is the number of samples in the distribution. This gives the proportion of the 5 samples that were as extreme as or more extreme than the observed statistic, in either tail, which is the P-value. The P-value was calculated to be 1.

    (nrow( subset(lefty, prop_L.M >= ))+nrow( subset(lefty, prop_L.M <= )))/5
    ## [1] 1

Question 7: Is a male student at Calvin College typical when it comes to height?
Worked on by Daniel Sculley and Matthew Vos

H0: µ_h = 7
Ha: µ_h ≠ 7
Significance threshold: α = 0.05

First: We loaded the data set we will be analyzing and pulled out the all-male data set.

    y <- read.csv("
    CalvinStats <- subset(y, gender=="m")

Next: We calculated the sample mean, which is our test statistic.

    mean(~height, data=CalvinStats, na.rm=TRUE)
    ## [1]

Test statistic x̄_h: We will use this to find the amount we need to add to every data point to generate our randomization distribution:

    ## [1]
Thus we add this amount to every height in the data set.

Next: We generate a randomization distribution.

    x <- do(5)*mean(~resample(height ), data=CalvinStats, na.rm=TRUE)
    gf_histogram(~mean, data=x, color="black")

[Histogram of the randomization distribution; x-axis: mean]

Finally: We calculate our P-value for the data set.

    (nrow(subset(x, mean >= ))/5)*2
    ## [1]

As there are no values as extreme as our test statistic in the randomization distribution, it is safe to say that the result is significant at the 0.05 level.

File creation date:
Editor: Thomas Scofield
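The shift-then-resample approach used in Question 7 can be sketched in base R as follows. This is a hedged illustration, not the students' code: the heights are simulated, and the null value mu0, the sample size, and all variable names are assumptions chosen for the example.

```r
# A minimal sketch, assuming simulated data: randomization test for a single
# mean by shifting the sample so the null hypothesis holds, then resampling.
set.seed(2)

heights <- rnorm(60, mean = 71, sd = 3)  # simulated heights, not class data
mu0 <- 70                                # hypothetical null value

xbar  <- mean(heights)   # observed test statistic
shift <- mu0 - xbar      # amount added to every height
shifted <- heights + shift   # shifted sample now has mean mu0

# Resample with replacement from the shifted data and record each mean.
n_resamples <- 2000
boot_means <- replicate(n_resamples, mean(sample(shifted, replace = TRUE)))

# Two-sided P-value: double the proportion in the tail where the observed
# mean falls, capped at 1.
tail_prop <- if (xbar >= mu0) mean(boot_means >= xbar) else mean(boot_means <= xbar)
p_value <- min(1, 2 * tail_prop)
p_value
```

Shifting keeps the spread of the sample while centering it on the null value, so the resampled means show how much a sample mean would vary if the null hypothesis were true.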
ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a Put the following data into an spss data set: Be sure to include variable and value labels and missing value specifications for all variables
More informationFunction Approximation and Feature Selection Tool
Function Approximation and Feature Selection Tool Version: 1.0 The current version provides facility for adaptive feature selection and prediction using flexible neural tree. Developers: Varun Kumar Ojha
More informationChapter 2: Linear Equations and Functions
Chapter 2: Linear Equations and Functions Chapter 2: Linear Equations and Functions Assignment Sheet Date Topic Assignment Completed 2.1 Functions and their Graphs and 2.2 Slope and Rate of Change 2.1
More informationBL5229: Data Analysis with Matlab Lab: Learning: Clustering
BL5229: Data Analysis with Matlab Lab: Learning: Clustering The following hands-on exercises were designed to teach you step by step how to perform and understand various clustering algorithm. We will
More information4. Descriptive Statistics: Measures of Variability and Central Tendency
4. Descriptive Statistics: Measures of Variability and Central Tendency Objectives Calculate descriptive for continuous and categorical data Edit output tables Although measures of central tendency and
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationDistributions of Continuous Data
C H A P T ER Distributions of Continuous Data New cars and trucks sold in the United States average about 28 highway miles per gallon (mpg) in 2010, up from about 24 mpg in 2004. Some of the improvement
More informationPrepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.
Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good
More informationThe basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student
Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers HW 34. Sketch
More informationCorrelation. January 12, 2019
Correlation January 12, 2019 Contents Correlations The Scattterplot The Pearson correlation The computational raw-score formula Survey data Fun facts about r Sensitivity to outliers Spearman rank-order
More informationAND NUMERICAL SUMMARIES. Chapter 2
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More informationDescription/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources
R Outline Description/History Objects/Language Description Commonly Used Basic Functions Basic Stats and distributions I/O Plotting Programming More Specific Functionality Further Resources www.r-project.org
More information