Assignment 5.5. Nothing here to hand in


Load the tidyverse before we start:

    library(tidyverse)
    ## Loading tidyverse: ggplot2
    ## Loading tidyverse: tibble
    ## Loading tidyverse: tidyr
    ## Loading tidyverse: readr
    ## Loading tidyverse: purrr
    ## Loading tidyverse: dplyr
    ## Conflicts with tidy packages
    ## filter(): dplyr, stats
    ## lag(): dplyr, stats

1. Can students throw a baseball farther than a softball? A statistics class, containing 24 students, went out to a football field to try to answer this question. Each student warmed up and then threw each type of ball as far as they could. The order of ball types was randomized: some students threw the baseball first, and some threw the softball first. (A softball is bigger than a baseball, so we might expect that a softball would be harder to throw a long way than a baseball.) The data are in http://... in three columns: the first is a number identifying the student, the second is the distance thrown with the baseball (in yards) and the third is the distance thrown with the softball (also in yards).

(a) Read the data into SAS. There are no column headers, which you'll need to take into account.

Solution: The file extension suggests that the data values are separated by spaces, which is correct, but there are no variable names, so getnames=no:

    filename myurl url "http://...";
    proc import
      datafile=myurl
      dbms=dlm
      out=throw
      replace;
      delimiter=' ';
      getnames=no;

There are no variable names, so SAS had to invent some:

    proc print;

    Obs  VAR1  VAR2  VAR3
    [listing of 24 rows]

The data values look OK, and there are correctly 24 rows. The column names are VAR1, the student IDs, VAR2, the distance thrown with a baseball, and VAR3, the distance thrown with a softball.

(b) Calculate a column of differences, baseball minus softball.

Solution: Remember how SAS wants you to do this: create a new data set, copy in everything from the previous one, and then create your new variable. Don't forget to use SAS's variable names:

    data throw2;
      set throw;
      diff=var2-var3;

and for completeness check that it worked, bearing in mind that the most-recently created data set is the new one, throw2, so this will do the right thing:

    proc print;

    Obs  VAR1  VAR2  VAR3  diff
    [listing of 24 rows]

which it did.

(c) Make a normal quantile plot of the differences. On your plot, add a line (using a µ and σ estimated from the data). What do you conclude from the plot, and thus why would a sign test be more appropriate than a matched-pairs t-test?

Solution: This kind of thing:

    proc univariate noprint;
      qqplot diff / normal(mu=est sigma=est);

with result

    [Normal quantile plot of diff, with the fitted normal line]

These differences are mostly normal, except for the outlier at the upper end. The outlier makes us doubt normality, which is assumed for a t-test, so a sign test would be more appropriate.

(d) Think about how you would use a sign test in this matched-pairs situation. Run an appropriate sign test in SAS, bearing in mind the null and alternative hypotheses that you wish to test. What do you conclude, in the context of the data?

Solution: In the matched-pairs context, our null hypothesis is that there is no difference between how far students can throw a baseball and a softball: that is, that the median difference is zero. We wanted to see whether students can throw a baseball further on average than a softball: that is, whether the median difference is greater than zero (the way around I calculated it: if you did softball minus baseball, the median difference would be less than zero). Thus the SAS code is something like this:

    proc univariate mu0=0;

      var diff;

This will get us, remember, a two-sided test:

    The UNIVARIATE Procedure
    Variable: diff

    Tests for Location: Mu0=0
    Test          -Statistic-    -----p Value------
    Student's t   t              Pr > |t|
    Sign          M    9.5       Pr >= |M|   <.0001
    Signed Rank   S              Pr >= |S|   <.0001

The two-sided P-value is less than 0.0001. But we wanted a one-sided P-value, for testing that the median difference is greater than zero. So we ought first to check that the median difference in the sample is greater than zero, which is also on the proc univariate output:

    Basic Statistical Measures
    Location            Variability
    Mean                Std Deviation
    Median              Variance
    Mode                Range
                        Interquartile Range
    Note: The mode displayed is the smallest of 3 modes with a count of 3.

The median difference is 5, so we are on the correct side, and our one-sided P-value is half the two-sided one, less than 0.00005. This is definitely small enough to reject the null with, and we can conclude that students really can throw a baseball farther than a softball.

For a complete answer, you need in your discussion to say that SAS's P-value is two-sided and we need a one-sided one. Simply halving the two-sided one is not the best (you really ought to convince yourself that you are "on the correct side"), but is acceptable. An answer simply using SAS's P-value, even though "less than 0.0001" is the right answer, is not the right answer for the right reason, and so is incomplete.

(e) Read the same data into R. You'll need to supply some names to the columns.

Solution: This kind of thing:

    myurl="http://..."
    throws=read_delim(myurl," ",col_names=c("student","baseball","softball"))
    ## Parsed with column specification:
    ## cols(
    ##   student = col_integer(),
    ##   baseball = col_integer(),
    ##   softball = col_integer()
    ## )
    throws
    ## # A tibble: 24 x 3
    ##   student baseball softball
    ##     <int>    <int>    <int>
    ## # ... with 14 more rows

This is one of those times where we have to tell R what names to give the columns. Or you can put col_names=F and leave the columns called X1, X2, X3 or whatever they end up as.

(f) Calculate a column of differences, baseball minus softball, in the data frame.

Solution: Add it to the data frame using mutate:

    throws2=throws %>% mutate(diff=baseball-softball)
    throws2
    ## # A tibble: 24 x 4
    ##   student baseball softball  diff
    ##     <int>    <int>    <int> <int>
    ## # ... with 14 more rows

(g) Carry out a sign test in R, testing the null hypothesis that the median difference is zero, against

the alternative that it is greater than zero. Obtain a P-value and compare it with the one you got from SAS. Your option whether you use smmr or not.

Solution: I think using smmr is way easier, so I'll do that first. There is even a shortcut in that the null median defaults to zero, which is exactly what we want here:

    library(smmr)
    sign_test(throws2,diff)
    ## $above_below
    ## below above
    ##     2    21
    ##
    ## $p_values
    ##   alternative p_value
    ## 1 lower       e-01
    ## 2 upper       e-05
    ## 3 two-sided   e-05

We want, this time, the upper-tailed one-sided test, since we want to prove that students can throw a baseball a longer distance than a softball. Thus the P-value we want is the one in the "upper" row, which is of the order of 10^-5.

To build it yourself, you know the steps by now. First step is to count how many differences are greater and less than zero:

    table(throws2$diff>0)
    ##
    ## FALSE  TRUE
    ##     3    21

or

    table(throws2$diff<0)
    ##
    ## FALSE  TRUE
    ##    22     2

or, since we have things in a data frame,

    throws2 %>% count(diff>0)
    ## # A tibble: 2 x 2
    ##   `diff > 0`     n
    ##        <lgl> <int>
    ## 1      FALSE     3
    ## 2       TRUE    21

or count those less than zero. I'd take any of those.

Note that these are not all the same. One of the differences is in fact exactly zero. The technically right thing to do with the zero difference is to throw it away (leaving 23 differences with 2 negative and 21 positive). I would take that, or 2 or 3 negative differences out of 24 (depending on whether you count "greater than zero" or "less than zero"). We hope that this won't make a material difference to the P-value; it'll make some difference, but won't (we hope) change the conclusion about whether to reject.
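The three counting conventions just described can be compared side by side. The sketch below uses an invented vector whose only connection to the throwing data is that it has the same counts (21 positive, 1 zero, 2 negative); the name `d` and the individual values are assumptions for illustration.

```r
# Hypothetical differences: 21 positive, 1 zero, 2 negative.
# Only the counts match the text; the values themselves are invented.
d <- c(rep(5, 21), 0, rep(-3, 2))

# Convention 1: count "greater than zero" (the zero lands in FALSE)
table(d > 0)

# Convention 2: count "less than zero" (the zero lands in FALSE again)
table(d < 0)

# Convention 3: discard exact zeros first, then count each side
d_nonzero <- d[d != 0]
counts <- c(below = sum(d_nonzero < 0), above = sum(d_nonzero > 0))
counts
```

Whichever convention is used, the number of "successes" and the number of trials fed to the binomial distribution have to come from the same convention.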

Second step is to get a P-value for whichever one of those you got, from the appropriate binomial distribution. The P-value is the probability of getting 21 (or 22) positive differences out of 24 (or 23) or more, since this is the end of the distribution we should be at if the alternative hypothesis is correct. Thus any of these will get you a defensible P-value:

    sum(dbinom(21:23,23,0.5))
    ## [1] e-05
    sum(dbinom(22:24,24,0.5))
    ## [1] e-05
    sum(dbinom(21:24,24,0.5))
    ## [1]
    sum(dbinom(0:2,23,0.5))
    ## [1] e-05
    sum(dbinom(0:2,24,0.5))
    ## [1] e-05
    sum(dbinom(0:3,24,0.5))
    ## [1]

The first and fourth of those are the same as smmr (throwing away the exactly-median value). SAS's P-value was less than 0.00005 (remember, half of the one on the output). SAS actually does something else if there are values exactly equal to the median: it counts them as half above and half below (see note 1). If you got the last of those P-values, you ought to remark that it's slightly greater than the one SAS produced.

As we hoped, there is no material difference here: there is no doubt with any of these possibilities that we will reject a median difference of zero in favour of a median difference greater than zero.

2. Previously, you carried out a sign test to determine whether students could throw a baseball farther than a softball. This time, we will calculate a confidence interval for the median difference baseball minus softball, using the results of sign tests.

(a) Read the data into R from http://..., giving appropriate names to the columns, and add a column of differences.

Solution: Of course, you can copy this from my solutions, which is fine since they are already public. Any way that works is OK, including tidyverse ideas, but I did it this way, combining the reading of the data with the calculation of the differences in one pipe:

    myurl="http://..."
    throws = read_delim(myurl," ",col_names=c("student","baseball","softball")) %>%
      mutate(diff=baseball-softball)
    ## Parsed with column specification:
    ## cols(
    ##   student = col_integer(),
    ##   baseball = col_integer(),
    ##   softball = col_integer()
    ## )
    throws
    ## # A tibble: 24 x 4
    ##   student baseball softball  diff
    ##     <int>    <int>    <int> <int>
    ## # ... with 14 more rows

(b) What function in smmr will run a two-sided sign test and return only the P-value? Check that it works by testing whether the median difference for your data is zero or different from zero.

Solution: It's called pval_sign. If you haven't run into it before, in R Studio click on Packages, find smmr, and click on its name. This will bring up package help, which includes a list of all the functions in the package, along with a brief description of what each one does. (Clicking on a function name brings up the help for that function.)

Let's check that it works properly by repeating the previous sign test and verifying that pval_sign gives the same thing:

    sign_test(throws,diff,0)
    ## $above_below
    ## below above
    ##     2    21
    ##
    ## $p_values
    ##   alternative p_value
    ## 1 lower       e-01
    ## 2 upper       e-05
    ## 3 two-sided   e-05
    pval_sign(0,throws,diff)
    ## [1] e-05

The P-values are the same (for the two-sided test) and both small, so the median difference is not zero.

(c) Based on your P-value, do you think 0 is inside the confidence interval or not? Explain briefly.

Solution: Absolutely not. The median difference is definitely not zero, so zero cannot be in the confidence interval. Our suspicion, from the one-sided test from earlier, is that the differences were mostly positive (people could throw a baseball farther than a softball, in most cases). So the confidence interval ought to contain only positive values. I ask this because it drives what happens below.

(d) Obtain a 95% confidence interval for the population median difference, baseball minus softball, using a trial-and-error procedure that determines whether a number of possible medians are inside or outside the CI.

Solution: I've given you a fair bit of freedom to tackle this as you wish. Anything that makes sense is good: whatever mixture of mindlessness, guesswork and cleverness that you want to employ. The most mindless way is to try some values one at a time and see what you get, e.g.:

    pval_sign(1,throws,diff)
    ## [1]
    pval_sign(5,throws,diff)
    ## [1]

So median 1 is outside and median 5 is inside the 95% interval. Keep trying values until you've figured out where the lower and upper ends of the interval are: where the P-values cross from below 0.05 to above, or vice versa.

Something more intelligent is to make a long list of potential medians, and get the P-value for each of them, e.g.:

    my.med=seq(0,20,2)
    pvals=map_dbl(my.med,pval_sign,throws,diff)
    data.frame(my.med,pvals)
    ## my.med pvals
    ## e-05
    ## e-02
    ## e-01
    ## e-01
    ## e-01
    ## e-02
    ## e-03
    ## e-05
    ## e-05
    ## e-06
    ## e-06

2 is just inside the interval, 8 is also inside, and 10 is outside. Some closer investigation:

    my.med=seq(0,2,0.5)
    pvals=map_dbl(my.med,pval_sign,throws,diff)
    data.frame(my.med,pvals)
    ## my.med pvals
    ## e-05
    ## e-04
    ## e-03
    ## e-02
    ## e-02

The bottom end of the interval actually is 2, since 2 is inside and 1.5 is outside.

    my.med=seq(8,10,0.5)
    pvals=map_dbl(my.med,pval_sign,throws,diff)
    data.frame(my.med,pvals)
    ## my.med pvals

The top end is 9, 9 being inside and 9.5 outside. Since the data values are all whole numbers, I think this is accurate enough.

The most sophisticated way is the bisection idea we saw before. We already have a kickoff for this, since we found, mindlessly, that 1 is outside the interval on the low end and 5 is inside, so the lower limit has to be between 1 and 5. Let's try halfway between, i.e. 3:

    pval_sign(3,throws,diff)
    ## [1]

Inside, so lower limit is between 1 and 3. This can be automated, thus:

    lo=1
    hi=3
    while(abs(hi-lo)>0.1) {
      try=(lo+hi)/2
      ptry=pval_sign(try,throws,diff)
      if (ptry>0.05) {
        hi=try
      } else {
        lo=try
      }
    }
    c(lo,hi)
    ## [1]

The difficult bit is to decide whether the value try becomes the new lo or the new hi. If the P-value for the median of try is greater than 0.05, try is inside the interval, and it becomes the new hi; otherwise it's outside and becomes the new lo. Whatever the values are, lo is always outside the interval and hi is always inside, and they move closer and closer to each other.
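For readers without smmr installed, the lower-limit search can be sketched in base R. Everything named here is an assumption for illustration: `sign_p` is a hypothetical stand-in for pval_sign (it discards values equal to the trial median and doubles the smaller binomial tail, which may differ in detail from what smmr does), and the vector `d` is invented rather than the throwing data.

```r
# Two-sided sign-test P-value for a hypothesised median `med`.
# Assumed convention: values exactly equal to `med` are discarded.
sign_p <- function(med, x) {
  above <- sum(x > med)
  below <- sum(x < med)
  n <- above + below
  if (n == 0) return(1)
  # Double the smaller tail probability, capped at 1
  min(1, 2 * pbinom(min(above, below), n, 0.5))
}

# Made-up differences, NOT the throwing data
d <- c(-3, 0, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 15, 20)

# Lower limit: lo starts outside the interval, hi starts inside it
lo <- min(d) - 1   # every value lies above this, so it is rejected
hi <- median(d)    # the sample median is never rejected
while (abs(hi - lo) > 0.01) {
  try_med <- (lo + hi) / 2
  if (sign_p(try_med, d) > 0.05) {
    hi <- try_med  # inside the interval: becomes the new hi
  } else {
    lo <- try_med  # outside the interval: becomes the new lo
  }
}
c(lo, hi)
```

Starting from a value below all the data and from the sample median removes the need to find starting values by hand, at the cost of a few extra iterations.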

At the other end of the interval, lo is inside and hi is outside, so there is a little switching around within the loop. For starting values, you can be fairly mindless: for example, we know that 5 is inside and something big like 20 must be outside:

    lo=5
    hi=20
    while(abs(hi-lo)>0.1) {
      try=(lo+hi)/2
      ptry=pval_sign(try,throws,diff)
      if (ptry>0.05) {
        lo=try
      } else {
        hi=try
      }
    }
    c(lo,hi)
    ## [1]

The interval goes from 2 to (as calculated here) just under 9.

Of course, smmr is much easier:

    ci_median(throws,diff)
    ## [1]

This uses the bisection method with a smaller tolerance than we did, so the answer is more accurate. It looks as if the interval goes from 2 to 9: that is, students can throw a baseball on average between 2 and 9 yards further than they can throw a softball.

3. Previously, we looked at a parking survey designed to address whether men or women were better at parallel parking. Let's revisit these data, and see what might be a better test than the two-sample t-test we did before. The data were in ... Read the data into R, the same way that you did it before (assuming that it worked for you then).

Solution: This is an Excel spreadsheet, so you need to do something like this:

    library(readxl)
    parking=read_excel("parking.xlsx",sheet=2)
    parking
    ## # A tibble: 93 x 2
    ##   distance gender
    ##      <dbl>  <chr>
    ## [first 10 rows, all male]
    ## # ... with 83 more rows

(a) Make, or re-make, a plot that will help you assess the assumptions of the two-sample t-test. Why do you have doubts about the two-sample t-test?

Solution: The no-thinking plot is to note that you have one quantitative variable distance and one categorical one gender, and so a side-by-side boxplot is the way to go:

    ggplot(parking,aes(x=gender,y=distance))+geom_boxplot()

    [Side-by-side boxplots of distance (y-axis) by gender (x-axis: female, male)]

The assumption behind the two-sample t-test is that both groups have approximately normal distributions. I think that fails here, because both distributions have outliers, or are skewed to the right (depending on the way you look at it).

Thinking further about normality, you might consider that normal quantile plots, one for each group, would be the thing. This comes out nicely in ggplot with facets, once you get your head around what you need to do:

    ggplot(parking,aes(sample=distance))+stat_qq()+
      facet_wrap(~gender,ncol=1)

    [Normal quantile plots of distance, one panel per gender: female (top), male (bottom); x-axis theoretical, y-axis sample]

First we make a plot that produces the right kind of thing (stat_qq requires a sample of data) but for all the data together, and then, at the end, we produce a separate plot for each gender. I added one extra thing here: the default layout for two plots is to put them left and right, which makes them look tall and skinny (and hard to interpret), so I'd rather put them above and below. One way to arrange this is with facet_grid; this way is another, arranging all the subplots in an array with one column.

So what are those plots showing us? For the males (at the bottom), I think the principal feature is the outlier at the top end; the other points are more or less straight. For the females (at the top), there is more of a curve, with values bunched up at the bottom and spread out at the top: skewed to the right. Or, you might say, the lowest values, the ones below 1 on the x-axis, are too bunched up, but the other values are more or less straight. Your call. Actually, bunching up at the bottom is not really indicating a problematic departure from normality (the Central Limit Theorem works just fine with short tails); it's long tails or outliers that really cause problems.

One of the classic situations that causes skewness is when there is a lower limit (that the data come close to). In this case, the variable is distance (from the curb), which cannot be less than zero. Most of the drivers parked their car pretty close to the curb (so there were a lot of values close to zero), but a few drivers were a long way away (a few very big positive values). This is exactly the kind of situation where you get skewness, and so it is no surprise that we saw what we did. Most of the cases of skewness that you see in practice can be traced back to data being close to a limit at one end. The classic case of left-skewness is an easy exam: most students get close to 100% (upper limit), while a few get a fair bit less.

(b) Test for a difference between the median parking distances between males and females, using Mood's median test. Build this yourself in R, as in the lecture. What do you conclude?

Solution: First, work out the overall median of all the distances, regardless of gender:

    parking %>% summarize(med=median(distance))
    ## # A tibble: 1 x 1
    ##     med
    ##   <dbl>
    ## 1     9

The overall median is 9. Count up how many distances of each gender were above or below the overall median.

    tab=with(parking,table(gender,distance<9))
    tab
    ##
    ## gender   FALSE TRUE
    ##   female
    ##   male

For example, 19 of the male drivers had a distance (strictly) less than 9. Both genders are pretty close to an even split above and below the overall median, which suggests that the males and females have about the same median.

Strictly, I'm supposed to throw away any values that are exactly equal to the overall median. Are there any here?

    any(parking$distance==9)
    ## [1] TRUE

There are. I'll come back to that later, but for now, we'll go with the table we have. Is there an association between gender and being above or below the overall median? That's a chi-squared test for independence:

    chisq.test(tab,correct=F)
    ##
    ## Pearson's Chi-squared test
    ##
    ## data: tab
    ## X-squared = , df = 1, p-value =

This is even less significant than the two-sample t-test we did before, and so is consistent with our conclusion from before that there is actually no difference between males and females in terms of average parking distance. The Mood's median test is believable because it is not affected by outliers or distribution shape.

My package smmr does Mood's median test as well as the sign test. (I made up the name as "sign and Mood median test in R".) The function median_test takes three things: a data frame, a column of values and a column of group memberships (both unquoted), which is exactly what we have:

    library(smmr)
    median_test(parking,distance,gender)
    ## $table
    ##         above
    ## group    above below
    ##   female
    ##   male
    ##
    ## $test
    ##        what value
    ## 1 statistic
    ## 2        df
    ## 3   P-value

This has a 0-variant that you can use if you can't get this to work. For this, you need to specify two columns: the measurements and the groups. You can specify the data frame twice, or you can use with, like this:

    with(parking,median_test0(distance,gender))
    ## $table
    ##         below
    ## group    FALSE TRUE
    ##   female
    ##   male
    ##
    ## $test
    ##        what value
    ## 1 statistic
    ## 2        df
    ## 3   P-value

which gives identical results.

The numbers in the tables here are a bit different from what we had before. This is because I wrote the function first to get rid of any data values exactly equal to the median (and we previously determined that there are some). We can get a more detailed look this way:

    parking %>% filter(distance==9)
    ## # A tibble: 6 x 2
    ##   distance gender
    ##      <dbl>  <chr>
    ## 1        9   male
    ## 2        9   male
    ## 3        9 female
    ## 4        9 female
    ## 5        9 female
    ## 6        9 female

This shows all of the people whose parking distance was exactly 9 (the overall median). There are six of them, two males and four females. In my first attempt, these got counted as FALSE ("not strictly less than 9"), but now they are thrown away: not counted at all. Check that the female-FALSE frequency has decreased by 4, and the male-FALSE frequency has decreased by 2, with the other two being the same. So the P-value from median_test is a bit smaller than before, because it now looks as if slightly more females were below the median distance and slightly more males were above it.

Anyway, the P-value is still nowhere near significance, so we have no evidence of a difference in median parking distance between males and females. The kind of deviation from a completely even split is exactly the kind of thing that could have happened by chance.

(c) Now we'll repeat the same test in SAS (which has it built in). First read the data into SAS and summarize the values.

Solution: This is completely copied from what I did before (see note 2):

    proc import
      datafile='/home/ken/parking.xlsx'
      dbms=xlsx
      out=mydata
      replace;
      sheet=sheet2;
      getnames=yes;

    proc means;
      var distance;
      class gender;

    The MEANS Procedure
    Analysis Variable : distance distance
    gender   N Obs   N   Mean   Std Dev   Minimum   Maximum
    female
    male

The same number of males and females that we had before, and a slightly smaller mean for the females. Or, find the median and quartiles and compare with the boxplots:

    proc means q1 median q3;
      var distance;
      class gender;

    The MEANS Procedure
    Analysis Variable : distance distance
    gender   N Obs   Lower Quartile   Median   Upper Quartile
    female
    male

Bearing in mind that the SAS and R definitions of quartiles do differ, so you may not get exactly the same thing, these appear to be the same as the boxplots.

(d) Run Mood's median test. What do you conclude here, and do you get the same result as R (either the way you did it or the way smmr does it)?

Solution: This is proc npar1way with option median (not mood!):

    proc npar1way median;
      var distance;
      class gender;

    The NPAR1WAY Procedure
    Median Scores (Number of Points Above Median) for Variable distance
    Classified by Variable gender
    gender   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
    male
    female
    Average scores were used for ties.

    Median Two-Sample Test
    Statistic
    Z
    One-Sided Pr > Z
    Two-Sided Pr > |Z|

    Median One-Way Analysis
    Chi-Square
    DF                1
    Pr > Chi-Square

This gives the same conclusion as before (no difference between the medians for males and females), but a different P-value (look in the Median One-Way Analysis at the end of the output). I think the difference is yet another way of handling those observations that are exactly equal to 9. If you go back up to the table of median scores at the top of the output, the Sum of Scores column is the key. If there are no observations exactly equal to the overall median, this will be the numbers in our FALSE columns above: the number of values above the overall median.

If there are values equal to the overall median, something else happens. In this case, there are 93 data values altogether. 43 of them are strictly less than the median, 44 are strictly greater and the other 6 are exactly equal to the median. If those values exactly equal to the median were in fact different from each other, they would have ranks 44, 45, ..., 49 from the bottom. The median would have rank (93 + 1)/2 = 47, so the first four of these are less than or equal to the median, and the last two are strictly greater.

Now, we have two groups, so if those observations had actually been different from each other, we don't know which ones of them would have been greater than the median and which less. So we pretend that 2/6 = 1/3 of them were greater than the median in each group. There were two male observations equal to 9, so SAS pretends that 2(1/3) = 2/3 = 0.67 of them were greater than the median; with the 44 - 19 = 25 male observations strictly greater, this gives a total of 25 + 0.67 = 25.67. There were four female observations equal to 9, and 19 strictly greater, giving a total of 19 + 4(1/3) = 20.33. Those match the sums of scores in the output.

Notes

1 There are extra complications with more than one exactly-equal, which we'll see with Mood's median test later.

2 This is the value of keeping all your work and being able to find it later.
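The tie-handling arithmetic in the explanation of the Sum of Scores column can be checked in a few lines of R. This is only a sketch of the reasoning as described in the text, not SAS's documented algorithm. The counts 93, 43, 44, 6, 19, 4 and 2 are quoted from the text; the 25 males strictly above the median is derived as 44 - 19, and the variable names are invented for illustration.

```r
# Counts quoted in the text
n_total <- 93   # all parking distances
n_below <- 43   # strictly less than the overall median
n_above <- 44   # strictly greater than the overall median
n_tied  <- 6    # exactly equal to the overall median

# Ranks the tied values would occupy if they were all distinct
tied_ranks <- (n_below + 1):(n_below + n_tied)
mid_rank <- (n_total + 1) / 2

# Fraction of the tied values that would fall above the median
frac_above <- mean(tied_ranks > mid_rank)

# Per-group counts: 19 females strictly above (from the text),
# so 44 - 19 males strictly above (derived, not quoted)
female_above <- 19
male_above <- n_above - female_above
female_tied <- 4
male_tied <- 2

# Each group's score: strictly-above count plus its share of the ties
scores <- c(
  female = female_above + female_tied * frac_above,
  male   = male_above + male_tied * frac_above
)
round(scores, 2)
```

The two scores add up to 46, the number of ranks above the middle rank of 47, which is a quick sanity check on the bookkeeping.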


Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

Intro. Scheme Basics. scm> 5 5. scm>

Intro. Scheme Basics. scm> 5 5. scm> Intro Let s take some time to talk about LISP. It stands for LISt Processing a way of coding using only lists! It sounds pretty radical, and it is. There are lots of cool things to know about LISP; if

More information

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

More information

Quantitative - One Population

Quantitative - One Population Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)

More information

Excel Tips and FAQs - MS 2010

Excel Tips and FAQs - MS 2010 BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my

More information

1 Overview of Statistics; Essential Vocabulary

1 Overview of Statistics; Essential Vocabulary 1 Overview of Statistics; Essential Vocabulary Statistics: the science of collecting, organizing, analyzing, and interpreting data in order to make decisions Population and sample Population: the entire

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

STAT:5400 Computing in Statistics

STAT:5400 Computing in Statistics STAT:5400 Computing in Statistics Introduction to SAS Lecture 18 Oct 12, 2015 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowaedu SAS SAS is the statistical software package most commonly used in business,

More information

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 KEY SKILLS: Organize a data set into a frequency distribution. Construct a histogram to summarize a data set. Compute the percentile for a particular

More information

Macros and ODS. SAS Programming November 6, / 89

Macros and ODS. SAS Programming November 6, / 89 Macros and ODS The first part of these slides overlaps with last week a fair bit, but it doesn t hurt to review as this code might be a little harder to follow. SAS Programming November 6, 2014 1 / 89

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

UNIT 1A EXPLORING UNIVARIATE DATA

UNIT 1A EXPLORING UNIVARIATE DATA A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics

More information

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement

More information

Chapter 5: The beast of bias

Chapter 5: The beast of bias Chapter 5: The beast of bias Self-test answers SELF-TEST Compute the mean and sum of squared error for the new data set. First we need to compute the mean: + 3 + + 3 + 2 5 9 5 3. Then the sum of squared

More information

No. of blue jelly beans No. of bags

No. of blue jelly beans No. of bags Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in

More information

An Introduction to Minitab Statistics 529

An Introduction to Minitab Statistics 529 An Introduction to Minitab Statistics 529 1 Introduction MINITAB is a computing package for performing simple statistical analyses. The current version on the PC is 15. MINITAB is no longer made for the

More information

Homework 1 Excel Basics

Homework 1 Excel Basics Homework 1 Excel Basics Excel is a software program that is used to organize information, perform calculations, and create visual displays of the information. When you start up Excel, you will see the

More information

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies. Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.

More information

Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots

Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots STAT 350 (Spring 2015) Lab 3: SAS Solutions 1 Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots Note: The data sets are not included in the solutions;

More information

MITOCW ocw f99-lec07_300k

MITOCW ocw f99-lec07_300k MITOCW ocw-18.06-f99-lec07_300k OK, here's linear algebra lecture seven. I've been talking about vector spaces and specially the null space of a matrix and the column space of a matrix. What's in those

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use?

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use? Chapter 4 Analyzing Skewed Quantitative Data Introduction: In chapter 3, we focused on analyzing bell shaped (normal) data, but many data sets are not bell shaped. How do we analyze quantitative data when

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

Table Of Contents. Table Of Contents

Table Of Contents. Table Of Contents Statistics Table Of Contents Table Of Contents Basic Statistics... 7 Basic Statistics Overview... 7 Descriptive Statistics Available for Display or Storage... 8 Display Descriptive Statistics... 9 Store

More information

Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida

Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida Land Cover Stratified Accuracy Assessment For Digital Elevation Model derived from Airborne LIDAR Dade County, Florida FINAL REPORT Submitted October 2004 Prepared by: Daniel Gann Geographic Information

More information

STANDARDS OF LEARNING CONTENT REVIEW NOTES ALGEBRA I. 4 th Nine Weeks,

STANDARDS OF LEARNING CONTENT REVIEW NOTES ALGEBRA I. 4 th Nine Weeks, STANDARDS OF LEARNING CONTENT REVIEW NOTES ALGEBRA I 4 th Nine Weeks, 2016-2017 1 OVERVIEW Algebra I Content Review Notes are designed by the High School Mathematics Steering Committee as a resource for

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

L E A R N I N G O B JE C T I V E S

L E A R N I N G O B JE C T I V E S 2.2 Measures of Central Location L E A R N I N G O B JE C T I V E S 1. To learn the concept of the center of a data set. 2. To learn the meaning of each of three measures of the center of a data set the

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

Chapter 2: The Normal Distributions

Chapter 2: The Normal Distributions Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and

More information

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS. 1 SPSS 11.5 for Windows Introductory Assignment Material covered: Opening an existing SPSS data file, creating new data files, generating frequency distributions and descriptive statistics, obtaining printouts

More information

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis. 1.3 Density curves p50 Some times the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. It is easier to work with a smooth curve, because the histogram

More information

More Summer Program t-shirts

More Summer Program t-shirts ICPSR Blalock Lectures, 2003 Bootstrap Resampling Robert Stine Lecture 2 Exploring the Bootstrap Questions from Lecture 1 Review of ideas, notes from Lecture 1 - sample-to-sample variation - resampling

More information

AP Statistics Prerequisite Packet

AP Statistics Prerequisite Packet Types of Data Quantitative (or measurement) Data These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these

More information

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software. Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

AND NUMERICAL SUMMARIES. Chapter 2

AND NUMERICAL SUMMARIES. Chapter 2 EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Data organization. So what kind of data did we collect?

Data organization. So what kind of data did we collect? Data organization Suppose we go out and collect some data. What do we do with it? First we need to figure out what kind of data we have. To illustrate, let s do a simple experiment and collect the height

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week 02 Module 06 Lecture - 14 Merge Sort: Analysis So, we have seen how to use a divide and conquer strategy, we

More information

Chapter 12: Statistics

Chapter 12: Statistics Chapter 12: Statistics Once you have imported your data or created a geospatial model, you may wish to calculate some simple statistics, run some simple tests, or see some traditional plots. On the main

More information

Practical 2: Using Minitab (not assessed, for practice only!)

Practical 2: Using Minitab (not assessed, for practice only!) Practical 2: Using Minitab (not assessed, for practice only!) Instructions 1. Read through the instructions below for Accessing Minitab. 2. Work through all of the exercises on this handout. If you need

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe Introduction to the Practice of Statistics Fifth Edition Moore, McCabe Section 1.3 Homework Answers Assignment 5 1.80 If you ask a computer to generate "random numbers between 0 and 1, you uniform will

More information

Minitab Guide for MA330

Minitab Guide for MA330 Minitab Guide for MA330 The purpose of this guide is to show you how to use the Minitab statistical software to carry out the statistical procedures discussed in your textbook. The examples usually are

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Bar Graphs and Dot Plots

Bar Graphs and Dot Plots CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

More information

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA Learning objectives: Getting data ready for analysis: 1) Learn several methods of exploring the

More information

Chapter 5: The standard deviation as a ruler and the normal model p131

Chapter 5: The standard deviation as a ruler and the normal model p131 Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is

More information

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test).

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). Suppose you have a one way design, and want to do an ANOVA, but discover that your data are seriously not normal? Just

More information

Distributions of Continuous Data

Distributions of Continuous Data C H A P T ER Distributions of Continuous Data New cars and trucks sold in the United States average about 28 highway miles per gallon (mpg) in 2010, up from about 24 mpg in 2004. Some of the improvement

More information

Lecture Notes 3: Data summarization

Lecture Notes 3: Data summarization Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &

More information

Assignment 3 due Thursday Oct. 11

Assignment 3 due Thursday Oct. 11 Instructor Linda C. Stephenson due Thursday Oct. 11 GENERAL NOTE: These assignments often build on each other what you learn in one assignment may be carried over to subsequent assignments. If I have already

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again

More information

Unit 5: Estimating with Confidence

Unit 5: Estimating with Confidence Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating

More information

SPSS. (Statistical Packages for the Social Sciences)

SPSS. (Statistical Packages for the Social Sciences) Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.

More information

A Quick Introduction to R

A Quick Introduction to R Math 4501 Fall 2012 A Quick Introduction to R The point of these few pages is to give you a quick introduction to the possible uses of the free software R in statistical analysis. I will only expect you

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

MAT 102 Introduction to Statistics Chapter 6. Chapter 6 Continuous Probability Distributions and the Normal Distribution

MAT 102 Introduction to Statistics Chapter 6. Chapter 6 Continuous Probability Distributions and the Normal Distribution MAT 102 Introduction to Statistics Chapter 6 Chapter 6 Continuous Probability Distributions and the Normal Distribution 6.2 Continuous Probability Distributions Characteristics of a Continuous Probability

More information

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007 What s New in Spotfire DXP 1.1 Spotfire Product Management January 2007 Spotfire DXP Version 1.1 This document highlights the new capabilities planned for release in version 1.1 of Spotfire DXP. In this

More information

EXST SAS Lab Lab #8: More data step and t-tests

EXST SAS Lab Lab #8: More data step and t-tests EXST SAS Lab Lab #8: More data step and t-tests Objectives 1. Input a text file in column input 2. Output two data files from a single input 3. Modify datasets with a KEEP statement or option 4. Prepare

More information

Section 6.3: Measures of Position

Section 6.3: Measures of Position Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

University of Toronto Scarborough Department of Computer and Mathematical Sciences STAC32 (K. Butler), Final Exam December 7, :00-12:00

University of Toronto Scarborough Department of Computer and Mathematical Sciences STAC32 (K. Butler), Final Exam December 7, :00-12:00 University of Toronto Scarborough Department of Computer and Mathematical Sciences STAC32 (K. Butler), Final Exam December 7, 2017 9:00-12:00 IT IS ASSUMED THAT YOU HAVE READ THE BOX BELOW. Aids allowed:

More information