Solution to Tumor growth in mice


Exercise 1

1. Import the data to R

The data is in the file tumorvols.csv, which can be read with the read.csv2 function. For a successful import you need to tell R exactly where the file is stored on your computer. One way of doing this is to set the path of the working directory. I like doing so because it also lets me control where R saves my working history and output. It works like this (on my computer; you should of course set your own path):

    setwd("c:/users/nxr382/documents/teaching/basicstatistics/2016/data")
    # stringsAsFactors=TRUE stores treatment as a factor (no longer the default from R 4.0 on)
    tumordat <- read.csv2('tumorvols.csv', header=TRUE, stringsAsFactors=TRUE)

Another possibility is:

    # tumordat <- read.csv2(file.choose(), stringsAsFactors=TRUE)

which opens a file browser so that you can click your way to the data file. In either case, the dataset tumordat should appear in the list in the Work Environment. If you click on the table icon following the description of the data you can see what is inside the spreadsheet. As you can see, a record has been made for each mouse on each day of follow-up. We have information on tumor volume, treatment group, and sacrifice. Note that after a mouse has been sacrificed it is recorded as dead and the variable volume has a missing value, called NA in R.

2. Get a quick overview of the data

To find out how large the data is (i.e. how many records it contains) and which variables it contains I use the functions dim, names, and summary:

    dim(tumordat)
    ## [1] 442   6
    names(tumordat)
    ## [1] "day" "mouseid" "volume" "treatment" "dead"
    ## [6] "sacrificed"

From this we see that the dataset contains 442 records and six variables (day, mouseid, volume, treatment, dead, and sacrificed).
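As an aside on the choice of import function: read.csv2 assumes semicolon-separated fields and a decimal comma, whereas read.csv assumes commas and a decimal point. The following is a minimal sketch on a made-up toy file (not the real tumorvols.csv); the file contents and the object name toy are assumptions for illustration only.

```r
# Toy file with made-up values, written to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("day;mouseid;volume;treatment",
             "1;21;45,3;contr",
             "1;22;60,1;chemo",
             "4;21;;contr"),      # empty volume field: read as NA
           tmp)

# read.csv2 assumes sep=";" and dec=","
toy <- read.csv2(tmp, header = TRUE, stringsAsFactors = TRUE)

dim(toy)            # number of records and number of variables
names(toy)          # variable names
str(toy)            # storage type of each variable
is.na(toy$volume)   # which volumes are missing
```

Running str on the imported data is a quick way to confirm that numbers were read as numbers and that treatment is stored as a factor.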

The function summary gives a little more information about the contents of the data:

    summary(tumordat)
    ## day mouseid volume treatment dead
    ## Min. : 1 Min. :21.0 Min. : 17.1 chemo:136 Min. :
    ## 1st Qu.:11 1st Qu.:27.0 1st Qu.: contr:170 1st Qu.:
    ## Median :20 Median :35.5 Median : radio:136 Median :
    ## Mean :20 Mean :39.5 Mean : Mean :
    ## 3rd Qu.:29 3rd Qu.:54.0 3rd Qu.: 3rd Qu.:
    ## Max. :39 Max. :60.0 Max. : Max. :
    ## NA's :226
    ## sacrificed
    ## Min. :
    ## 1st Qu.:
    ## Median :
    ## Mean :
    ## 3rd Qu.:
    ## Max. :
    ##

We can see that follow-up lasted for 39 days and that the mice are labeled with id numbers from 21 to 60. There are more records from the control group, reflecting that this group contained 10 mice while the active treatment groups had only 8 mice each. A little arithmetic (442 records divided by 26 mice) reveals that there must be 17 records for each mouse.

The variables dead and sacrificed are binary. It is clear to us that 0 and 1 should be interpreted as no and yes, but R takes all numbers literally and reports summary statistics that are appropriate for numerical variables. The only variable that is recognized as nominal is treatment, since this variable has text values.

Further note that in the summary the treatment groups are listed in alphabetical order. By default R always uses the group that comes first in alphabetical order as the reference point in figures, tables, and statistical analyses. We can tell R to use another group as reference with the relevel function:

    tumordat$treatment <- relevel(tumordat$treatment, ref='contr')
    summary(tumordat['treatment'])
    ## treatment
    ## contr:170
    ## chemo:136
    ## radio:136

Now the control group is the reference and appears first in the summary.

3. Extract the baseline data

In what follows we will only be concerned with the baseline data, that is, the records from the first day of the trial. We pick out the relevant records with the subset function and save them in a new data frame called day1:

    day1 <- subset(tumordat, day==1) # Don't forget the two "="s here!
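Two of the steps above, the records-per-mouse arithmetic and the change of reference group, can be checked in isolation. This is a small sketch only: the counts are copied from the summary output, while the toy factor is made up and is not part of the tumor data.

```r
# Counts taken from the summary output of tumordat
n_records <- 136 + 170 + 136   # chemo + contr + radio records
n_mice    <- 8 + 10 + 8        # mice per group
n_records / n_mice             # records per mouse

# Toy factor (made-up values) to show the effect of relevel
treatment <- factor(c("chemo", "contr", "radio", "contr"))
levels(treatment)                          # alphabetical order by default
treatment <- relevel(treatment, ref = "contr")
levels(treatment)                          # "contr" is now the first level
```

Note that relevel only works on factors; on a plain character vector it stops with an error, which is why the storage type of treatment matters.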

We run a quick summary on the new dataset to confirm that it only contains records from the first day.

    summary(day1['day'])
    ## day
    ## Min. :1
    ## 1st Qu.:1
    ## Median :1
    ## Mean :1
    ## 3rd Qu.:1
    ## Max. :1

4. Tabulate the distribution on treatment groups.

We can use the table function to count the number of mice in each treatment group. Just for the exercise we also make a barplot to display the counts.

    treat.table <- table(day1$treatment)
    print(treat.table)
    ##
    ## contr chemo radio
    ##    10     8     8
    barplot(treat.table)

[Figure: barplot of the number of mice in each treatment group (contr, chemo, radio)]

5. View the distribution of the tumor volumes.

To get an impression of the distribution of the tumor volumes we first make a histogram:

    hist(day1$volume, probability = TRUE)

[Figure: Histogram of day1$volume; x-axis: day1$volume, y-axis: Density]

The argument probability=TRUE standardizes the area of the histogram to one, which makes the histogram comparable to theoretical probability distributions. If it is omitted, R shows the number of observations in each interval on the y-axis instead. It is possible to specify more arguments to the hist function, e.g. if you want other breakpoints for the boxes than R's defaults. Check out hist in the R help for other options that change the appearance of your histogram.

As summary statistics we compute the mean and the standard deviation.

    mean(day1$volume)
    ## [1]
    sd(day1$volume)
    ## [1]

BUT: Are these appropriate summary statistics? The histogram is obviously skewed, with a long tail to the right. It doesn't look like a normal distribution. We compute the normal range (mean +/- 2*sd), recalling that in a normal distribution 95% of the distribution falls within this range, while 2.5% should be below this range and 2.5% above.

    c(mean(day1$volume)-2*sd(day1$volume),mean(day1$volume)+2*sd(day1$volume))
    ## [1]

According to this, a tumor volume of -100 mm^3 falls within the normal range!

6. Do logarithmic transformation of skew data

To obtain a normally distributed outcome we can try to transform the tumor volumes with the logarithm. I use the natural logarithm (log in R), but other logarithms such as log2 or log10 would serve just as well, since they are all proportional. We add the variable logvol to the data using the transform function:

    day1 <- transform(day1, logvol=log(volume))

Let's have a look at the histogram:

    hist(day1$logvol, probability = TRUE)

[Figure: Histogram of day1$logvol; x-axis: day1$logvol, y-axis: Density]

The histogram of the log-volumes is not skewed like that of the raw data. It doesn't look exactly like a normal curve either, but this looks more like small-sample variation than a systematic deviation.
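The effect of the log transformation on the normal range can be sketched on simulated data. The values below are drawn from a log-normal distribution with arbitrarily chosen parameters; they are not the tumor data, and the object names are assumptions for illustration. Back-transforming the log-scale limits with exp gives a range on the original scale that cannot contain negative volumes.

```r
set.seed(1)                                    # for reproducibility
vol <- rlnorm(26, meanlog = 4.5, sdlog = 0.8)  # simulated, right-skewed volumes

# Share of a normal distribution within mean +/- 2 sd (roughly 95%)
pnorm(2) - pnorm(-2)

# Normal range on the raw scale: with skewed data the lower limit
# can fall below zero
raw_range <- mean(vol) + c(-2, 2) * sd(vol)
raw_range

# Normal range on the log scale, back-transformed with exp():
# both limits are positive by construction
log_range <- mean(log(vol)) + c(-2, 2) * sd(log(vol))
exp(log_range)
```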

We compute the normal range:

    c(mean(day1$logvol)-2*sd(day1$logvol), mean(day1$logvol)+2*sd(day1$logvol))
    ## [1]

The limits of the estimated normal range are more reasonable (close to the range of the data).

7. Use QQplots to check for normality.

To compare data with a normal distribution we could overlay the histograms with normal curves. However, this is not what I would recommend, because there is an arbitrariness in the histogram due to the choice of breakpoints. QQplots (which compare the ordered data points to the corresponding quantiles of the normal distribution) are a better option.

    par(mfrow=c(1,2)) # plot the figures side by side
    qqnorm(scale(day1$volume), xlim=c(-2.2,2.2), ylim=c(-2.2,2.2))
    abline(0,1)
    qqnorm(scale(day1$logvol), xlim=c(-2.2,2.2), ylim=c(-2.2,2.2))
    abline(0,1)

[Figure: two Normal Q-Q Plots side by side (raw volumes and log-volumes); x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

I've chosen to standardize the data because then, if the data is normally distributed, the points in the qqplot should lie on the straight line with intercept 0 and slope 1.
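What a QQplot actually plots can be made explicit by computing its coordinates by hand. The sketch below uses simulated skewed data (made up, not the tumor volumes) and checks the hand-computed coordinates against those returned by qqnorm; the object names are assumptions for illustration.

```r
set.seed(2)
x <- rlnorm(26)                 # simulated skewed data (made up)
z <- as.vector(scale(x))        # standardize: mean 0, sd 1

theoretical <- qnorm(ppoints(length(z)))   # normal quantiles at plotting positions
sample_q    <- sort(z)                     # ordered data points

# qqnorm returns the same coordinates (in the original data order);
# plot.it = FALSE suppresses the figure
qq <- qqnorm(z, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), sample_q)
```

Plotting sample_q against theoretical and adding abline(0,1) would reproduce the kind of figure shown above.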

We see that the qqplot of the log-transformed data does not deviate systematically from the straight line. The qqplot of the raw data is "smiling" (it curves upwards), which is an indication of skewness.

8. Compare the treatment groups in a boxplot.

To compare the distribution of the tumor volumes between the three treatment groups we make side-by-side boxplots.

    boxplot(day1$volume~day1$treatment) # The tilde "~" means "depending on" in R

[Figure: boxplots of baseline tumor volume by treatment group (contr, chemo, radio)]

The boxes represent the interquartile range (25% quantile to 75% quantile) with the median shown as the thick line. In the control group the whiskers are drawn at the minimum and maximum values. In the active treatment groups the maximum value is marked as an outlier and the upper whisker is drawn at the second largest tumor volume. This is due to the rule that a whisker cannot extend further from the box than 1.5 times the length of the box. If the data has a normal distribution, points that exceed this limit are rare. The reason for pointing out outliers is firstly that they might be registration errors which should be corrected, and secondly that they may have a disproportionately high influence on the results of the statistical analysis, so that sensitivity analyses or robust statistics should be considered.

Considering the comparison of the treatment groups: the median tumor volume appears to be larger in the control group. However, this has to be a spurious finding, since treatment was randomized and baseline volumes were measured just before treatment was initiated! What we see in the picture is pure random variation. We have three random samples from the exact same population.
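The whisker rule described above can be reproduced by hand. This is a sketch on made-up numbers, not the tumor volumes. R's boxplot draws the box at the hinges returned by fivenum, which can differ slightly from the quartiles returned by quantile.

```r
x <- c(40, 55, 60, 72, 80, 95, 110, 300)   # made-up volumes

five   <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
box    <- five[c(2, 4)]              # edges of the box
fences <- box + c(-1.5, 1.5) * diff(box)   # furthest a whisker may reach

outliers <- x[x < fences[1] | x > fences[2]]
whiskers <- range(x[x >= fences[1] & x <= fences[2]])

outliers                 # 300
whiskers                 # 40 110
boxplot.stats(x)$out     # the same outliers, as found by R
```

Here the largest value lies beyond the upper fence, so it is flagged as an outlier and the upper whisker stops at the second largest value, just as in the active treatment groups.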

9. Display small samples in a stripchart.

The sample sizes of the three groups are all rather small, so we can expect a good deal of random variation in the boxplot. Note that a quarter of the data in the active treatment groups amounts to only two observations! An alternative display for tiny datasets is the stripchart, which shows the individual data points in each group. In R stripcharts can be made with the stripchart function. Here I have added two optional arguments to the function: vertical=TRUE implies that the strips are displayed vertically with groups on the x-axis, and ylab='Tumor volume (mm3)' changes the label on the y-axis.

    stripchart(day1$volume~day1$treatment, vertical=TRUE, ylab='Tumor volume (mm3)')

[Figure: vertical stripchart of baseline tumor volume by treatment group (contr, chemo, radio); y-axis: Tumor volume (mm3)]

Additional graphical arguments could be supplied if you want a nicer looking figure for presentation. Check out stripchart in the R help, or par if you want the full list of R's graphical parameters.

Exercise 2

1. Survivors at end of follow-up

We can tabulate the day variable to see on which days of follow-up tumor volumes were (last) recorded:

    table(tumordat$day)
    ##
    ##
    ##

We see that the last day of follow-up was day 39. Let's pick out the data from this day to see how many mice survived throughout the trial:

    day39 <- subset(tumordat, day==39)
    table(day39$dead)
    ##
    ##  0  1
    ##  2 24

Only two mice survived throughout the trial. We can identify them as:

    subset(day39, dead==0)
    ## day mouseid volume treatment dead sacrificed
    ## radio 0 1
    ## chemo 0 1

Mouse number 35 has a tumor volume close to 1000 mm^3, so it would most likely have been sacrificed on the next day had the trial continued. Mouse number 23, on the other hand, has a very small tumor. Let's pick out all the records on this mouse and plot its growth curve to see what has happened:

    m23 <- subset(tumordat, mouseid==23)
    plot(m23$day, m23$volume, type='b')

[Figure: growth curve of mouse 23; x-axis: m23$day, y-axis: m23$volume]

Either this mouse has a very slowly growing tumor, or maybe the tumor cells didn't grow in the first place, so that all that is measured is the thickness of its skin.

2. Comparison of survival times between the groups

Next we pick out the data from when the mice were sacrificed. This will tell us how long the mice survived in the trial.

    sacrificed <- subset(tumordat, sacrificed==1)
    summary(sacrificed)
    ## day mouseid volume treatment dead
    ## Min. : 4.0 Min. :21.00 Min. : 52.3 contr:10 Min. :0
    ## 1st Qu.: 6.5 1st Qu.: 1st Qu.: chemo: 8 1st Qu.:0
    ## Median :19.0 Median :35.50 Median : radio: 8 Median :0
    ## Mean :18.5 Mean :39.50 Mean : Mean :0
    ## 3rd Qu.:27.0 3rd Qu.: 3rd Qu.: 3rd Qu.:0
    ## Max. :39.0 Max. :60.00 Max. : Max. :0
    ## sacrificed
    ## Min. :1
    ## 1st Qu.:1
    ## Median :1
    ## Mean :1
    ## 3rd Qu.:1
    ## Max. :1
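One possible cross-check of the sacrifice records, not part of the original solution, relies on the fact noted in Exercise 1 that volume is missing after a mouse has been sacrificed: the last day with a recorded volume should then coincide with the day of sacrifice. The sketch below uses a made-up toy data frame with the same column names; the values and the object names are assumptions for illustration.

```r
# Toy data (made up): two mice, four measurement days each
toy <- data.frame(
  mouseid    = rep(c(1, 2), each = 4),
  day        = rep(c(1, 4, 6, 8), times = 2),
  volume     = c(50, 300, 950, NA, 40, 90, 200, 400),
  sacrificed = c(0, 0, 1, 0, 0, 0, 0, 1)
)

# Last day with a recorded (non-missing) volume, per mouse
recorded <- subset(toy, !is.na(volume))
last_day <- tapply(recorded$day, recorded$mouseid, max)

# Day of sacrifice per mouse, named by mouse id
sac_day <- with(subset(toy, sacrificed == 1), setNames(day, mouseid))

# TRUE if the two agree for every mouse
all(last_day[names(sac_day)] == sac_day)
```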

It seems that mice 35 and 23 were sacrificed due to end of study and not because their tumor volumes exceeded the ethical limit of 1000 mm^3. Let's look at data from all the mice who had smaller tumor volumes at the time of sacrifice.

    subset(sacrificed, volume<1000)
    ## day mouseid volume treatment dead sacrificed
    ## radio 0 1
    ## radio 0 1
    ## radio 0 1
    ## radio 0 1
    ## chemo 0 1
    ## chemo 0 1
    ## chemo 0 1
    ## chemo 0 1
    ## chemo 0 1
    ## contr 0 1
    ## contr 0 1

There are quite a few mice that were sacrificed before their tumor reached the critical limit. We learn from the investigator that in practice mice were sacrificed already when the tumor volume reached a size of approximately 900 mm^3. This was considered most ethical, since mice are only followed up on Mondays, Wednesdays, and Fridays. Lowering the limit to 875 mm^3 leaves us with the following mice:

    subset(sacrificed, volume<875)
    ## day mouseid volume treatment dead sacrificed
    ## radio 0 1
    ## chemo 0 1
    ## contr 0 1

Besides mouse number 23, which was killed at end of study, we see that mouse number 34 and mouse number 60 were sacrificed before their tumors reached the critical size (both had wounds and were not thriving).

We are now confident that we have the correct survival data and that sacrifice is predominantly due to progression in tumor growth (we have one censoring at end of study and two deaths due to other causes). Strictly speaking the survival time is the day of sacrifice minus one, so we need to add this variable to the data before we do descriptive statistics. For completeness we also add a censoring variable:

    # censored: still in the trial on day 39 with a tumor below the limit
    sacrificed <- transform(sacrificed, time=day-1, cens=(day==39)&(volume<875))
    summary(sacrificed[c('day','time','cens')])
    ## day time cens
    ## Min. : 4.0 Min. : 3.0 Mode :logical
    ## 1st Qu.: 6.5 1st Qu.: 5.5 FALSE:25
    ## Median :19.0 Median :18.0 TRUE :1
    ## Mean :18.5 Mean :17.5 NA's :0
    ## 3rd Qu.:27.0 3rd Qu.:26.0
    ## Max. :39.0 Max. :38.0

At last we can compare survival in the three groups. This would usually be done in a Kaplan-Meier plot, but since follow-up is the same for all mice and there is only one censoring, making a boxplot of the time variable is ok.

    boxplot(sacrificed$time~sacrificed$treatment)

[Figure: boxplots of survival time by treatment group (contr, chemo, radio)]

As can be seen, survival is better in the active treatment groups than in the control group. Also, the mice who got chemotherapy have survived somewhat longer than those who got radiotherapy. Whether these are significant findings we cannot say. We would have to conduct a formal statistical analysis.
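For reference, the Kaplan-Meier estimate mentioned above as the usual choice can be computed by hand for a single group. This is a minimal sketch on made-up survival times, not the tumor data; the function name km and the toy vectors are hypothetical. In practice one would normally use the survfit function from the survival package instead.

```r
# Kaplan-Meier (product-limit) estimate for one group.
# time:  follow-up time of each animal
# event: TRUE if the animal was sacrificed due to progression,
#        FALSE if its follow-up was censored
km <- function(time, event) {
  t_event <- sort(unique(time[event]))                       # distinct event times
  at_risk <- sapply(t_event, function(t) sum(time >= t))     # still followed at t
  deaths  <- sapply(t_event, function(t) sum(time == t & event))
  data.frame(time = t_event, at_risk = at_risk, deaths = deaths,
             survival = cumprod(1 - deaths / at_risk))
}

# Made-up example: eight animals, the last one censored at time 38
time  <- c(3, 5, 5, 12, 19, 26, 38, 38)
event <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
res <- km(time, event)
res
```

Because the censored animal is still counted as at risk at time 38, the estimated survival does not drop to zero at the end of follow-up.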


More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S0 SPSS Intro November 2013 Wilma Heemsbergen w.heemsbergen@nki.nl 1 13.00 ~ 15.30 Database (20 min) SPSS (40 min) Short break Exercise (60 min) This Afternoon During the

More information

Chapter 3 Understanding and Comparing Distributions

Chapter 3 Understanding and Comparing Distributions Chapter 3 Understanding and Comparing Distributions In this chapter, we will meet a new statistics plot based on numerical summaries, a plot to track the changes in a data set through time, and ways to

More information

An Introduction to R- Programming

An Introduction to R- Programming An Introduction to R- Programming Hadeel Alkofide, Msc, PhD NOT a biostatistician or R expert just simply an R user Some slides were adapted from lectures by Angie Mae Rodday MSc, PhD at Tufts University

More information

Az R adatelemzési nyelv

Az R adatelemzési nyelv Az R adatelemzési nyelv alapjai II. Egészségügyi informatika és biostatisztika Gézsi András gezsi@mit.bme.hu Functions Functions Functions do things with data Input : function arguments (0,1,2, ) Output

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

Chapter 5: The standard deviation as a ruler and the normal model p131

Chapter 5: The standard deviation as a ruler and the normal model p131 Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S0 SPSS Intro December 2014 Wilma Heemsbergen w.heemsbergen@nki.nl This Afternoon 13.00 ~ 15.00 SPSS lecture Short break Exercise 2 Database Example 3 Types of data Type

More information

Using Large Data Sets Workbook Version A (MEI)

Using Large Data Sets Workbook Version A (MEI) Using Large Data Sets Workbook Version A (MEI) 1 Index Key Skills Page 3 Becoming familiar with the dataset Page 3 Sorting and filtering the dataset Page 4 Producing a table of summary statistics with

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS. 1 SPSS 11.5 for Windows Introductory Assignment Material covered: Opening an existing SPSS data file, creating new data files, generating frequency distributions and descriptive statistics, obtaining printouts

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

Computing With R Handout 1

Computing With R Handout 1 Computing With R Handout 1 Getting Into R To access the R language (free software), go to a computing lab that has R installed, or a computer on which you have downloaded R from one of the distribution

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 KEY SKILLS: Organize a data set into a frequency distribution. Construct a histogram to summarize a data set. Compute the percentile for a particular

More information

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

Chapter 5snow year.notebook March 15, 2018

Chapter 5snow year.notebook March 15, 2018 Chapter 5: Statistical Reasoning Section 5.1 Exploring Data Measures of central tendency (Mean, Median and Mode) attempt to describe a set of data by identifying the central position within a set of data

More information

Multiple Regression White paper

Multiple Regression White paper +44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

STAT:5400 Computing in Statistics

STAT:5400 Computing in Statistics STAT:5400 Computing in Statistics Introduction to SAS Lecture 18 Oct 12, 2015 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowaedu SAS SAS is the statistical software package most commonly used in business,

More information

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata Paul Dickman September 2003 1 A brief introduction to Stata Starting the Stata program

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

DAY 52 BOX-AND-WHISKER

DAY 52 BOX-AND-WHISKER DAY 52 BOX-AND-WHISKER VOCABULARY The Median is the middle number of a set of data when the numbers are arranged in numerical order. The Range of a set of data is the difference between the highest and

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

A (very) brief introduction to R

A (very) brief introduction to R A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce

More information

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies. Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

An introduction to WS 2015/2016

An introduction to WS 2015/2016 An introduction to WS 2015/2016 Dr. Noémie Becker (AG Metzler) Dr. Sonja Grath (AG Parsch) Special thanks to: Prof. Dr. Martin Hutzenthaler (previously AG Metzler, now University of Duisburg-Essen) course

More information

Chapter 2: Descriptive Statistics

Chapter 2: Descriptive Statistics Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,

More information

Lab 1: Introduction, Plotting, Data manipulation

Lab 1: Introduction, Plotting, Data manipulation Linear Statistical Models, R-tutorial Fall 2009 Lab 1: Introduction, Plotting, Data manipulation If you have never used Splus or R before, check out these texts and help pages; http://cran.r-project.org/doc/manuals/r-intro.html,

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data EXERCISE Using Excel for Graphical Analysis of Data Introduction In several upcoming experiments, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Boxplot

Boxplot Boxplot By: Meaghan Petix, Samia Porto & Franco Porto A boxplot is a convenient way of graphically depicting groups of numerical data through their five number summaries: the smallest observation (sample

More information

1.3 Graphical Summaries of Data

1.3 Graphical Summaries of Data Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

Minitab Notes for Activity 1

Minitab Notes for Activity 1 Minitab Notes for Activity 1 Creating the Worksheet 1. Label the columns as team, heat, and time. 2. Have Minitab automatically enter the team data for you. a. Choose Calc / Make Patterned Data / Simple

More information

Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots

Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots STAT 350 (Spring 2015) Lab 3: SAS Solutions 1 Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots Note: The data sets are not included in the solutions;

More information

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

More information