Solution to Tumor growth in mice
- Sibyl Wood
Exercise 1

1. Import the data to R

The data is in the file tumorvols.csv, which can be read with the read.csv2 function. For a successful import you need to tell R exactly where the file is stored on your computer. One way of doing this is to set the path of the working directory. I like doing so because it also lets me control where R saves my working history and output. It works like this (on my computer; you should of course set your own path):

    setwd("c:/users/nxr382/documents/teaching/basicstatistics/2016/data")
    tumordat <- read.csv2('tumorvols.csv', header=TRUE)

Another possibility is:

    # tumordat <- read.csv2(file.choose())

which opens a file browser so that you can click your way to the data file. In either case, the dataset tumordat should appear on the list in the Work Environment. If you click on the table icon next to the description of the data, you can see what is inside the spreadsheet. As it appears, a record has been made for each mouse on each day of follow-up. We have information on tumor volume, treatment group, and sacrifice. Note that after a mouse has been sacrificed it is recorded as dead and the variable volume has a missing value, called NA in R.

2. Get a quick overview of the data

To find out how large the dataset is (i.e. how many records it contains) and which variables it contains, I use the functions dim, names, and summary:

    dim(tumordat)
    ## [1] 442   6
    names(tumordat)
    ## [1] "day"        "mouseid"    "volume"     "treatment"  "dead"
    ## [6] "sacrificed"

From this we see that the dataset contains 442 records and six variables (day, mouseid, volume, treatment, dead, and sacrificed).
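A side note on the import in step 1: read.csv2 expects fields separated by semicolons and decimals written with a comma, whereas read.csv expects commas and decimal points. The sketch below illustrates this on two made-up records (hypothetical values, not taken from tumorvols.csv). It also sets stringsAsFactors=TRUE explicitly, on the assumption that you are running R 4.0.0 or later, where text columns are otherwise imported as character rather than factor.

```r
# Hypothetical two-record example in the "European" CSV layout:
# fields separated by ';' and decimals written with ','.
csv_text <- "day;mouseid;volume;treatment\n1;21;45,3;contr\n1;22;60,1;chemo"

# read.csv2 uses sep = ";" and dec = "," by default.
toy <- read.csv2(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

str(toy)          # volume is numeric, treatment is a factor
dim(toy)          # number of records and number of variables
mean(toy$volume)  # works because "45,3" was parsed as the number 45.3
```

If the decimal commas were not recognized (for example when reading such a file with read.csv), volume would not come out as a numeric column and mean would not give a useful result.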
The function summary gives a little more information about the contents of the data:

    summary(tumordat)
    ##       day        mouseid         volume       treatment      dead
    ##  Min.   : 1   Min.   :21.0   Min.   : 17.1   chemo:136   Min.   :
    ##  1st Qu.:11   1st Qu.:27.0   1st Qu.:        contr:170   1st Qu.:
    ##  Median :20   Median :35.5   Median :        radio:136   Median :
    ##  Mean   :20   Mean   :39.5   Mean   :                    Mean   :
    ##  3rd Qu.:29   3rd Qu.:54.0   3rd Qu.:                    3rd Qu.:
    ##  Max.   :39   Max.   :60.0   Max.   :                    Max.   :
    ##                              NA's   :226
    ##    sacrificed
    ##  Min.   :
    ##  1st Qu.:
    ##  Median :
    ##  Mean   :
    ##  3rd Qu.:
    ##  Max.   :

We can see that follow-up lasted for 39 days and that the mice are labeled with id-numbers from 21 to 60. There are more records from the control group, reflecting that this group contained 10 mice while the active treatment groups had only 8 mice each. A little basic arithmetic (442 records divided by 26 mice) reveals that there must be 17 records for each mouse.

The variables dead and sacrificed are binary. It is clear to us that 0 and 1 should be interpreted as no and yes, but R takes all numbers literally and reports summary statistics that are appropriate for numerical variables. The only variable that is recognized as nominal is treatment, since this variable has text values.

Further note that in the summary the treatment groups are listed in alphabetical order. By default R always uses the group that comes first in alphabetical order as the reference point in figures, tables, and statistical analyses. We can tell R to use another group as reference with the relevel function, as follows:

    tumordat$treatment <- relevel(tumordat$treatment, ref='contr')
    summary(tumordat['treatment'])
    ##  treatment
    ##  contr:170
    ##  chemo:136
    ##  radio:136

Now the control group is the reference and appears first in the summary.

3. Extract the baseline data

In what follows we will only be concerned with the baseline data, that is, the records from the first day of the trial. We pick out the relevant records with the subset function and save them in a new data frame called day1:

    day1 <- subset(tumordat, day==1) # Don't forget the two "="s here!
We run a quick summary on the new dataset to confirm that it only contains records from the first day.

    summary(day1['day'])
    ##       day
    ##  Min.   :1
    ##  1st Qu.:1
    ##  Median :1
    ##  Mean   :1
    ##  3rd Qu.:1
    ##  Max.   :1

4. Tabulate the distribution on treatment groups

We can use the table function to count the number of mice in each treatment group. Just for the exercise we also make a barplot to display the counts.

    treat.table <- table(day1$treatment)
    print(treat.table)
    ##
    ## contr chemo radio
    ##    10     8     8
    barplot(treat.table)

[Figure: bar plot of the number of mice in each treatment group (contr, chemo, radio)]
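The re-levelling, subsetting, and tabulating steps above can be tried on a small made-up data frame (hypothetical values, not the tumor data). One assumption worth flagging: relevel only works on factors, so if treatment was imported as a character column (the default from R 4.0.0 on), it has to be converted with factor() first, as the sketch does.

```r
# Made-up mini dataset: 3 mice followed on days 1 and 4 (hypothetical values).
toy <- data.frame(
  day       = c(1, 1, 1, 4, 4, 4),
  mouseid   = c(21, 22, 23, 21, 22, 23),
  treatment = c("contr", "chemo", "radio", "contr", "chemo", "radio")
)

# Convert to a factor; levels are sorted alphabetically by default.
toy$treatment <- factor(toy$treatment)
levels(toy$treatment)   # "chemo" "contr" "radio"

# Make the control group the reference (first) level.
toy$treatment <- relevel(toy$treatment, ref = "contr")
levels(toy$treatment)   # "contr" "chemo" "radio"

# Keep only the baseline records; note the double "==" for comparison.
toy_day1 <- subset(toy, day == 1)

# Count the mice per treatment group at baseline.
table(toy_day1$treatment)
```

Writing day = 1 with a single "=" inside subset would not perform a comparison, which is why the comment in step 3 stresses the double "==".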
5. View the distribution of the tumor volumes

To get an impression of the distribution of the tumor volumes we first make a histogram:

    hist(day1$volume, probability = TRUE)

[Figure: Histogram of day1$volume; x-axis: day1$volume, y-axis: Density]

The argument probability = TRUE standardizes the area of the histogram to one, which makes the histogram comparable to theoretical probability distributions. If it is omitted, R shows the number of observations in each interval on the y-axis instead. It is possible to specify more arguments to the hist function, e.g. if you want other breakpoints for the bars than R's defaults. Check out hist in the R help for other options that change the appearance of your histogram.

As summary statistics we compute the mean and the standard deviation.

    mean(day1$volume)
    ## [1]
    sd(day1$volume)
    ## [1]

BUT: Are these appropriate summary statistics? The histogram is obviously skewed to the right. It doesn't look like a normal distribution. We compute the normal range (mean +/- 2*sd), recalling that in a normal distribution about 95% of the distribution falls within this range, while 2.5% should be below it and 2.5% above.
    c(mean(day1$volume)-2*sd(day1$volume), mean(day1$volume)+2*sd(day1$volume))
    ## [1]

According to this, a tumor volume of -100 mm^3 falls within the normal range!

6. Do logarithmic transformation of skewed data

To obtain a normally distributed outcome we can try to transform the tumor volumes with the logarithm. I use the natural logarithm (log in R), but other logarithms such as log2 or log10 would serve just as well, since they are all proportional to each other. We add the variable logvol to the data using the transform function:

    day1 <- transform(day1, logvol=log(volume))

Let's have a look at the histogram:

    hist(day1$logvol, probability = TRUE)

[Figure: Histogram of day1$logvol; x-axis: day1$logvol, y-axis: Density]

The histogram of the log-volumes is not skewed like that of the raw data. It doesn't look exactly like a normal curve either, but this looks more like small-sample variation than a systematic deviation.
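To see why the normal range misbehaves for right-skewed data, the calculation can be repeated on simulated values, both on the raw and on the log scale. The sketch below uses made-up log-normal data (not the tumor volumes) and wraps the calculation in a small helper called normal_range, which is a hypothetical name introduced here for illustration. Limits computed on the log scale are back-transformed with exp.

```r
set.seed(1)  # make the simulation reproducible

# Simulated right-skewed "volumes" (log-normal), purely illustrative.
vol <- rlnorm(26, meanlog = 4, sdlog = 1)

# Helper: mean +/- 2 standard deviations.
normal_range <- function(x) mean(x) + c(-2, 2) * sd(x)

raw_range  <- normal_range(vol)       # lower limit can turn out negative
log_range  <- normal_range(log(vol))  # computed on the log scale
back_range <- exp(log_range)          # back-transformed limits are always positive

raw_range
back_range
```

Because exp never returns a negative number, a range computed on the log scale and transformed back cannot include impossible negative volumes.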
We compute the normal range:

    c(mean(day1$logvol)-2*sd(day1$logvol), mean(day1$logvol)+2*sd(day1$logvol))
    ## [1]

The limits of the estimated normal range are more reasonable (close to the range of the data).

7. Use QQ-plots to check for normality

To compare the data with a normal distribution we could overlay the histograms with normal curves. However, this is not what I would recommend, because there is an arbitrariness in the histogram due to the choice of breakpoints. QQ-plots (which compare the ordered data points to the corresponding quantiles of the normal distribution) are a better option.

    par(mfrow=c(1,2)) # plot the figures side by side
    qqnorm(scale(day1$volume), xlim=c(-2.2,2.2), ylim=c(-2.2,2.2))
    abline(0,1)
    qqnorm(scale(day1$logvol), xlim=c(-2.2,2.2), ylim=c(-2.2,2.2))
    abline(0,1)

[Figure: two Normal Q-Q Plots side by side (raw volumes and log-volumes); x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

I've chosen to standardize the data because then, if the data is normally distributed, the points in the QQ-plot should lie on the straight line with intercept 0 and slope 1.
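As a side sketch of what a QQ-plot actually compares, the coordinates can be computed by hand on simulated data (made-up values, not the tumor volumes). The assumption here is that base R's qqnorm places the theoretical quantiles at the probabilities returned by ppoints, which is how the stats package documents it.

```r
set.seed(2)
x <- rlnorm(26)            # simulated skewed data, illustrative only
z <- as.vector(scale(x))   # standardize: mean 0, standard deviation 1

# Theoretical quantiles: normal quantiles at evenly spread probabilities.
theo <- qnorm(ppoints(length(z)))

# qqnorm returns the plotted coordinates (x = theoretical, y = sample).
qq <- qqnorm(z, plot.it = FALSE)

# After sorting, the pairs are (i-th normal quantile, i-th smallest data point).
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), sort(z))
```

Plotting sort(z) against theo and adding abline(0, 1) would therefore reproduce the kind of picture that qqnorm draws for standardized data.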
We see that the QQ-plot of the log-transformed data does not deviate systematically from the straight line. The QQ-plot of the raw data curves upwards (it is "smiling"), which is an indication of skewness.

8. Compare the treatment groups in a boxplot

To compare the distribution of the tumor volumes between the three treatment groups we make side-by-side boxplots.

    boxplot(day1$volume~day1$treatment) # The tilde "~" means "depending on" in R

[Figure: side-by-side boxplots of baseline tumor volume for contr, chemo, radio]

The boxes represent the interquartile range (25% quantile to 75% quantile), with the median shown as the thick line. In the control group the whiskers are drawn at the minimum and maximum values. In the active treatment groups the maximum value is marked as an outlier and the upper whisker is drawn at the second largest tumor volume. This is due to the rule that whiskers cannot exceed 1.5 times the length of the box. If the data has a normal distribution, points that exceed this limit are rare. The reason for pointing out outliers is firstly that they might be registration errors which should be corrected, and secondly that they may have a disproportionately high influence on the results of the statistical analysis, so that sensitivity analyses or robust statistics should be considered.

Considering the comparison of the treatment groups: the median tumor volume appears to be larger in the control group. However, this has to be a spurious finding, since treatment was randomized and baseline volumes were measured just before treatment was initiated! What we see in the picture is pure random variation. We have three random samples from the exact same population.
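The whisker rule described above can be checked numerically with boxplot.stats, the function that boxplot relies on for its statistics. The sketch uses a made-up sample with one unusually large value (hypothetical numbers, not the tumor volumes) and then reproduces the upper limit by hand from the hinges returned by fivenum.

```r
# Made-up sample with one unusually large value (illustrative only).
x <- c(40, 45, 50, 52, 55, 60, 62, 150)

bs <- boxplot.stats(x, coef = 1.5)  # coef = 1.5 is the default
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points drawn individually as outliers

# The rule by hand: the upper fence is the upper hinge plus 1.5 box lengths.
hinges <- fivenum(x)[c(2, 4)]
upper_fence <- hinges[2] + 1.5 * diff(hinges)

# The upper whisker ends at the largest observation inside the fence.
max(x[x <= upper_fence])
```

Here the value 150 lies beyond the fence, so it is flagged as an outlier and the upper whisker stops at the second largest value, just as in the active treatment groups.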
9. Display small samples in a stripchart

The sample sizes of the three groups are all rather small, so we can expect a good deal of random variation in the boxplot. Note that a quarter of the data in the active treatment groups amounts to only two observations! An alternative display for tiny datasets is the stripchart, which shows the individual data points in each group. In R, stripcharts can be made with the stripchart function. Here I have added two optional arguments to the function: vertical=TRUE implies that the strips are displayed vertically with the groups on the x-axis, and ylab='Tumor volume (mm3)' changes the label on the y-axis.

    stripchart(day1$volume~day1$treatment, vertical=TRUE, ylab='Tumor volume (mm3)')

[Figure: vertical stripchart of baseline tumor volume by treatment group (contr, chemo, radio); y-axis: Tumor volume (mm3)]

Additional graphical arguments can be supplied if you want a nicer looking figure for presentation. Check out stripchart in the R help, or par if you want the full list of R's graphical parameters.
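One optional argument that can matter for small samples is method. The sketch below, on made-up data with tied values (hypothetical numbers), uses method = "jitter" so that identical observations are spread out sideways instead of being drawn on top of each other; pch = 16 simply selects filled circles.

```r
# Made-up volumes in three small groups, with some ties (illustrative only).
toy <- data.frame(
  volume    = c(40, 55, 55, 70, 45, 45, 90, 50, 65, 65),
  treatment = factor(rep(c("contr", "chemo", "radio"), times = c(4, 3, 3)),
                     levels = c("contr", "chemo", "radio"))
)

# method = "jitter" spreads out tied values so that no point hides another.
stripchart(volume ~ treatment, data = toy, vertical = TRUE,
           method = "jitter", pch = 16,
           ylab = "Tumor volume (mm3)")

# Group sizes, as a reminder of how little data each strip holds.
table(toy$treatment)
```

With the default method = "overplot", the two mice at 55 in the first group would appear as a single point.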
Exercise 2

1. Survivors at end of follow-up

We can tabulate the day variable to see on which days of follow-up tumor volumes were (last) recorded:

    table(tumordat$day)
    ##
    ##
    ##

We see that the last day of follow-up was day 39. Let's pick out the data from this day to see how many mice survived throughout the trial:

    day39 <- subset(tumordat, day==39)
    table(day39$dead)
    ##
    ##  0  1
    ##  2 24

Only two mice survived throughout the trial. We can identify them as:

    subset(day39, dead==0)
    ##     day mouseid volume treatment dead sacrificed
    ##                            radio    0          1
    ##                            chemo    0          1

Mouse number 35 has a tumor volume close to 1000 mm^3, so it would most likely have been sacrificed on the next day had the trial continued. Mouse number 23, on the other hand, has a very small tumor. Let's pick out all the records on this mouse and plot its growth curve to see what has happened:

    m23 <- subset(tumordat, mouseid==23)
    plot(m23$day, m23$volume, type='b')
[Figure: growth curve of mouse 23; x-axis: m23$day, y-axis: m23$volume]

Either this mouse has a very slowly growing tumor, or maybe the tumor cells didn't grow in the first place, so that all that is measured is the thickness of its skin.

2. Comparison of survival times between the groups

Next we pick out the data from when the mice were sacrificed. This will tell us how long the mice survived in the trial.

    sacrificed <- subset(tumordat, sacrificed==1)
    summary(sacrificed)
    ##       day          mouseid          volume       treatment      dead
    ##  Min.   : 4.0   Min.   :21.00   Min.   : 52.3   contr:10   Min.   :0
    ##  1st Qu.: 6.5   1st Qu.:        1st Qu.:        chemo: 8   1st Qu.:0
    ##  Median :19.0   Median :35.50   Median :        radio: 8   Median :0
    ##  Mean   :18.5   Mean   :39.50   Mean   :                   Mean   :0
    ##  3rd Qu.:27.0   3rd Qu.:        3rd Qu.:                   3rd Qu.:0
    ##  Max.   :39.0   Max.   :60.00   Max.   :                   Max.   :0
    ##    sacrificed
    ##  Min.   :1
    ##  1st Qu.:1
    ##  Median :1
    ##  Mean   :1
    ##  3rd Qu.:1
    ##  Max.   :1
It seems that mice 35 and 23 were sacrificed due to end of study and not because their tumor volumes exceeded the ethical limit of 1000 mm^3. Let's look at the data from all the mice that had smaller tumor volumes at the time of sacrifice.

    subset(sacrificed, volume<1000)
    ##     day mouseid volume treatment dead sacrificed
    ##                            radio    0          1
    ##                            radio    0          1
    ##                            radio    0          1
    ##                            radio    0          1
    ##                            chemo    0          1
    ##                            chemo    0          1
    ##                            chemo    0          1
    ##                            chemo    0          1
    ##                            chemo    0          1
    ##                            contr    0          1
    ##                            contr    0          1

There are quite a few mice that were sacrificed before their tumor reached the critical limit. We learn from the investigator that in practice mice were sacrificed already when the tumor volume reached a size of approximately 900 mm^3. This was considered most ethical, since mice are only followed up on Mondays, Wednesdays, and Fridays. Lowering the limit to 875 mm^3 leaves us with the following mice:

    subset(sacrificed, volume<875)
    ##     day mouseid volume treatment dead sacrificed
    ##                            radio    0          1
    ##                            chemo    0          1
    ##                            contr    0          1

Besides mouse number 23, which was killed at end of study, we see that mouse number 34 and mouse number 60 were sacrificed before their tumors reached the critical size (both had wounds and were not thriving). We are now confident that we have the correct survival data and that sacrifice is predominantly due to progression in tumor growth (we have one censoring at end of study and two deaths due to other causes).

Strictly speaking, the survival time is the day of sacrifice minus one, so we need to add this variable to the data before we do descriptive statistics. For completeness we also add a censoring variable, which is TRUE for a mouse that reached day 39 with a tumor still below the practical limit:

    sacrificed <- transform(sacrificed, time=day-1, cens=(day==39)&(volume<875))
    summary(sacrificed[c('day','time','cens')])
    ##       day            time         cens
    ##  Min.   : 4.0   Min.   : 3.0   Mode :logical
    ##  1st Qu.: 6.5   1st Qu.: 5.5   FALSE:25
    ##  Median :19.0   Median :18.0   TRUE :1
    ##  Mean   :18.5   Mean   :17.5   NA's :0
    ##  3rd Qu.:27.0   3rd Qu.:26.0
    ##  Max.   :39.0   Max.   :38.0

At last we can compare survival in the three groups. This would usually be done with a Kaplan-Meier plot, but since follow-up is the same for all mice and there is only one censoring, making a boxplot of the time variable is OK.
    boxplot(sacrificed$time~sacrificed$treatment)

[Figure: side-by-side boxplots of survival time for contr, chemo, radio]

As it appears, survival is better in the active treatment groups than in the control group. Also, the mice that got chemotherapy have survived somewhat longer than those that got radiotherapy. Whether these are significant findings we cannot say. We would have to conduct a formal statistical analysis.
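For readers who want to see what the Kaplan-Meier alternative mentioned above could look like, here is a minimal sketch using the survival package, which ships as a recommended package with standard R installations. The data is made up (hypothetical times and groups, not the trial data), and the event indicator is coded 1 for sacrifice due to tumor size and 0 for censoring; with the variables defined in this solution, the corresponding indicator would presumably be !cens.

```r
library(survival)  # recommended package included with standard R

# Made-up survival data (illustrative only): time in days,
# event = 1 if sacrificed due to tumor size, 0 if censored.
toy <- data.frame(
  time      = c(5, 8, 12, 15, 20, 26, 38, 18, 24, 33),
  event     = c(1, 1, 1,  1,  1,  1,  0,  1,  1,  1),
  treatment = factor(rep(c("contr", "chemo", "radio"), times = c(4, 3, 3)),
                     levels = c("contr", "chemo", "radio"))
)

# Kaplan-Meier estimate of the survival curve in each treatment group.
km <- survfit(Surv(time, event) ~ treatment, data = toy)

# Median survival time per group.
med <- summary(km)$table[, "median"]
med

# Step curves, one line type per group.
plot(km, lty = 1:3, xlab = "Days", ylab = "Proportion alive")
legend("topright", legend = levels(toy$treatment), lty = 1:3)
```

Unlike the boxplot, the Kaplan-Meier curves handle censored observations properly, which matters more when there are many censorings or unequal follow-up.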
STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally
More informationComputing With R Handout 1
Computing With R Handout 1 Getting Into R To access the R language (free software), go to a computing lab that has R installed, or a computer on which you have downloaded R from one of the distribution
More informationThe basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student
Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite
More informationUnit I Supplement OpenIntro Statistics 3rd ed., Ch. 1
Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1 KEY SKILLS: Organize a data set into a frequency distribution. Construct a histogram to summarize a data set. Compute the percentile for a particular
More informationName: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution
Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the
More informationChapter 2 Describing, Exploring, and Comparing Data
Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative
More informationChapter 5snow year.notebook March 15, 2018
Chapter 5: Statistical Reasoning Section 5.1 Exploring Data Measures of central tendency (Mean, Median and Mode) attempt to describe a set of data by identifying the central position within a set of data
More informationMultiple Regression White paper
+44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms
More informationStatistics Lecture 6. Looking at data one variable
Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)
More informationSTAT:5400 Computing in Statistics
STAT:5400 Computing in Statistics Introduction to SAS Lecture 18 Oct 12, 2015 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowaedu SAS SAS is the statistical software package most commonly used in business,
More informationInternational Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata
International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata Paul Dickman September 2003 1 A brief introduction to Stata Starting the Stata program
More informationBar Charts and Frequency Distributions
Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats
More informationDAY 52 BOX-AND-WHISKER
DAY 52 BOX-AND-WHISKER VOCABULARY The Median is the middle number of a set of data when the numbers are arranged in numerical order. The Range of a set of data is the difference between the highest and
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationA (very) brief introduction to R
A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce
More informationThe first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.
Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.
More informationRegression III: Advanced Methods
Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions
More informationAn introduction to WS 2015/2016
An introduction to WS 2015/2016 Dr. Noémie Becker (AG Metzler) Dr. Sonja Grath (AG Parsch) Special thanks to: Prof. Dr. Martin Hutzenthaler (previously AG Metzler, now University of Duisburg-Essen) course
More informationChapter 2: Descriptive Statistics
Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,
More informationLab 1: Introduction, Plotting, Data manipulation
Linear Statistical Models, R-tutorial Fall 2009 Lab 1: Introduction, Plotting, Data manipulation If you have never used Splus or R before, check out these texts and help pages; http://cran.r-project.org/doc/manuals/r-intro.html,
More informationUsing Excel for Graphical Analysis of Data
EXERCISE Using Excel for Graphical Analysis of Data Introduction In several upcoming experiments, a primary goal will be to determine the mathematical relationship between two variable physical parameters.
More informationFathom Dynamic Data TM Version 2 Specifications
Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other
More informationGetting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018
Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from
More informationBoxplot
Boxplot By: Meaghan Petix, Samia Porto & Franco Porto A boxplot is a convenient way of graphically depicting groups of numerical data through their five number summaries: the smallest observation (sample
More information1.3 Graphical Summaries of Data
Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this
More informationCHAPTER 6. The Normal Probability Distribution
The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit
More informationMinitab Notes for Activity 1
Minitab Notes for Activity 1 Creating the Worksheet 1. Label the columns as team, heat, and time. 2. Have Minitab automatically enter the team data for you. a. Choose Calc / Make Patterned Data / Simple
More informationLab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots
STAT 350 (Spring 2015) Lab 3: SAS Solutions 1 Lab 3 (80 pts.) - Assessing the Normality of Data Objectives: Creating and Interpreting Normal Quantile Plots Note: The data sets are not included in the solutions;
More informationUnivariate Statistics Summary
Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:
More information