R Bootcamp Part I (B)

R Bootcamp: Part 1(B) Revised: March 30, 2018 (2:26pm)

An R Script is available to make it easy for you to copy/paste all the tutorial commands into RStudio.

Preliminaries: Skip this during Bootcamp presentation

The Dataset Used in This Part of the R Bootcamp

The data are a modification of the survey dataset in the MASS package. For the record, the code below shows exactly how we created the data for the tutorial file datafile.csv. You can ignore this and, when you get to Step (2) below, just read the already-modified data file into R.

    library(MASS)
    help(survey)
    # remove missing values for this example to illustrate basic functions
    mydata <- na.omit(survey)
    n <- nrow(mydata)
    # rename variables to allow "generic" code examples for the tutorial
    set.seed( )
    library(tidyverse)
    mydata <- transmute(mydata,
                        y = Pulse,
                        x = *y + rnorm(n, mean = 0, sd = 7),
                        x1 = Wr.Hnd,
                        x2 = NW.Hnd,
                        x3 = Height,
                        g = Clap,
                        g1 = Exer,
                        g2 = Sex
    )

The Variable Names

y is a quantitative (perhaps response) variable (plotted on y-axis)
x, x1, x2, x3 are quantitative (perhaps explanatory/predictor) variables (plotted on x-axis)
g, g1, g2 are categorical (perhaps explanatory/predictor) variables (g stands for "group")

Starting any Data Analysis

(1) Load extra packages every time you start RStudio installed on your own computer. Skip this step if using rstudio.uchicago.edu.

If you are using rstudio.uchicago.edu, packages are automatically loaded for you and you can skip down to step (2). Yay!

If you are using RStudio on your own computer, you should enter the following into the Console each time you start up the software. This loads many useful extra R functions.

    library(MASS)
    library(GGally)
    library(openintro)
    library(mosaic)
    library(knitr)
    library(tidyverse)
    library(ggformula)

    library(gridExtra)
    library(broom)

Caution: A lot of red text will flash by as various packages are loaded. This is normal... unless you see messages labeled Error. The most common error is "there is no package called ...". This means that you never installed the package. Go back to the Canvas R Help page and find the installation instructions. Follow the 3rd step of the instructions: Install add-on packages.

(2) Download, Upload, then Read in the data.

Your instructor will likely provide you with clean, comma-separated datasets, with variable names already included. For example, Dr. Collins plans to store STAT 234 datasets online, and many instructors post data files on their Canvas website. For this tutorial:

1. Download datafile.csv
2. Upload the file to rstudio.uchicago.edu
3. Read the data file into R/RStudio

    mydata <- read.csv("datafile.csv")

(3) Check the data. Explore the data structure.

Always make a quick check of the data you just loaded into R. The following functions will help.

    # in the tibble pkg within the tidyverse set of pkgs
    glimpse(mydata)    # info on variable names, types, and values
    # base R functions
    str(mydata)        # info on variable names, types, and values
    summary(mydata)    # numerical summaries of the values in each variable
    head(mydata, 10)   # see the first 10 rows, cases, observations
    tail(mydata, 7)    # see the last 7 rows, cases, observations
    names(mydata)      # see only the names of the dataset columns, variables

    View(mydata)       # view the data in a spreadsheet-like format

Just one or two of these functions should suffice to get the information you need to move forward with your data analysis. Which function(s) to use depends on the type of information you are looking for.

End of Preliminaries: Start Bootcamp presentation here

Learn by Doing: Follow Along On Your Computer Right Now

Start RStudio or just log on at rstudio.uchicago.edu.

Workflow: The Source Pane in RStudio

An R Script is available to make it easy for you to copy/paste all the tutorial commands into RStudio: Download RBootcamp1B Rcode.R and upload it to rstudio.uchicago.edu. Go to the Files tab and double-click on the filename. To run code, highlight line(s) of R code and press Ctrl-Enter (Windows) or Cmd-Enter (Mac).

Read in the file datafile.csv

    mydata <- read.csv("datafile.csv")

and then run the R code below from the R Script RBootcamp1B Rcode.R.

OK. Here we go!

Every command in R is a function: mean(), histogram(), var(), ...

The R function syntax for indicating y-axis and x-axis variables is typically either
1. a list of variables: functionname(x, y) or
2. a formula y = f(x), entered like this: functionname(y ~ x).

The symbol ~ is called a tilde or a squiggle. On your keyboard, it is at the upper left, under the esc key.

Formula-based notation makes it easy to summarize results by groups (categories). We recommend using the following formula-based packages:
Numerical summaries: mosaic package (augments many base R functions)
Graphical summaries: ggformula package (formula-based interface for ggplot2 graphics)

Example: Convincing you that formula-based functions are easier

Here is an example of getting the mean of a quantitative variable (y) across several groups (a categorical variable, g). The categorical variable g has three levels ("Left", "Right", "Neither"). Using base R, things look rather ugly:

    head(mydata, 12)
    mean(mydata$y[ mydata$g == "Left" ])
    mean(mydata$y[ mydata$g == "Right" ])
    mean(mydata$y[ mydata$g == "Neither" ])

Of course, you could Google and find this (opaque, but shorter) code using base R:

    tapply(mydata$y, mydata$g, mean)

Strong recommendation: Use the mosaic package instead, which elegantly extends the abilities of base R functions:

    mean(y ~ g, data = mydata)

This is transparent. It reads as the mean of y for each group in g: easier to read, understand, and remember.

Numerical Summaries

The mosaic package augments most of the basic numerical summaries to allow formula-based input.

    library(mosaic)
    # sample average, sd, and variance
    mean(~ y, data = mydata)
    sd(~ y, data = mydata)
    var(~ y, data = mydata)
    # five-number summary: min, Q1, median, Q3, max
    min(~ y, data = mydata)
    quantile(~ y, data = mydata, probs = 0.25)
    median(~ y, data = mydata)
    quantile(~ y, data = mydata, probs = 0.75)
    max(~ y, data = mydata)
    # another way to get the median
    quantile(~ y, data = mydata, probs = 0.50)
    # getting the five-number summary from one function
    fivenum(~ y, data = mydata)
    quantile(~ y, data = mydata, probs = c(0, 0.25, 0.50, 0.75, 1))
    # favstats is a function in the mosaic pkg (not available in base R)
    favstats(~ y, data = mydata)
    # separate summaries of a quantitative variable by groups/categories
    mean(~ y | g, data = mydata)
    sd(~ y | g, data = mydata)
    favstats(~ y | g, data = mydata)
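As a cross-check on the by-group means above, here is a minimal sketch showing that base R's aggregate() also accepts a formula and gives the same numbers as tapply(). It is only an illustration under stated assumptions: datafile.csv is not reproduced here, so the data frame dat is a hypothetical stand-in built directly from the Pulse and Clap columns of MASS::survey (renamed y and g), and its values will not match mydata exactly.

```r
library(MASS)  # provides the survey data frame

# Assumption: dat is a stand-in for the tutorial data, not datafile.csv itself
dat <- na.omit(survey[, c("Pulse", "Clap")])
names(dat) <- c("y", "g")

# Base R formula interface: mean of y within each level of g
by_formula <- aggregate(y ~ g, data = dat, FUN = mean)

# The same group means via tapply(), as shown earlier in the tutorial
by_tapply <- tapply(dat$y, dat$g, mean)

print(by_formula)
print(by_tapply)
```

Both calls return one mean per level of g. The mosaic call mean(y ~ g, data = dat) expresses the same request with the summary function named first, which is the style this tutorial recommends.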

    # summaries of y for each combination of values from two sets of groups/categories
    mean(~ y | g1 + g2, data = mydata)
    sd(~ y | g1 + g2, data = mydata)
    favstats(~ y | g1 + g2, data = mydata)

Graphical Summaries

We highly recommend using the ggformula package for plotting. Some ggformula tutorials are available, but in this R Bootcamp you will see pretty much all the syntax you will need for intro stats. The ggformula authors provide a tutorial: org/web/packages/ggformula/vignettes/ggformula.html

After demonstrating ggformula plots, we will recreate some of the same plots using base R. We think you should at least know that base R does have plotting capabilities, because if you ask someone for help with R, they may not be familiar with recent developments (like the ggformula package). However, you'll see pretty quickly from the base R examples that ggformula plotting is much, much simpler and more feature-rich.

Basic syntax for ggformula functions:

    library(ggformula)
    # This is not real R code. It will not work. It's just a template.
    gf_plottype(formula, data = mydata) %>%
      gf_plotdecorator1( options ) %>%
      gf_plotdecorator2( options )

The elements of the plot are chained together by the %>% operator.

plottype is a keyword for a type of plot, for example,
dotplot (common options: binwidth, dotsize, fill = varname)
boxplot (common option: outlier.size = 0.5 or outlier.size = 1.5)
bar (a bar plot of counts for categorical variables)
histogram (common options: bins = 12, color = "white")
point (a scatterplot with nifty options: color = g, size = x)
qq (quantile plot with default option: distribution = qnorm)
qqline (quantile plot reference line with default option: distribution = qnorm)

Common plotdecorator functions modify size and color, and add lines and text:
gf_labs(x = "x-axis label", y = "y-axis label")
gf_lims(x = c(1, 5), y = c(10, 40)) (set the limits of the axes)
gf_hline(yintercept = 7, linetype = "dashed") (add a dashed horizontal line at y = 7)
gf_vline(xintercept = 4, linetype = "dotted", size = 2) (add a thick, dotted vertical line)
gf_abline(intercept = 3, slope = 2) (add a line with intercept and slope specified)

gf_coefline(model = lm(y ~ x, data = mydata)) (add a least-squares regression line)
and more...

Reminder: Always include clarifying axis labels on all graphs using options like:

    gf_labs(x = "x-axis label (with units)", y = "y-axis label (with units)")

Dotplots and Boxplots

    # dotplot colored by group membership
    gf_dotplot(~ y, data = mydata, fill = ~ g2, binwidth = 3, binpositions = "all")
    # separate boxplots for each group/category
    gf_boxplot(y ~ g, data = mydata, outlier.size = 0.75) %>%
      gf_labs(x = "x-axis label", y = "y-axis label (with units)")
    # swap (flip) the x and y axes (horizontal boxplots)
    gf_boxplot(y ~ g, data = mydata, outlier.size = 1.5) %>%
      gf_labs(x = "y-axis label", y = "x-axis label (with units)") +
      coord_flip()
    # Note: used "+" not "%>%" to "chain" since coord_flip() is in ggplot2, but not in ggformula

We do NOT recommend it, but someone will likely try to help you with dotplots and boxplots by showing you base R plotting with stripchart() and boxplot(). We demonstrate these functions at the end of this document.

Barplots

    # barplot of counts for categorical variable
    gf_bar(~ g, data = mydata)
    # separate barplots by categories of another variable, colored by a 3rd variable
    gf_bar(~ g | g1, data = mydata, fill = ~ g2, position = "stack")
    gf_bar(~ g | g1, data = mydata, fill = ~ g2, position = "dodge")
    gf_bar(~ g | g1, data = mydata, fill = ~ g2, position = "fill")

We do NOT recommend it, but someone will likely try to help you with barplots by showing you base R plotting with barplot(). Consider trying to recreate a plot like this in base R:

    gf_bar(~ g | g1, data = mydata, fill = ~ g2, position = "stack")

See the base R barplot difficulty level for yourself.

Histograms

    # histogram of counts for quantitative variable
    gf_histogram(~ x, data = mydata, bins = 20, color = "white")
    # or, more explicitly asking for counts...
    gf_histogram(..count.. ~ x, data = mydata, bins = 20, color = "white")
    # separate histograms for each group/category
    gf_histogram(~ x | g, data = mydata, bins = 15, color = "white")
    # or stack the histograms for easier comparison
    gf_histogram(~ x | g, data = mydata, bins = 15, color = "white") +
      facet_wrap(~ g, ncol = 1)
    # Note: used "+" not "%>%" to "chain" since facet_wrap() is in ggplot2, but not in ggformula
    # ggformula has not translated *everything* you might want from ggplot2
    # ...and yes, we had to Google to even find this syntax

We do NOT recommend it, but someone will likely try to help you with histograms by showing you base R plotting with hist(). Consider trying to recreate a plot like this in base R:

    gf_histogram(~ x | g, data = mydata, bins = 15, color = "white")

See the base R histogram difficulty for yourself at the end of this document.

Scatterplots

    # scatterplot of y vs. x
    gf_point(y ~ x, data = mydata) %>%
      gf_labs(x = "x-axis label (with units)", y = "y-axis label (with units)")
    # separate scatterplots for each group/category
    gf_point(y ~ x | g, data = mydata)
    # one plot, color indicates group/category
    gf_point(y ~ x, color = ~ g, data = mydata)
    # one plot, color indicates group (categorical), size indicates a 3rd quantitative variable
    gf_point(y ~ x, size = ~ x3, color = ~ g, alpha = 0.5, data = mydata)
    # add horizontal, vertical, and intercept/slope lines to a scatterplot
    gf_point(y ~ x, data = mydata) %>%
      gf_hline(yintercept = mean(~ y, data = mydata), linetype = "dotted") %>%
      gf_vline(xintercept = mean(~ x, data = mydata), size = 1.5) %>%
      gf_abline(intercept = 41, slope = 0.09, linetype = "dashed")

    # For illustration, get the point (0,0) onto the graph to see the intercept = 41
    gf_point(y ~ x, data = mydata) %>%
      gf_lims(x = c(0, max(~ x, data = mydata)), y = c(0, max(~ y, data = mydata))) %>%
      gf_abline(intercept = 41, slope = 0.09, linetype = "dashed")
    # add a least-squares regression line to a scatterplot
    gf_point(y ~ x, data = mydata) %>%
      gf_abline(intercept = 41, slope = 0.09, linetype = "dashed") %>%
      gf_coefline(model = lm(y ~ x, data = mydata), size = 1.5)

We do NOT recommend it, but someone will likely try to help you with scatterplots by showing you base R plotting with plot(). We demonstrate this function at the end of this document.

Displaying Several Plots Together

The gridExtra package has a function called grid.arrange that works nicely. Save each plot first, then display them with the grid.arrange function.

    library(gridExtra)
    plot1 <- gf_histogram(~ y | g2, data = mydata, bins = 12, color = "white") +
      facet_wrap(~ g2, ncol = 1)
    plot2 <- gf_histogram(~ x | g2, data = mydata, bins = 8, color = "white")
    plot3 <- gf_dotplot(~ x3, data = mydata, binwidth = 2, dotsize = 0.8,
                        fill = ~ g2, binpositions = "all")
    plot4 <- gf_point(y ~ x, size = ~ x3, color = ~ g2, alpha = 0.7, data = mydata)
    grid.arrange(plot1, plot2, ncol = 2)
    grid.arrange(plot3, plot4, nrow = 2)

STOP R Bootcamp Part I(B) Presentation Here

Here are more advanced plot types for your reference later in an introductory statistics course. Then we show examples of some of the plots in this document created using base R plotting functions.

Quantile Plots

    # normal quantile plot
    gf_qq(~ y, data = mydata) %>%
      gf_qqline()

    # or, equivalently
    gf_qq(~ y, data = mydata, distribution = qnorm) %>%
      gf_qqline(distribution = qnorm)
    # standardize the data (a good idea to make normal quantile plots comparable)
    mydata <- mutate(mydata, z = (y - mean(y)) / sd(y))
    # normal quantile plot with data standardized
    gf_qq(~ z, data = mydata, distribution = qnorm) %>%
      gf_qqline(distribution = qnorm)
    # or compare the data to a different distribution: here, t-distn (df = 5)
    gf_qq(~ z, data = mydata, distribution = qt, dparams = list(df = 5)) %>%
      gf_qqline(distribution = qt, dparams = list(df = 5))
    # separate normal quantile plots for each group/category
    gf_qq(~ z | g, data = mydata, distribution = qnorm) %>%
      gf_qqline(distribution = qnorm)

We do NOT recommend it, but someone will likely try to help you with normal quantile plots by showing you base R plotting with qqnorm(). Getting quantile plots for other (non-normal) distributions (like the t-distribution demonstrated above) requires other base R functions (like qqplot) and the set-up of temporary variables (it's not pretty). We demonstrate the qqnorm function at the end of this document.

Simple Linear Regression Plots

It's OK if you don't know what linear regression is... yet.

    # find the least squares regression equation and save the results
    myfit <- lm(y ~ x, data = mydata)
    # examine the results
    summary(myfit)
    coef(myfit)
    library(broom)
    tidy(myfit, conf.int = TRUE, conf.level = 0.95)
    glance(summary(myfit))
    # find the SSE for a model: Recall sigmahat^2 = SSE / df
    sigmahat <- glance(myfit)$sigma        # residual standard error
    df <- glance(myfit)$df.residual        # residual degrees of freedom
    SSE <- sigmahat^2 * df
    c(sigmahat, df, SSE)                   # view the results
    data.frame(sigmahat, df, SSE)          # fancier view of the results
    # make a scatterplot of y vs. x and add the regression line to the plot
    gf_point(y ~ x, data = mydata) %>%
      gf_coefline(model = myfit)
    # add the fitted values, residuals, and standardized residuals to dataset
    mydata <- mutate(mydata,
                     fitted = fitted(myfit),
                     resids = residuals(myfit),
                     sresids = rstandard(myfit)
    )
    # plot the residuals vs. fitted values (with dashed line at y = 0)
    gf_point(resids ~ fitted, data = mydata) %>%
      gf_hline(yintercept = 0, linetype = "dashed") %>%
      gf_labs(x = "fitted values", y = "residuals")
    # plot the standardized residuals vs. fitted values (with dashed lines at y = -2, 0, 2)
    gf_point(sresids ~ fitted, data = mydata) %>%
      gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed") %>%
      gf_labs(x = "fitted values", y = "standardized residuals")

NOT recommended: base R plots

Here, we try to recreate some of the plots above using base R. If time permits, some plots will be demonstrated during the R Bootcamp. In particular, consider that we gave up on creating a three-variable barplot and that we encountered complexities for separate histograms by category below. Of course, you can try these plots later, on your own. We are confident you will quickly get the idea that ggformula plotting is so much easier and allows greater complexity and flexibility in plotting.

Take-Home Message: Use ggformula plotting, not base R.

Base R Dotplots and Boxplots

There is no dotplot function in base R, but we Googled and found stripchart() in base R.

    # dotplot separated by group membership (We didn't succeed at matching gf_dotplot output)
    stripchart(mydata$y ~ mydata$g, pch = 20, method = "stack", vertical = FALSE,
               xlab = "x-axis label")

    # separate boxplots for each group/category
    boxplot(mydata$y ~ mydata$g, pch = 20, cex = 1.5, xlab = "x-axis label")
    # swap (flip) the x and y axes (horizontal boxplots)
    boxplot(mydata$y ~ mydata$g, pch = 20, cex = 1.5, xlab = "x-axis label", horizontal = TRUE)

Base R Barplots and Histograms

    # barplot of counts for categorical variable
    counts <- table(mydata$g)
    barplot(counts)
    # separate barplots by categories of another variable, colored by a 3rd variable
    counts <- table(mydata$g, mydata$g1)
    # legend labels and colors follow the row order of the table (the levels of g)
    barplot(counts, legend.text = rownames(counts), col = seq_len(nrow(counts)))

We're not even going to try adding a third variable to a barplot in base R. See the difficulty level for yourself.

    # histogram of counts for quantitative variable
    hist(mydata$x, xlab = "x-axis label", breaks = 20, col = "grey", main = "")
    # or, more explicitly asking for counts...
    hist(mydata$x, freq = TRUE, xlab = "x-axis label", breaks = 20, col = "grey", main = "")

Separate histograms for each group/category

    table(mydata$g)        # Just reminding myself of the categories in variable g
    par(mfrow = c(1, 3))   # set up plotting space for 1 row of 3 plots
    hist(mydata$x[mydata$g == "Left"], xlab = "Left", breaks = 12, col = "grey", main = "")
    hist(mydata$x[mydata$g == "Neither"], xlab = "Neither", breaks = 12, col = "grey", main = "")
    hist(mydata$x[mydata$g == "Right"], xlab = "Right", breaks = 12, col = "grey", main = "")
    par(mfrow = c(1, 1))   # return the plotting space to 1 plot per page

Unfortunately, these plots are not on the same scale... and whenever you are asked to compare graphs, you are required to display them using the same scale (or you won't get full credit on homework). So, we'll try to fix the problem in base R. After looking at the 3 histograms above, we can see about how tall the tallest plot is. We'll use that information to make all 3 plots that tall.

    par(mfrow = c(1, 3))   # set up plotting space for 1 row of 3 plots
    hist(mydata$x[mydata$g == "Left"], xlab = "Left", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    hist(mydata$x[mydata$g == "Neither"], xlab = "Neither", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    hist(mydata$x[mydata$g == "Right"], xlab = "Right", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    par(mfrow = c(1, 1))   # return the plotting space to 1 plot per page

Reminder: Separate histograms for each group/category take just one line using ggformula. Plus, all three plots are automatically on the same scale for easy comparison.

    gf_histogram(~ x | g, data = mydata, bins = 12, color = "white")

    # or stack the histograms for easier comparison
    par(mfrow = c(3, 1))   # set up plotting space for 3 rows of 1 plot each
    hist(mydata$x[mydata$g == "Left"], xlab = "Left", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    hist(mydata$x[mydata$g == "Neither"], xlab = "Neither", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    hist(mydata$x[mydata$g == "Right"], xlab = "Right", breaks = 12, col = "grey", main = "",
         ylim = c(0, 15))
    par(mfrow = c(1, 1))   # return the plotting space to 1 plot per page

Base R Scatterplots

    # scatterplot of y vs. x
    plot(mydata$x, mydata$y, pch = 20,
         xlab = "x-axis label (with units)", ylab = "y-axis label (with units)")
    # separate scatterplots for each group/category
    table(mydata$g)        # Just reminding myself of the category names
    par(mfrow = c(1, 3))   # set up plotting space for 1 row of 3 plots
    plot(mydata$x[mydata$g == "Left"], mydata$y[mydata$g == "Left"], pch = 20,
         xlab = "x-axis label (with units)", ylab = "y-axis label (with units)", main = "Left")
    plot(mydata$x[mydata$g == "Neither"], mydata$y[mydata$g == "Neither"], pch = 20,
         xlab = "x-axis label (with units)", ylab = "y-axis label (with units)", main = "Neither")
    plot(mydata$x[mydata$g == "Right"], mydata$y[mydata$g == "Right"], pch = 20,
         xlab = "x-axis label (with units)", ylab = "y-axis label (with units)", main = "Right")
    par(mfrow = c(1, 1))   # return the plotting space to 1 plot per page

Again, the plots are not on the same scale and so are not directly comparable (and this would not receive full credit on homework). This is crazy! First, we had to type in the values of the categorical variable ourselves, 3 times for each value! Then, the plots are not even on the same scale. Recall that with ggformula, it takes just one line to do it all!

    gf_point(y ~ x | g, data = mydata)

We're going to stop here and not bother trying any of the more difficult scatterplots using just base R. We'll stick with ggformula plotting and we strongly recommend you do the same.

Base R Quantile Plots

    # normal quantile plot
    qqnorm(mydata$y, pch = 20)
    qqline(mydata$y)
    # standardize the data (a good idea to make normal quantile plots comparable)
    mydata <- mutate(mydata, z = (y - mean(y)) / sd(y))
    # normal quantile plot with data standardized
    qqnorm(mydata$z, pch = 20)
    qqline(mydata$z)

You already know, from demonstrations of hist() and plot() above, that plotting separately by group is difficult in base R. We'll not attempt it for quantile plots.

Base R Displaying Several Plots of Different Types Together

We're not sure how to do this in base R. Can it be done??? grid.arrange() does not work with the plot types created by base R. We could Google, but the answer would be so difficult, we would lose interest. We'll stick with ggformula plots and use the grid.arrange function to display them together.

Base R Simple Linear Regression Plots

We are tiring of base R plotting at this point, so we'll stop now.

Take-Home Message: Use ggformula plotting, not base R.
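On the question raised under "Base R Displaying Several Plots of Different Types Together": the same par(mfrow = ...) setting used for the side-by-side histograms above also accepts plots of different types. The following is a minimal sketch, not part of the original handout, that places a histogram next to a scatterplot with a least-squares line added by abline(). The data frame dat is a hypothetical stand-in built from the Pulse and Height columns of MASS::survey (renamed y and x3), since datafile.csv is not reproduced here.

```r
library(MASS)  # provides the survey data frame

# Assumption: dat is a stand-in for the tutorial data, not datafile.csv itself
dat <- na.omit(survey[, c("Pulse", "Height")])
names(dat) <- c("y", "x3")

fit <- lm(y ~ x3, data = dat)            # least-squares fit of y on x3

old_par <- par(mfrow = c(1, 2))          # 1 row of 2 plots; remember old setting
hist(dat$y, breaks = 12, col = "grey", main = "",
     xlab = "y (pulse, beats per minute)")
plot(dat$x3, dat$y, pch = 20,
     xlab = "x3 (height, cm)", ylab = "y (pulse, beats per minute)")
abline(fit, lty = "dashed")              # add the regression line to the scatterplot
par(old_par)                             # return to 1 plot per page
```

This covers only the simplest layout. The ggformula plus grid.arrange approach recommended in this tutorial remains the shorter route once plots are grouped or colored by category.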


More information

An Introduction to R 2.2 Statistical graphics

An Introduction to R 2.2 Statistical graphics An Introduction to R 2.2 Statistical graphics Dan Navarro (daniel.navarro@adelaide.edu.au) School of Psychology, University of Adelaide ua.edu.au/ccs/people/dan DSTO R Workshop, 29-Apr-2015 Scatter plots

More information

Stat 290: Lab 2. Introduction to R/S-Plus

Stat 290: Lab 2. Introduction to R/S-Plus Stat 290: Lab 2 Introduction to R/S-Plus Lab Objectives 1. To introduce basic R/S commands 2. Exploratory Data Tools Assignment Work through the example on your own and fill in numerical answers and graphs.

More information

Graphing Bivariate Relationships

Graphing Bivariate Relationships Graphing Bivariate Relationships Overview To fully explore the relationship between two variables both summary statistics and visualizations are important. For this assignment you will describe the relationship

More information

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS. 1 SPSS 11.5 for Windows Introductory Assignment Material covered: Opening an existing SPSS data file, creating new data files, generating frequency distributions and descriptive statistics, obtaining printouts

More information

0 Graphical Analysis Use of Excel

0 Graphical Analysis Use of Excel Lab 0 Graphical Analysis Use of Excel What You Need To Know: This lab is to familiarize you with the graphing ability of excels. You will be plotting data set, curve fitting and using error bars on the

More information

Introduction to R: Day 2 September 20, 2017

Introduction to R: Day 2 September 20, 2017 Introduction to R: Day 2 September 20, 2017 Outline RStudio projects Base R graphics plotting one or two continuous variables customizable elements of plots saving plots to a file Create a new project

More information

An Introduction to R- Programming

An Introduction to R- Programming An Introduction to R- Programming Hadeel Alkofide, Msc, PhD NOT a biostatistician or R expert just simply an R user Some slides were adapted from lectures by Angie Mae Rodday MSc, PhD at Tufts University

More information

36-402/608 HW #1 Solutions 1/21/2010

36-402/608 HW #1 Solutions 1/21/2010 36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together

More information

An Introduction to the R Commander

An Introduction to the R Commander An Introduction to the R Commander BIO/MAT 460, Spring 2011 Christopher J. Mecklin Department of Mathematics & Statistics Biomathematics Research Group Murray State University Murray, KY 42071 christopher.mecklin@murraystate.edu

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Demo yeast mutant analysis

Demo yeast mutant analysis Demo yeast mutant analysis Jean-Yves Sgro February 20, 2018 Contents 1 Analysis of yeast growth data 1 1.1 Set working directory........................................ 1 1.2 List all files in directory.......................................

More information

Intro To Excel Spreadsheet for use in Introductory Sciences

Intro To Excel Spreadsheet for use in Introductory Sciences INTRO TO EXCEL SPREADSHEET (World Population) Objectives: Become familiar with the Excel spreadsheet environment. (Parts 1-5) Learn to create and save a worksheet. (Part 1) Perform simple calculations,

More information

IPS9 in R: Bootstrap Methods and Permutation Tests (Chapter 16)

IPS9 in R: Bootstrap Methods and Permutation Tests (Chapter 16) IPS9 in R: Bootstrap Methods and Permutation Tests (Chapter 6) Bonnie Lin and Nicholas Horton (nhorton@amherst.edu) July, 8 Introduction and background These documents are intended to help describe how

More information

R syntax guide. Richard Gonzalez Psychology 613. August 27, 2015

R syntax guide. Richard Gonzalez Psychology 613. August 27, 2015 R syntax guide Richard Gonzalez Psychology 613 August 27, 2015 This handout will help you get started with R syntax. There are obviously many details that I cannot cover in these short notes but these

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS code SAS (originally Statistical Analysis Software) is a commercial statistical software package based on a powerful programming

More information

Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3

Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3 Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3 This tutorial shows you: how to simulate a random process how to plot the distribution of a variable how to assess the distribution

More information

Here is the data collected.

Here is the data collected. Introduction to Scientific Analysis of Data Using Spreadsheets. Computer spreadsheets are very powerful tools that are widely used in Business, Science, and Engineering to perform calculations and record,

More information

How to use Excel Spreadsheets for Graphing

How to use Excel Spreadsheets for Graphing How to use Excel Spreadsheets for Graphing 1. Click on the Excel Program on the Desktop 2. You will notice that a screen similar to the above screen comes up. A spreadsheet is divided into Columns (A,

More information

Lab 1 Introduction to R

Lab 1 Introduction to R Lab 1 Introduction to R Date: August 23, 2011 Assignment and Report Due Date: August 30, 2011 Goal: The purpose of this lab is to get R running on your machines and to get you familiar with the basics

More information

Module 1: Introduction RStudio

Module 1: Introduction RStudio Module 1: Introduction RStudio Contents Page(s) Installing R and RStudio Software for Social Network Analysis 1-2 Introduction to R Language/ Syntax 3 Welcome to RStudio 4-14 A. The 4 Panes 5 B. Calculator

More information

QUEEN MARY, UNIVERSITY OF LONDON. Introduction to Statistics

QUEEN MARY, UNIVERSITY OF LONDON. Introduction to Statistics QUEEN MARY, UNIVERSITY OF LONDON MTH 4106 Introduction to Statistics Practical 1 10 January 2012 In this practical you will be introduced to the statistical computing package called Minitab. You will use

More information

An introduction to plotting data

An introduction to plotting data An introduction to plotting data Eric D. Black California Institute of Technology February 25, 2014 1 Introduction Plotting data is one of the essential skills every scientist must have. We use it on a

More information

Individual Covariates

Individual Covariates WILD 502 Lab 2 Ŝ from Known-fate Data with Individual Covariates Today s lab presents material that will allow you to handle additional complexity in analysis of survival data. The lab deals with estimation

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Chapter 2 The SAS Environment

Chapter 2 The SAS Environment Chapter 2 The SAS Environment Abstract In this chapter, we begin to become familiar with the basic SAS working environment. We introduce the basic 3-screen layout, how to navigate the SAS Explorer window,

More information

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages...

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... BIOSTATS 640 Spring 2018 Introduction to R and R-Studio Data Description Page 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... 2. Load R Data.. a. Load R data frames...

More information

How to Make Graphs in EXCEL

How to Make Graphs in EXCEL How to Make Graphs in EXCEL The following instructions are how you can make the graphs that you need to have in your project.the graphs in the project cannot be hand-written, but you do not have to use

More information

Advanced Regression Analysis Autumn Stata 6.0 For Dummies

Advanced Regression Analysis Autumn Stata 6.0 For Dummies Advanced Regression Analysis Autumn 2000 Stata 6.0 For Dummies Stata 6.0 is the statistical software package we ll be using for much of this course. Stata has a number of advantages over other currently

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3A Visualizing Data By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to visualize data. If you intend to

More information

Mixed models in R using the lme4 package Part 2: Lattice graphics

Mixed models in R using the lme4 package Part 2: Lattice graphics Mixed models in R using the lme4 package Part 2: Lattice graphics Douglas Bates University of Wisconsin - Madison and R Development Core Team University of Lausanne July 1,

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

Excel Tips and FAQs - MS 2010

Excel Tips and FAQs - MS 2010 BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my

More information

A (very) brief introduction to R

A (very) brief introduction to R A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce

More information

Advanced Econometric Methods EMET3011/8014

Advanced Econometric Methods EMET3011/8014 Advanced Econometric Methods EMET3011/8014 Lecture 2 John Stachurski Semester 1, 2011 Announcements Missed first lecture? See www.johnstachurski.net/emet Weekly download of course notes First computer

More information

Facets and Continuous graphs

Facets and Continuous graphs Facets and Continuous graphs One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display

More information

Excel Primer CH141 Fall, 2017

Excel Primer CH141 Fall, 2017 Excel Primer CH141 Fall, 2017 To Start Excel : Click on the Excel icon found in the lower menu dock. Once Excel Workbook Gallery opens double click on Excel Workbook. A blank workbook page should appear

More information

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version)

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version) Practice in R January 28, 2010 (pdf version) 1 Sivan s practice Her practice file should be (here), or check the web for a more useful pointer. 2 Hetroskadasticity ˆ Let s make some hetroskadastic data:

More information

SPSS. (Statistical Packages for the Social Sciences)

SPSS. (Statistical Packages for the Social Sciences) Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.

More information

Using Built-in Plotting Functions

Using Built-in Plotting Functions Workshop: Graphics in R Katherine Thompson (katherine.thompson@uky.edu Department of Statistics, University of Kentucky September 15, 2016 Using Built-in Plotting Functions ## Plotting One Quantitative

More information

INSTRUCTIONS FOR USING MICROSOFT EXCEL PERFORMING DESCRIPTIVE AND INFERENTIAL STATISTICS AND GRAPHING

INSTRUCTIONS FOR USING MICROSOFT EXCEL PERFORMING DESCRIPTIVE AND INFERENTIAL STATISTICS AND GRAPHING APPENDIX INSTRUCTIONS FOR USING MICROSOFT EXCEL PERFORMING DESCRIPTIVE AND INFERENTIAL STATISTICS AND GRAPHING (Developed by Dr. Dale Vogelien, Kennesaw State University) ** For a good review of basic

More information

Homework set 4 - Solutions

Homework set 4 - Solutions Homework set 4 - Solutions Math 3200 Renato Feres 1. (Eercise 4.12, page 153) This requires importing the data set for Eercise 4.12. You may, if you wish, type the data points into a vector. (a) Calculate

More information

Data Import and Formatting

Data Import and Formatting Data Import and Formatting http://datascience.tntlab.org Module 4 Today s Agenda Importing text data Basic data visualization tidyverse vs data.table Data reshaping and type conversion Basic Text Data

More information

1 Pencil and Paper stuff

1 Pencil and Paper stuff Spring 2008 - Stat C141/ Bioeng C141 - Statistics for Bioinformatics Course Website: http://www.stat.berkeley.edu/users/hhuang/141c-2008.html Section Website: http://www.stat.berkeley.edu/users/mgoldman

More information

plot(seq(0,10,1), seq(0,10,1), main = "the Title", xlim=c(1,20), ylim=c(1,20), col="darkblue");

plot(seq(0,10,1), seq(0,10,1), main = the Title, xlim=c(1,20), ylim=c(1,20), col=darkblue); R for Biologists Day 3 Graphing and Making Maps with Your Data Graphing is a pretty convenient use for R, especially in Rstudio. plot() is the most generalized graphing function. If you give it all numeric

More information

Data Science Essentials

Data Science Essentials Data Science Essentials Lab 2 Working with Summary Statistics Overview In this lab, you will learn how to use either R or Python to compute and understand the basics of descriptive statistics. Descriptive

More information

Standardized Tests: Best Practices for the TI-Nspire CX

Standardized Tests: Best Practices for the TI-Nspire CX The role of TI technology in the classroom is intended to enhance student learning and deepen understanding. However, efficient and effective use of graphing calculator technology on high stakes tests

More information

BIOL 417: Biostatistics Laboratory #3 Tuesday, February 8, 2011 (snow day February 1) INTRODUCTION TO MYSTAT

BIOL 417: Biostatistics Laboratory #3 Tuesday, February 8, 2011 (snow day February 1) INTRODUCTION TO MYSTAT BIOL 417: Biostatistics Laboratory #3 Tuesday, February 8, 2011 (snow day February 1) INTRODUCTION TO MYSTAT Go to the course Blackboard site and download Laboratory 3 MYSTAT Intro.xls open this file in

More information

Introduction to Stata: An In-class Tutorial

Introduction to Stata: An In-class Tutorial Introduction to Stata: An I. The Basics - Stata is a command-driven statistical software program. In other words, you type in a command, and Stata executes it. You can use the drop-down menus to avoid

More information

Making plots in R [things I wish someone told me when I started grad school]

Making plots in R [things I wish someone told me when I started grad school] Making plots in R [things I wish someone told me when I started grad school] Kirk Lohmueller Department of Ecology and Evolutionary Biology UCLA September 22, 2017 In honor of Talk Like a Pirate Day...

More information

Week 7: The normal distribution and sample means

Week 7: The normal distribution and sample means Week 7: The normal distribution and sample means Goals Visualize properties of the normal distribution. Learning the Tools Understand the Central Limit Theorem. Calculate sampling properties of sample

More information

Chapter 2: Descriptive Statistics: Tabular and Graphical Methods

Chapter 2: Descriptive Statistics: Tabular and Graphical Methods Chapter 2: Descriptive Statistics: Tabular and Graphical Methods Example 1 C2_1

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Importing and visualizing data in R. Day 3

Importing and visualizing data in R. Day 3 Importing and visualizing data in R Day 3 R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation

More information

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data. Outline Part 2: Lattice graphics ouglas ates University of Wisconsin - Madison and R evelopment ore Team Sept 08, 2010 Presenting data Scatter plots Histograms and density plots

More information

Lab 5, part b: Scatterplots and Correlation

Lab 5, part b: Scatterplots and Correlation Lab 5, part b: Scatterplots and Correlation Toews, Math 160, Fall 2014 November 21, 2014 Objectives: 1. Get more practice working with data frames 2. Start looking at relationships between two variables

More information

Introductory SAS example

Introductory SAS example Introductory SAS example STAT:5201 1 Introduction SAS is a command-driven statistical package; you enter statements in SAS s language, submit them to SAS, and get output. A fairly friendly user interface

More information

Data Visualization. Module 7

Data Visualization.  Module 7 Data Visualization http://datascience.tntlab.org Module 7 Today s Agenda A Brief Reminder to Update your Software A walkthrough of ggplot2 Big picture New cheatsheet, with some familiar caveats Geometric

More information

Exercise 2.23 Villanova MAT 8406 September 7, 2015

Exercise 2.23 Villanova MAT 8406 September 7, 2015 Exercise 2.23 Villanova MAT 8406 September 7, 2015 Step 1: Understand the Question Consider the simple linear regression model y = 50 + 10x + ε where ε is NID(0, 16). Suppose that n = 20 pairs of observations

More information

Package sure. September 19, 2017

Package sure. September 19, 2017 Type Package Package sure September 19, 2017 Title Surrogate Residuals for Ordinal and General Regression Models An implementation of the surrogate approach to residuals and diagnostics for ordinal and

More information

An Introductory Guide to R

An Introductory Guide to R An Introductory Guide to R By Claudia Mahler 1 Contents Installing and Operating R 2 Basics 4 Importing Data 5 Types of Data 6 Basic Operations 8 Selecting and Specifying Data 9 Matrices 11 Simple Statistics

More information

Introduction to R and R-Studio Toy Program #2 Excel to R & Basic Descriptives

Introduction to R and R-Studio Toy Program #2 Excel to R & Basic Descriptives Introduction to R and R-Studio 2018-19 Toy Program #2 Basic Descriptives Summary The goal of this toy program is to give you a boiler for working with your own excel data. So, I m hoping you ll try!. In

More information

Technology Is For You!

Technology Is For You! Technology Is For You! Technology Department of Idalou ISD because we love learning! Tuesday, March 4, 2014 MICROSOFT EXCEL Useful website for classroom ideas: YouTube lessons for visual learners: http://www.alicechristie.org/edtech/ss/

More information

ggplot2 for beginners Maria Novosolov 1 December, 2014

ggplot2 for beginners Maria Novosolov 1 December, 2014 ggplot2 for beginners Maria Novosolov 1 December, 214 For this tutorial we will use the data of reproductive traits in lizards on different islands (found in the website) First thing is to set the working

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

Our Changing Forests Level 2 Graphing Exercises (Google Sheets)

Our Changing Forests Level 2 Graphing Exercises (Google Sheets) Our Changing Forests Level 2 Graphing Exercises (Google Sheets) In these graphing exercises, you will learn how to use Google Sheets to create a simple pie chart to display the species composition of your

More information

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010 UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview

More information

R Commander Tutorial

R Commander Tutorial R Commander Tutorial Introduction R is a powerful, freely available software package that allows analyzing and graphing data. However, for somebody who does not frequently use statistical software packages,

More information

SciGraphica. Tutorial Manual - Tutorials 1and 2 Version 0.8.0

SciGraphica. Tutorial Manual - Tutorials 1and 2 Version 0.8.0 SciGraphica Tutorial Manual - Tutorials 1and 2 Version 0.8.0 Copyright (c) 2001 the SciGraphica documentation group Permission is granted to copy, distribute and/or modify this document under the terms

More information