Introduction to R: Day 2 September 20, 2017
Outline

- RStudio projects
- Base R graphics
- plotting one or two continuous variables
- customizable elements of plots
- saving plots to a file

Create a new project for today

First, let's see how RStudio projects work in practice. We saw in Part 1 how they let you divide your work into self-contained contexts, since each project has its own working directory, workspace, history, and source documents.

Exercise: Projects

1. Create a new RStudio project.
2. Create a couple of variables and plot them, e.g.:

    X <- rnorm(20, 3, 2)
    Y <- rnorm(20, X/2 + 2.5, 1.5)

3. Quit RStudio (answer "yes" to save) and start it again.
4. Switch back to the project we used on Day 1.
5. Switch back to today's project so we can work.

Note that the following happens in RStudio when switching to an existing project:

- the environment and history are restored
- the current working directory is set to the project directory
- documents that were open in the editor tabs are restored
- panels are arranged where they were when the project was last open

Environment/workspace

Objects that you create by assignment make up your R workspace. The workspace is shown in the Environment tab in RStudio, and so the term "environment" is sometimes used, even though strictly speaking that's not what "environment" means in the R programming language itself. But the meaning is usually clear from the context, and the latter won't normally be something you have to deal with.

It's important to remember that the workspace exists only in your computer's memory: unless you save it to disk, it will be gone once the R application exits (or you turn off the computer). Luckily, saving the workspace is simple, because R gives us the option to save it to disk when quitting or switching projects. When you choose to save the workspace, all objects in it are stored in a file named .RData in R's current working directory.
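What happens when the workspace is saved can be imitated by hand. The following is only a minimal sketch, not RStudio's actual mechanism: it uses base R's save() and load() on a temporary file (the name workspace-demo.RData is made up for illustration), so no real .RData file is touched.

```r
# Hypothetical demo file in a temporary directory, so that the
# project's own .RData file is left alone.
demo_file <- file.path(tempdir(), "workspace-demo.RData")

X <- rnorm(20, 3, 2)          # an object that lives only in memory
save(X, file = demo_file)     # save.image(demo_file) would store *all* objects

rm(X)                         # simulate quitting R: the object is gone
restored <- load(demo_file)   # load() returns the names of the restored objects
restored
length(X)
```

Here save() stores a single named object, whereas save.image() is the call that stores the whole workspace.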
When starting R (or opening an RStudio project), the saved workspace is automatically restored from the .RData file in the starting directory. So for the most part this is not something you need to worry about: just don't delete the .RData file unless you know how to recreate all the important objects in it. (And you should know, either from the history or because you saved your analysis code into an R script file.)

Working Directory

R has a notion of the working directory, which is the location where it will look for files to read or write if they don't contain the full directory information. (Kind of like the current directory in a terminal shell.) Initially, this is the location where R was started from, and if you're using RStudio projects, it's the project directory. So unless you choose to change your working directory, all your files will be stored in the project, which is exactly what you want. (You can set the working directory from the Files pane, but avoid doing so until you're more comfortable with R. You can always load a file from directories other than the working one by giving the correct path in the file name.)

Switching working directories, and more generally managing files from R, is a more advanced topic that we will cover in one of the later workshops on reproducible computing practices.

Graphics

Let's revisit the mtcars dataset from last session.

    mtcars

[Output: a table of 32 cars, from Mazda RX4 to Volvo 142E, with the columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb.]

This is a data.frame object: a rectangular table in which each column is a variable and each row is an observed unit. Data frames are particularly well suited to representing data being analyzed, and as such are the most common type of input to functions used for statistical analysis.

We saw on Monday many functions that can be used to learn about the shape and size of the data in the data frame:

    dim(mtcars)
    ## [1] 32 11

    str(mtcars)
    ## 'data.frame': 32 obs. of 11 variables:
    ##  $ mpg : num ...
    ##  $ cyl : num ...
    ##  $ disp: num ...
    ##  $ hp  : num ...
    ##  $ drat: num ...
    ##  $ wt  : num ...
    ##  $ qsec: num ...
    ##  $ vs  : num ...
    ##  $ am  : num ...
    ##  $ gear: num ...
    ##  $ carb: num ...

    summary(mtcars)

[Output: minimum, quartiles, median, mean and maximum for each of the 11 variables; for mpg, for example, Min. 10.40, Median 19.20, Mean 20.09, Max. 33.90.]

    head(mtcars)

[Output: the first six rows of the table, Mazda RX4 through Valiant.]

Then there are the functions that calculate various descriptive statistics, like the mean and standard deviation of the variables, either individually or all at once:

    mean(mtcars$mpg)
    ## [1] 20.09062

    apply(mtcars, 2, sd)

[Output: the standard deviation of each of the 11 variables.]

    median(mtcars$mpg)
    ## [1] 19.2

Just as importantly, the early stages of data exploration should also involve a lot of graphing, because our visual system can very quickly process the information displayed in a way that is just not possible in tabular form. We already saw how to create a boxplot in R. Let's try it on the mpg variable from the mtcars dataset:

    boxplot(mtcars$mpg)
Recall that the dollar sign $ can be used to access individual variables in a data frame, so mtcars$mpg is how I get the values of the mpg variable. This particular plot gives us some sense of the shape of the mpg values, but boxplots really come into their own when comparing different groups. There are two ways this can be done. The first is to pass the values for each group as a separate argument, for example X and Y from the first exercise:

    boxplot(X, Y)

However, we can also use a more concise notation when the grouping variable is already part of the dataset. With mtcars, we have such a grouping variable in cyl, indicating the number of cylinders in the engine. In this case, we can write:

    boxplot(mpg ~ cyl, data = mtcars)
Notice the mpg ~ cyl. This is called a formula in R, and it is used by many functions in a (fairly) consistent way. Just remember that to the right of the tilde are the X variables (i.e., predictors, also known as independent variables), while the left-hand side is the Y (the response, or dependent variable). I think you'll agree that the formula notation makes it easier to see what is being plotted, since it focuses on looking at your data as variables playing a certain role in the statistical model. We will stick with it whenever possible.

Exercise: Boxplots

Create a boxplot of fuel efficiency (mpg) vs. transmission type (am), using the data in mtcars.

Customizing the plot

This is not a terribly pretty graphic, and certainly not good enough for a publication. We briefly saw on Monday that various aspects of the plot can be modified by specifying optional arguments. There are many such arguments in the R graphing system, but we will focus on only the most common ones. These graphics parameters are shared by the various plotting functions, so you only have to learn them once.

Most commonly, we label the plot to make it more informative to the reader. Since the formula puts cyl on the horizontal axis, the "Cylinders" label goes in xlab:

    boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders",
            main = "Fuel efficiency by engine cylinders")
[Figure: boxplot of mpg by cyl, titled "Fuel efficiency by engine cylinders", with the axis label "Cylinders"]

We can also change the size of the font used to annotate the plot (cex. plus the element being annotated, such as cex.main for the title):

    boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders",
            main = "Fuel efficiency by engine cylinders",
            cex.lab = 1.5, cex.main = 2, cex.axis = 0.8)
[Figure: the same boxplot with an enlarged title and axis label]

There are also several arguments that specifically control the way the boxplot is drawn: the length of the whiskers (range), whether the outliers are drawn (outline), and whether to notch the sides of the boxes at the median (notch). Example:

    boxplot(mpg ~ cyl, data = mtcars, outline = FALSE, range = 1)
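To see what range and outline actually change, the numbers behind a boxplot can be inspected without drawing it. This is a minimal sketch, assuming the plot = FALSE argument of boxplot, which returns the computed statistics instead of a picture (stats holds the whisker ends, hinges and median of each group; out holds the points flagged as outliers):

```r
# Default whiskers: they reach at most 1.5 box lengths beyond the box
b_default <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# range = 0 makes the whiskers extend to the data extremes,
# so no point is left over to be flagged as an outlier
b_full <- boxplot(mpg ~ cyl, data = mtcars, range = 0, plot = FALSE)

# One column per group; rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker
b_default$stats
b_default$out     # points beyond the default whiskers (may be empty)
b_full$stats
b_full$out        # empty when range = 0
```

Comparing the first and last rows of the two stats matrices shows directly how far the whiskers moved.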
Lastly, we can make the boxplot extend horizontally, with the horizontal argument:

    boxplot(mpg ~ cyl, data = mtcars, horizontal = TRUE)
Note that in this case you may also want to rotate the axis tick labels so that they're horizontal (with las = 1):

    boxplot(mpg ~ cyl, data = mtcars, horizontal = TRUE, las = 1)
Exercise: Modifying boxplots

Add the plot title and both axis labels to the boxplot you produced in our last exercise. Increase the size of the plot title, and make the whiskers extend the full range of the data.

Histograms

To get a fuller picture of the distribution of observed values, we use histograms. This plot type divides the entire range of values into a series of intervals, and then shows how many values fall into each interval by the height of that interval's bar. We create histograms using the hist function. Unfortunately, this function doesn't accept a formula as input; you have to give it a vector of observations:

    hist(mtcars$mpg)
[Figure: histogram of mtcars$mpg with the default breaks]

By default, R uses equally spaced breaks to divide the data into bins and plots the counts of points falling into each bin. (If not specified by the user, the number of bins is determined by one of a number of available algorithms, and depends on various aspects of the input data. The default calculation uses only the number of observations to determine the number of breaks.)

One thing to keep in mind with histograms is that the choice of the number of bins can strongly affect the overall shape of the histogram and potentially mask important aspects of the data. So it's always good to try plotting the histogram with a couple of different numbers of bins. For the hist function, this is done with the breaks argument:

    hist(mtcars$mpg, breaks = 10)
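The default rule mentioned above is Sturges' formula, which suggests ceiling(log2(n) + 1) bins for n observations. The short sketch below assumes the helper functions nclass.Sturges, nclass.scott and nclass.FD from the grDevices package (attached by default) and shows what each rule suggests for mpg. Keep in mind that hist treats any such number only as a suggestion.

```r
n <- length(mtcars$mpg)             # 32 observations

sturges_by_hand <- ceiling(log2(n) + 1)
sturges_by_hand                     # Sturges' formula computed by hand
nclass.Sturges(mtcars$mpg)          # the same rule, as used by hist() by default

# Alternative rules that also take the spread of the data into account
nclass.scott(mtcars$mpg)
nclass.FD(mtcars$mpg)
```

Any of these rules can also be requested by name, e.g. breaks = "Scott" in the call to hist.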
[Figure: histogram of mtcars$mpg drawn with breaks = 10]

Exercise: Histograms

Plot a histogram of engine size (displacement, disp) for the data in mtcars. Try varying the number of bins. What do you see?

Note that if the number of breaks given doesn't allow the breakpoints to fall on pretty values (for some R-defined notion of "nice" and "pretty"), you may end up with fewer bins than requested:

    hist(mtcars$mpg, breaks = 8)
[Figure: histogram of mtcars$mpg drawn with breaks = 8]

If you really want to have 8 bins in this case, you will have to specify the breakpoints explicitly:

    hist(mtcars$mpg, breaks = c(10, 14, 16, 18, 21, 24, 28, 30, 34))
[Figure: histogram of mtcars$mpg with the eight unevenly spaced bins; the vertical axis now shows density]

Notice that the breaks don't have to be evenly spaced, although they usually are. If you need an exact number of evenly spaced bins (let's say 8), you can use the seq function, giving one more than the desired number of bins in the length.out argument:

    hist(mtcars$mpg, breaks = seq(10, 35, length.out = 9))
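Whether a given breaks value produced the number of bins you wanted can be checked without reading it off the picture. The sketch below relies on hist returning an object whose breaks and counts components describe the bins; plot = FALSE suppresses the drawing.

```r
# A single number is only a suggestion: R moves the breakpoints to "pretty" values
h_suggested <- hist(mtcars$mpg, breaks = 8, plot = FALSE)
h_suggested$breaks
length(h_suggested$counts)     # number of bins actually used

# An explicit vector of breakpoints is used exactly as given
h_exact <- hist(mtcars$mpg, breaks = seq(10, 35, length.out = 9), plot = FALSE)
h_exact$breaks
length(h_exact$counts)         # 8 bins, one fewer than the 9 breakpoints
sum(h_exact$counts)            # all 32 cars fall into some bin
```

The same object is returned (invisibly) when the histogram is drawn, so it can be stored from a normal hist call as well.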
[Figure: histogram of mtcars$mpg with eight evenly spaced bins]

If it bothers you that the tails of the histogram hang outside the drawn X-axis, you can control the axis's range with the xlim argument. Just give it a pair of values, the lower and upper endpoints of the axis:

    hist(mtcars$mpg, breaks = 10, xlim = c(0, 35))
[Figure: histogram of mtcars$mpg with the X-axis running from 0 to 35]

Also, all the usual ways to annotate the plot are available: main, xlab, ylab, etc. (You cannot make the histogram draw horizontally, unfortunately.)

    hist(mtcars$mpg, main = "Fuel efficiency (mpg)")
[Figure: histogram of mtcars$mpg titled "Fuel efficiency (mpg)"]

Dot charts

If you want to plot a single variable but see all the individual cases, a dot chart is a good option:

    dotchart(mtcars$mpg)
Usually, you will want to label the cases:

    dotchart(mtcars$mpg, rownames(mtcars))
[Figure: dot chart of mpg, labelled with the car names]

If the labels don't look right, you know how to modify them: with the cex.axis argument! Except that here it is called cex. (Base graphics is sometimes inconsistent, as you may have noticed.)

    dotchart(mtcars$mpg, rownames(mtcars), cex = 0.8)
[Figure: dot chart of mpg with smaller labels, one row per car, from Volvo 142E at the top to Mazda RX4 at the bottom]

Breaking the cases into groups is also possible, with the groups argument:

    dotchart(mtcars$mpg, rownames(mtcars), groups = mtcars$cyl, cex = 0.6)
[Figure: dot chart of mpg with the cars arranged in three groups by number of cylinders]

Why can't this be done with a formula? Good question. Most likely, this function was created before the formula interface was introduced, and in a way that makes it very difficult to add formula support at this point. But there are alternatives in other packages that do just that. (There almost always are, with R. For example, see dotplot in the lattice package, although it involves a whole new graphics framework that works very differently from the base graphics we're exploring here. Or wait until the workshop on ggplot2 in November; ggplot2 is very flexible and currently the most popular way to produce graphics in R.)

    lattice::dotplot(rownames(mtcars) ~ mpg, data = mtcars, groups = cyl)
[Figure: lattice dot plot of mpg for each car, with the car names listed alphabetically and the x axis labelled mpg]

Exercise: Dot charts

Create a dot chart of weights (wt) for mtcars, grouped by number of cylinders. Add a title and a label for the x axis.

Scatterplots

When you want to plot two continuous variables, you probably want to use a scatterplot. In base R graphics, this is done with the plot function:

    plot(mpg ~ disp, data = mtcars)
[Figure: scatterplot of mpg against disp]

Exercise: Customizing scatterplot labels

Create a scatterplot of fuel consumption (mpg) vs. car weight (wt) from mtcars. Add a title and axis labels to the plot.

In addition, there is a range of plotting parameters that you may want to modify. The style of the points and their colour are the most common, and are controlled with the pch and col arguments to plot. This can be useful to distinguish different groups in a single plot.

    plot(mpg ~ disp, data = mtcars, col = cyl)
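Passing col = cyl uses the cylinder counts 4, 6 and 8 directly as colour numbers from the current palette, which works but gives little control over which colour goes with which group. Here is a minimal sketch of a more explicit mapping, assuming three arbitrarily chosen colour names and the solid-circle symbol pch = 19:

```r
# One colour per group, in the order of the factor levels 4, 6, 8
group_cols <- c("steelblue", "darkgreen", "darkred")
cyl_group  <- factor(mtcars$cyl)

# Look up each car's colour through the integer code of its group (1, 2 or 3)
point_cols <- group_cols[as.integer(cyl_group)]

plot(mpg ~ disp, data = mtcars, col = point_cols, pch = 19)
```

Because the colours are held in a named vector, the same vector can be reused later when a legend is added to the plot.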
[Figure: scatterplot of mpg against disp, with points coloured by cyl]

The coding of pch is very counterintuitive. For instance, the value of 19 will produce a solid circle, while 8 is an asterisk. See the help page for points for the different values, or consult a cheat sheet of plotting symbols.

Creating a matrix of scatterplots

When dealing with multi-dimensional data, you may wish to explore relationships between many pairs of variables. With R, you can do so for many variables at once using the pairs function. The function accepts a formula interface, where on the right-hand side you include all the variables you're interested in, joined with + signs:

    pairs(~ mpg + wt + disp, data = mtcars)
[Figure: scatterplot matrix of mpg, wt and disp]

Line plot

Often, the data contain repeated measurements, or even a time series.

    BOD

[Output: a table of six rows with the columns Time and demand.]

    plot(demand ~ Time, data = BOD)
[Figure: scatterplot of demand against Time]

In those cases, we may want to emphasize the continuity by connecting the data points with a line. This is done by giving the argument type = "l" to plot:

    plot(demand ~ Time, data = BOD, type = "l")
[Figure: line plot of demand against Time]

Similarly to scatterplots, we can control the colour and type of the line being drawn with arguments to plot:

    plot(demand ~ Time, data = BOD, type = "l", lty = 2, lwd = 3.5, col = "red")
[Figure: line plot of demand against Time, drawn as a thick dashed red line]

Combining multiple line plots in a single graph

Calling lines multiple times simply adds another line to the current plot for each call. Here is an example where repeated measurements of multiple subjects are first drawn as a scatterplot showing all the data at once. Then, lines connecting the measurements of four subjects are drawn one at a time (e.g., subset = Subject == 1), each in a different colour. Note that the time variable in Theoph is spelled Time, with a capital T:

    plot(conc ~ Time, data = Theoph)
    lines(conc ~ Time, data = Theoph, subset = Subject == 1)
    lines(conc ~ Time, data = Theoph, subset = Subject == 2, col = "red")
    lines(conc ~ Time, data = Theoph, subset = Subject == 3, col = "blue")
    lines(conc ~ Time, data = Theoph, subset = Subject == 4, col = "darkgreen")
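When more than a handful of subjects need their own line, repeating the lines call becomes tedious. One possible shortcut is sketched here with a for loop and plain x/y vectors instead of the formula interface; the loop and the colour choices are illustrative assumptions, not part of the workshop material:

```r
subjects  <- c(1, 2, 3, 4)
line_cols <- c("black", "red", "blue", "darkgreen")

# Scatterplot of all measurements first, then one line per subject
plot(conc ~ Time, data = Theoph)
for (i in seq_along(subjects)) {
  # Rows belonging to the current subject only
  one_subject <- Theoph[Theoph$Subject == subjects[i], ]
  lines(one_subject$Time, one_subject$conc, col = line_cols[i])
}
```

Extending the plot to more subjects then only requires longer subjects and line_cols vectors.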
[Figure: conc against Time for Theoph, with coloured lines for subjects 1 to 4]

Adding straight lines to the plot

Often, we want to add one or more straight lines to the plot to provide some kind of comparison with, or reference to, the plotted points. For instance: coordinate axes, the mean value of the cases in particular groups, or the line of best fit. While this can be done by providing the precise coordinates to lines, there is a simpler way with abline. This function can draw a horizontal or vertical line from a single argument:

    plot(mpg ~ disp, data = mtcars)
    abline(h = 20)                     # approximately the mean fuel consumption
    abline(v = median(mtcars$disp))
[Figure: scatterplot of mpg against disp with a horizontal and a vertical reference line]

Two other arguments, a and b, are used to draw angled lines, by giving the intercept (a) and slope (b) of the line.

Exercise: adding a line

Draw a scatterplot of fuel consumption (mpg) vs. weight (wt) for mtcars, and add a red line for the least squares estimate with a slope of -5.34 and an intercept of 37.285.

Legends

When plotting different groups, it can be useful to include a legend to help the reader match the plotted colours or symbols with the groups. With base graphics, we do this with the legend function, which takes the X and Y coordinates of the top-left corner of the legend, the text of each of the legend's items, and a symbol or colour matching each item with the plot. For instance:

    boxplot(mpg ~ cyl, data = mtcars, col = c('steelblue', 'darkgreen', 'darkred'))
    legend(3, 35, c('4 cyl', '6 cyl', '8 cyl'), fill = c('steelblue', 'darkgreen', 'darkred'))
[Figure: coloured boxplots of mpg by cyl with a legend listing 4 cyl, 6 cyl and 8 cyl]

(The X-coordinate is 3 because the boxes for the three groups are drawn at one-unit increments along the X-axis, i.e., at 1, 2, and 3, despite the axis tick labels saying 4, 6, and 8. A different plot type, say a scatterplot, uses the actual range of the data values.)

Exercise: Legends

Add a legend to the scatterplot from the last exercise to explain what the red line is.

In this case, we have just a single item in the legend, but we want to provide an additional symbol matching the label with the red line in the plot. We do this by specifying the colour (i.e., red: col = "red") and the style of the symbol (i.e., a line: lwd = 1).

    plot(mpg ~ wt, data = mtcars)
    abline(a = 37.285, b = -5.34, col = "red")
    legend(4.5, 33, "Best fit", col = "red", lwd = 1)
[Figure: scatterplot of mpg against wt with the red best-fit line and a "Best fit" legend]
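The intercept and slope passed to abline above do not have to be typed in by hand. This is a minimal sketch, assuming the least-squares line is fitted with lm, which this part of the workshop does not otherwise cover. abline accepts the fitted model directly, and legend accepts a keyword such as "topright" in place of X and Y coordinates:

```r
# Fit the least-squares line of mpg on wt
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                      # intercept and slope, about 37.285 and -5.34

plot(mpg ~ wt, data = mtcars)
abline(fit, col = "red")       # essentially the same line as a = 37.285, b = -5.34
legend("topright", "Best fit", col = "red", lwd = 1)
```

Fitting the line in code keeps the plot correct if the data change, since the coefficients are recomputed each time.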
More informationSolution to Tumor growth in mice
Solution to Tumor growth in mice Exercise 1 1. Import the data to R Data is in the file tumorvols.csv which can be read with the read.csv2 function. For a succesful import you need to tell R where exactly
More informationData Visualization. Andrew Jaffe Instructor
Module 9 Data Visualization Andrew Jaffe Instructor Basic Plots We covered some basic plots previously, but we are going to expand the ability to customize these basic graphics first. 2/45 Read in Data
More informationBasics of Plotting Data
Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based
More informationStatistical Programming with R
Statistical Programming with R Lecture 9: Basic graphics in R Part 2 Bisher M. Iqelan biqelan@iugaza.edu.ps Department of Mathematics, Faculty of Science, The Islamic University of Gaza 2017-2018, Semester
More informationComputing With R Handout 1
Computing With R Handout 1 The purpose of this handout is to lead you through a simple exercise using the R computing language. It is essentially an assignment, although there will be nothing to hand in.
More informationVisual Analytics. Visualizing multivariate data:
Visual Analytics 1 Visualizing multivariate data: High density time-series plots Scatterplot matrices Parallel coordinate plots Temporal and spectral correlation plots Box plots Wavelets Radar and /or