An Introduction to R
1.3 Some important practical matters when working with R
Dan Navarro (daniel.navarro@adelaide.edu.au)
School of Psychology, University of Adelaide
ua.edu.au/ccs/people/dan
DSTO R Workshop, 29-Apr-2015
Several boring tasks...
Mundane but important things:
- Installing and loading packages
- Moving around the file system on the computer
- Loading and saving data, writing scripts
R or Rstudio?
- R has (of course) many tools for doing these tasks
- Rstudio has very good tools too, and easier ones
- These tasks aren't interesting data analysis stuff, so there's not much value in exploring the more powerful command line versions
Installing and loading packages
Packages
What is a package?
- A collection of R functions and data sets that someone has contributed to the R ecosystem
- Packages extend the functionality of R: most of the value of R comes from the 5000+ packages out there
Where do they come from?
- Most packages are distributed centrally via CRAN (Comprehensive R Archive Network)
- There are lots of mirrors of CRAN. In Australia: CSIRO Canberra, University of Melbourne
- You can also get packages in other ways (not discussed)
Base R comes with about 30 packages
> .packages(TRUE)
 [1] "base"       "boot"       "class"      "cluster"
 [5] "codetools"  "compiler"   "datasets"   "foreign"
 [9] "graphics"   "grDevices"  "grid"       "KernSmooth"
[13] "lattice"    "MASS"       "Matrix"     "methods"
[17] "mgcv"       "nlme"       "nnet"       "rpart"
[21] "spatial"    "splines"    "stats"      "stats4"
[25] "survival"   "tcltk"      "tools"      "utils"
[29] "manipulate"
(ignore the .packages command, I'll show an easier way in a moment)
After a while you end up with lots more
> .packages(TRUE)
  [1] "alr3"        "base"        "BayesFactor" "bitops"
  [5] "boot"        "brew"        "car"         "class"
  [9] "cluster"     "coda"        "codetools"   "coin"
 [13] "colorspace"  "compiler"    "datasets"    "devtools"
 [17] "dichromat"   "digest"      "effects"     "evaluate"
 [21] "ez"          "foreign"     "formatR"     "Formula"
 [25] "ggplot2"     "GPArotation" "graphics"    "grDevices"
 [29] "grid"        "gtable"      "hexbin"      "highr"
 [33] "Hmisc"       "httr"        "KernSmooth"  "knitr"
 [37] "labeling"    "lattice"     "lavaan"      "leaps"
...
[101] "stats"       "stats4"      "stringr"     "survey"
[105] "survival"    "tcltk"       "testit"      "testthat"
[109] "tools"       "utils"       "whisker"     "XML"
[113] "xtable"      "zoo"
(the only reason why there aren't more is that this is a new-ish machine, and I only install packages when I need them)
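The same information can also be pulled from the console with installed.packages(), which lives in the utils package. This is only a small sketch; the variable names are made up for illustration:

```r
# Names of every package installed in the libraries R knows about
installed <- rownames(installed.packages())
length(installed)        # how many packages are installed on this machine
head(sort(installed))    # the first few names, alphabetically

# Only the packages that ship as part of R itself (priority "base")
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)
```

The counts will differ from machine to machine, for the reason given on the slide above.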
Rstudio has a nice display showing which packages you have installed
the package name
description of what it does
what version do you have installed?
click this button to uninstall the package
check to see if there are newer versions of your packages on CRAN
refresh the list (e.g., if you've just done something new)
click this to install new packages (we'll come back to this)
is the package loaded? (check the box to load, uncheck to unload)
Terminology
Installed means...
- That the package files are stored on your computer
- Your version of R is able to load the package
Loaded means...
- That R has opened the package files (sort of), and now knows what they contain
- You can use the functions / data stored in the package
The upshot of this:
- A package must be installed before you can load it
- A package must be loaded before you can use it
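The installed / loaded distinction can be checked from the console. A minimal sketch, assuming the splines package (one of the packages that ships with R) as the example; the variable names are illustrative:

```r
pkg <- "splines"   # example package; any installed package would do

# Installed: the package files exist in one of R's libraries
is_installed <- pkg %in% rownames(installed.packages())

# Loaded: the package appears on the search path
loaded_before <- paste0("package:", pkg) %in% search()

# library() loads it; character.only = TRUE lets us pass the name as a string
library(pkg, character.only = TRUE)
loaded_after <- paste0("package:", pkg) %in% search()

print(c(installed = is_installed, before = loaded_before, after = loaded_after))
```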
Why does it work like that???
R is big
- 5000+ packages means that different authors will use the same name to refer to different functions!
- e.g., there are several packages that define a logit() function.
Separating install from load avoids inconsistency:
- If every installed package were also loaded, it would introduce a lot of naming conflicts
- Install everything you might want to use sometime
- Load only those things you need to use now!
Let's load the MASS package
library("MASS", lib.loc="/Library/Frameworks/R.framework/Versions/3.2/Resources/library")
This pops up in the R console automatically... this is the actual loading command
Now let's load the Matrix package
> library( "Matrix" )
Loading required package: lattice
You don't usually have to specify the lib.loc bit: R knows where the default locations are, so you can just type the package name
R keeps track of "dependencies": some packages rely on content of other packages. So if you try to load package A, but it requires content from package B (which you don't have loaded), R will load package B too.
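To see which default locations R searches when lib.loc is left out, .libPaths() lists them. A short sketch, using stats4 (a package that ships with R) as a stand-in so that it runs on any installation:

```r
# The library folders R searches by default
default_libs <- .libPaths()
print(default_libs)

# Short form: R looks in the default locations for you
library("stats4")

# Long form: the same call with the locations spelled out
library("stats4", lib.loc = default_libs)
```

Both calls load the same package; the long form only matters if a package lives somewhere unusual.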
Try it yourself (Exercise 1.3.1)
How do I know if there's a conflict?
> library( psych )
> library( car )
Attaching package: 'car'
The following object is masked from 'package:psych':
    logit
psych and car both contain a function called logit(). When I load both packages, the more recently loaded one (car) takes precedence...
This is the warning message that R prints out. It says that logit exists in both packages, and that the version in psych is "masked" (i.e., typing logit() on its own no longer reaches it)
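A masked function can still be reached by naming its package with the :: operator (for the example above, psych::logit). Since psych and car may not be installed, the sketch below reproduces the same situation with a home-made function that masks sd() from the stats package; the replacement function is purely hypothetical:

```r
# Define our own sd(), which now masks stats::sd in the workspace
sd <- function(x) "my own sd"

masked_result   <- sd(c(2, 4, 6))          # calls our version
original_result <- stats::sd(c(2, 4, 6))   # package::name reaches the original

print(masked_result)
print(original_result)

rm(sd)   # remove our version so sd() means stats::sd again
```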
Installing packages
Where to install from? (ignore this)
Where to install to? (ignore this)
Should dependencies be installed? Leave this checked, because the answer is almost always yes
Which packages to install?
Start typing, and notice that Rstudio gives a list of possible packages
Installing packages
> install.packages("psych")
trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.0/psych_1.3.10.12.tgz'
Content type 'application/x-gzip' length 2687899 bytes (2.6 Mb)
opened URL
==================================================
downloaded 2.6 Mb

The downloaded binary packages are in
/var/folders/cl/thhsyrz53g73q0w1kb5z3l_80000gn/t//rtmpo8qt8n/downloaded_packages

Reading the output:
- The first line is the command that appears in the R console
- Where is it downloading from? (the "trying URL" line)
- What gets downloaded? (the psych .tgz file, 2.6 Mb)
- Where did it store files? (the last line)
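If you prefer to type the command yourself, a common pattern is to install a package only when it is missing. This is a sketch, not something from the workshop: install_if_missing() is a hypothetical helper, and the repos URL is an assumption (the general CRAN cloud mirror) that you can replace with a nearby mirror.

```r
# Hypothetical helper: install a package only if R cannot already find it
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  # dependencies = TRUE mirrors the "install dependencies" checkbox
  install.packages(pkg, repos = repos, dependencies = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call does not download anything
did_install <- install_if_missing("stats")
print(did_install)
```

Calling install_if_missing("psych") would go through the same download steps shown above if psych is not yet on the machine.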
Try it yourself (Exercise 1.3.2)
Loading a workspace (.Rdata) file
Workspace files
The primary file format used by R is .Rdata
- It is a saved workspace
- It contains whatever data sets, variables, functions etc that the workspace included when the file was created
How to load an .Rdata file?
- Hard way: use the load() function manually
- Easy way #1: double click on the .Rdata file in Finder/Explorer, and (as long as Rstudio is the default application for Rdata files) it will load automatically
- Easy way #2: open using the Rstudio menus
Rstudio method for loading .Rdata files
This is the file open button
You can also use the File menu to do the same thing if you want to...
Opens a file open dialog box... It will look different on different operating systems... it will look like a familiar Windows thing on a Windows computer, a standard Mac thing on a Mac computer etc etc...
Browse for the file you want, and open: Clicking open will load the toydata.Rdata file
And the data file is now loaded...
load("~/Work/Research/Rbook/workshop_dsto/datasets/toydata.Rdata")
A command like this will appear in the R console (this command is what actually loaded the file)
And the variable(s) that are stored in the file are now listed in the workspace
What does it mean to have data loaded?
- Loading means that you've copied the variables in the .Rdata file into your R workspace
- You can now use these variables for your analysis
- Deleting or changing variables in the workspace does not change the contents of the .Rdata file
- i.e. R doesn't do autosave or anything of the sort
- This is a good thing!
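The "no autosave" point can be checked with a throwaway file. A minimal sketch, assuming a temporary file and a made-up variable called score rather than the workshop's toydata file:

```r
# Create a small .Rdata file in a temporary location
rdata_file <- tempfile(fileext = ".Rdata")
score <- c(10, 20, 30)
save(score, file = rdata_file)
rm(score)

# load() copies the stored variables into the workspace
# and returns the names of the objects it restored
loaded_names <- load(rdata_file)
print(loaded_names)

# Changing the workspace copy does not touch the file...
score[1] <- 999

# ...so loading again brings back the original values
load(rdata_file)
print(score)
```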
Try it yourself (Exercise 1.3.3)
Saving a workspace file
Suppose you've done some work and you want to save the workspace... I must have done something, there's all this new stuff in the workspace!
The save button is your friend
Again, it opens a system-specific dialog:
Browse, type a filename, and click save
Now the file is saved
save.image("~/Work/Research/Rbook/workshop_dsto/datasets/toydata_modified.rdata.rdata")
Again, the actual save command shows up in the R console
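save.image() writes the whole workspace; the related save() function writes only the objects you name. A sketch using temporary files and made-up variables (hormone, hormone_mean), not the workshop data set:

```r
hormone <- c(6.7, 38.5, 25.0)
hormone_mean <- mean(hormone)

# save.image() stores every object in the workspace in one file
whole_file <- tempfile(fileext = ".Rdata")
save.image(file = whole_file)

# save() stores only the objects listed
partial_file <- tempfile(fileext = ".Rdata")
save(hormone_mean, file = partial_file)

# Peek inside the smaller file by loading it into a separate environment
contents <- new.env()
saved_names <- load(partial_file, envir = contents)
print(saved_names)
```

Loading into a separate environment is a way to inspect a file without overwriting anything already in the workspace.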
Try it yourself (Exercise 1.3.4)
Importing data from text files (CSV)
CSV is a standard universal format
The raw data is just a plain text file: CSV stands for comma separated values
CSV files are often opened by spreadsheets, and produce tabular data like this...
In R, a CSV file is imported as a data frame
> expt
   id age gender treatment hormone happy  sad
1   1  25   male   control     6.7  2.00 6.12
2   2  24   male     drug1    38.5  3.36 3.53
3   3  25   male     drug2    25.0  3.40 4.82
4   4  28   male   control    98.4  5.69 0.34
5   5  23   male     drug1    42.4  4.56 4.48
6   6  28   male     drug2    20.3  2.89 4.57
7   7  25 female   control    18.5  3.18 4.82
8   8  29 female     drug1    65.2  4.78 2.24
9   9  21 female     drug2    56.4  4.51 2.64
10 10  26 female   control    55.7  3.90 2.71
11 11  19 female     drug1    41.9  2.83 2.94
12 12  30 female     drug2    54.1  3.45 1.87
Importing CSV data using Rstudio
Click on this...
And this...
Once again, there's a standard dialog box that you can use to find and open the desired file...
This pops up
The raw data
The assumptions that R is using when it imports the data
What the data frame will look like if you import using these settings
The name of the data frame to create
Import when ready...
Rstudio opens a tab showing you the contents of the data frame you just imported
> toydata <- read.csv("~/Work/Research/Rbook/workshop_dsto/datasets/toydata.csv")
> View(toydata)
These are the actual R commands that Rstudio used to import the data
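To try read.csv() without the workshop files, you can write a tiny CSV yourself and read it back. A sketch assuming a temporary file; the three rows are copied from the expt data shown earlier:

```r
# Write a small CSV file: a header row, then one line per case
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "id,age,gender,treatment,hormone,happy,sad",
  "1,25,male,control,6.7,2.00,6.12",
  "2,24,male,drug1,38.5,3.36,3.53",
  "3,25,male,drug2,25.0,3.40,4.82"
), csv_file)

# read.csv() returns a data frame with one column per field
expt_small <- read.csv(csv_file)
str(expt_small)    # column names and types
print(expt_small)
```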
Try it yourself (Exercise 1.3.5)
Scripts: A great tool for storing the commands for your analyses
Background
What do we know how to do?
- Load data from .Rdata files and .csv files
- Type commands to get R to make output
- Save data / R output to .Rdata files
- Install and load packages to extend R functionality
What's missing?
- How to save a collection of R commands to run later
- i.e. scripts
Scripts
What is an R script?
- R scripts are text files, and have a .R extension
- They contain a sequence of R commands that R will execute when the script is "sourced" (i.e., run)
How do I use scripts?
- Type (or paste) R commands into the text file
- Save the script (usually in the same folder as the data)
- Use the source button to run it. Here's how...
Open an existing script? Click here
Create a new script...
Click here
And here
Here's the new (empty) script
Type some R commands into the script:
This tells you that the script has some unsaved changes
These comments won't do anything, but it's a good idea to write extensive comments
These are the actual R commands that will do something!
We should probably save the script! Click here
Hey look, another standard dialog box... Save it as something like myscript.R...
Now to run the script... Click here
This is the R command that actually ran the script...
These two lines were executed first... and they created two variables
This line comes next...
And produced this output
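The command that runs a script is source(). The sketch below builds a small script and runs it; the script contents here are hypothetical stand-ins, since the commands typed in the walkthrough only appear in the screenshots:

```r
# Write a small script to a temporary folder
script_file <- file.path(tempdir(), "myscript.R")
writeLines(c(
  "# myscript.R: comment lines like this one are ignored by R",
  "x <- 10",       # these two lines create two variables
  "y <- x * 2",
  "print(y)"       # this line produces output
), script_file)

# source() runs the commands in the file, in order
source(script_file)   # prints [1] 20
```

After sourcing, x and y are in the workspace, just as if the commands had been typed at the console.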
Typical workflow
- Load data from CSV or Rdata file
- Play around with some analyses
- Copy some commands to a script
  - Copy/paste helps, as does the history tab in Rstudio
  - Often have separate scripts for different jobs
- Save the script(s)
- Save the variables created (and the raw data) to new Rdata file(s)
Try it yourself (Exercise 1.3.6)
Getting around the file system (briefly)
The working directory
What is the working directory?
- Because R interacts with files on your computer, it thinks of itself as working in a particular location.
- Anytime you try to open a file without specifying a folder, R will look in the working directory.
> getwd()
[1] "/Users/dan"
Getting around the computer
R commands
- getwd(), setwd() etc
- powerful but tedious to use
Rstudio is cleaner
- Use the file panel to navigate graphically
- set as working directory option is equivalent to the R command setwd()
Using the file panel in Rstudio
Initially, the file panel probably looks like this: a list of files and folders
Which folder are we currently looking at?
This opens a Finder window (Mac) or Explorer window (Windows) which you can use to select a new folder to look at
Clickable links
You can use the file panel to change the R working directory by clicking more, and then set as working directory
When you do, you'll see a setwd() command appear in the console, and the console panel itself will tell you what folder R is looking at...
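For completeness, here is what the console version looks like. A sketch that uses R's temporary folder as the destination and a made-up file name (notes.txt), then puts the working directory back where it was:

```r
# Remember where we started
old_dir <- getwd()

# Move to another folder (here: R's temporary folder)
setwd(tempdir())

# A file name with no folder is looked up in the working directory
writeLines("hello", "notes.txt")
found     <- file.exists("notes.txt")
note_text <- readLines("notes.txt")

# Go back to the original folder
setwd(old_dir)
print(getwd())
```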
End of this section