An Introduction to R 1.3 Some important practical matters when working with R


An Introduction to R
1.3 Some important practical matters when working with R
Dan Navarro (daniel.navarro@adelaide.edu.au)
School of Psychology, University of Adelaide
ua.edu.au/ccs/people/dan
DSTO R Workshop, 29-Apr-2015

Several boring tasks...
Mundane but important things:
- Installing and loading packages
- Moving around the file system on the computer
- Loading and saving data, writing scripts
R or Rstudio?
- R has (of course) many tools for doing these tasks
- Rstudio has very good tools too, and easier ones
- These tasks aren't interesting data analysis stuff, so there's not much value in exploring the more powerful command line versions

Installing and loading packages

Packages
What is a package?
- A collection of R functions and data sets that someone has contributed to the R ecosystem
- Packages extend the functionality of R: most of the value of R comes from the 5000+ packages out there
Where do they come from?
- Most packages are distributed centrally via CRAN (Comprehensive R Archive Network)
- There are lots of mirrors of CRAN. In Australia: CSIRO Canberra, University of Melbourne
- You can also get packages in other ways (not discussed)

Base R comes with about 30 packages
> .packages(TRUE)
 [1] "base"       "boot"       "class"      "cluster"
 [5] "codetools"  "compiler"   "datasets"   "foreign"
 [9] "graphics"   "grDevices"  "grid"       "KernSmooth"
[13] "lattice"    "MASS"       "Matrix"     "methods"
[17] "mgcv"       "nlme"       "nnet"       "rpart"
[21] "spatial"    "splines"    "stats"      "stats4"
[25] "survival"   "tcltk"      "tools"      "utils"
[29] "manipulate"
(ignore the .packages command, I'll show an easier way in a moment)

After a while you end up with lots more
> .packages(TRUE)
  [1] "alr3"        "base"        "BayesFactor" "bitops"
  [5] "boot"        "brew"        "car"         "class"
  [9] "cluster"     "coda"        "codetools"   "coin"
 [13] "colorspace"  "compiler"    "datasets"    "devtools"
 [17] "dichromat"   "digest"      "effects"     "evaluate"
 [21] "ez"          "foreign"     "formatR"     "Formula"
 [25] "ggplot2"     "GPArotation" "graphics"    "grDevices"
 [29] "grid"        "gtable"      "hexbin"      "highr"
 [33] "Hmisc"       "httr"        "KernSmooth"  "knitr"
 [37] "labeling"    "lattice"     "lavaan"      "leaps"
...
[101] "stats"       "stats4"      "stringr"     "survey"
[105] "survival"    "tcltk"       "testit"      "testthat"
[109] "tools"       "utils"       "whisker"     "XML"
[113] "xtable"      "zoo"
(the only reason why there aren't more is that this is a new-ish machine, and I only install packages when I need them)

Rstudio has a nice display showing which packages you have installed:
- The package name
- A description of what it does
- What version do you have installed?
- Click this button to uninstall the package
- Check to see if there are newer versions of your packages on CRAN
- Refresh the list (e.g., if you've just done something new)
- Click this to install new packages (we'll come back to this)
- Is the package loaded? (check the box to load, uncheck to unload)

Terminology
Installed means...
- That the package files are stored on your computer
- Your version of R is able to load the package
Loaded means...
- That R has opened the package files (sort of), and now knows what they contain
- You can use the functions / data stored in the package
The upshot of this:
- A package must be installed before you can load it
- A package must be loaded before you can use it
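A minimal sketch of the installed/loaded distinction, typed at the console rather than clicked in Rstudio. It uses the grid package from the list above because it ships with R but is not normally loaded at startup. installed.packages(), library() and search() are standard base R functions; the variable names are purely illustrative.

```r
pkg <- "grid"

# Installed: the package files are stored on this computer.
is_installed <- pkg %in% rownames(installed.packages())

# Loaded: the package appears on R's search path.
loaded_before <- paste0("package:", pkg) %in% search()

# Load it (character.only = TRUE because the name is held in a variable).
library(pkg, character.only = TRUE)
loaded_after <- paste0("package:", pkg) %in% search()

print(c(installed = is_installed, before = loaded_before, after = loaded_after))
```

In a fresh session the "before" entry is usually FALSE and the "after" entry TRUE, which is the install-then-load order described above.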

Why does it work like that???
R is big
- 5000+ packages means that different authors will use the same name to refer to different functions!
- e.g., there are several packages that define a logit() function.
Separating install from load avoids inconsistency:
- If every installed package were also loaded, it would introduce a lot of naming conflicts
- Install everything you might want to use sometime
- Load only those things you need to use now!

Let's load the MASS package
library("MASS", lib.loc="/Library/Frameworks/R.framework/Versions/3.2/Resources/library")
This pops up in the R console automatically... this is the actual loading command

Now let's load the Matrix package
> library( "Matrix" )
Loading required package: lattice
You don't usually have to specify the lib.loc bit: R knows where the default locations are, so you can just type the package name.
R keeps track of dependencies: some packages rely on content of other packages. So if you try to load package A, but it requires content from package B (which you don't have loaded), R will load package B too.

Try it yourself (Exercise 1.3.1)

How do I know if there's a conflict?
> library( psych )
> library( car )
Attaching package: 'car'
The following object is masked from 'package:psych':
    logit
psych and car both contain a function called logit(). When I load both packages, the more recently loaded one (car) takes precedence.
This is the warning message that R prints out. It says that logit exists in both packages, and that the version in psych is masked (i.e., you can't* access it)
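The footnote behind the asterisk is not captured in this transcript. One standard way to reach a masked function is the package::function form, so psych::logit() would still be reachable after car is loaded. The sketch below avoids needing psych and car installed: it masks stats::sd() with a made-up function of the same name, which is the same mechanism in miniature.

```r
# A made-up function that shares its name with stats::sd().
sd <- function(x) "my own sd"

values <- c(1, 2, 3)

# R finds our definition first, so the stats version is masked...
masked_result <- sd(values)

# ...but naming the package explicitly still reaches it.
explicit_result <- stats::sd(values)

print(masked_result)
print(explicit_result)

# Remove our version so sd() refers to stats::sd() again.
rm(sd)
```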

Installing packages
- Where to install from? (ignore this)
- Where to install to? (ignore this)
- Should dependencies be installed? Leave this checked, because the answer is almost always yes
- Which packages to install? Start typing, and notice that Rstudio gives a list of possible packages

Installing packages
> install.packages("psych")
trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.0/psych_1.3.10.12.tgz'
Content type 'application/x-gzip' length 2687899 bytes (2.6 Mb)
opened URL
==================================================
downloaded 2.6 Mb
The downloaded binary packages are in
    /var/folders/cl/thhsyrz53g73q0w1kb5z3l_80000gn/t//rtmpo8qt8n/downloaded_packages
What this output tells you:
- The install.packages() line is the command that appears in the R console
- Where is it downloading from? (the "trying URL" line)
- What gets downloaded? (the content type and size lines)
- Where did it store files? (the last two lines)
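If you prefer to install from a script, a common pattern is to install only when the package is missing. This is a sketch, not part of the workshop material: install_if_missing() is a hypothetical helper name, and the repos URL is an assumption (the CRAN cloud mirror), so substitute your preferred mirror.

```r
# Hypothetical helper: install a package from CRAN only if it is not
# already installed, then report whether it is available.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE mirrors leaving the "install dependencies" box checked.
    install.packages(pkg, repos = repos, dependencies = TRUE)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# stats ships with R, so this call returns without downloading anything.
already_there <- install_if_missing("stats")
print(already_there)
```

Calling install_if_missing("psych") would trigger a download like the one shown above on a machine that does not have psych yet.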

Try it yourself (Exercise 1.3.2)

Loading a workspace (.Rdata) file

Workspace files
The primary file format used by R is .Rdata
- It is a saved workspace
- It contains whatever data sets, variables, functions etc. that the workspace included when the file was created
How to load an .Rdata file?
- Hard way: use the load() function manually
- Easy way #1: double click on the .Rdata file in Finder/Explorer, and (as long as Rstudio is the default application for Rdata files) it will load automatically
- Easy way #2: open using the Rstudio menus

Rstudio method for loading .Rdata files
- This is the file open button. You can also use the File menu to do the same thing if you want to...
- Opens a file open dialog box. It will look different on different operating systems: a familiar Windows thing on a Windows computer, a standard Mac thing on a Mac computer, etc.
- Browse for the file you want, and open: clicking open will load the toydata.Rdata file

And the data file is now loaded...
load("~/Work/Research/Rbook/workshop_dsto/datasets/toydata.Rdata")
A command like this will appear in the R console (this command is what actually loaded the file), and the variable(s) that are stored in the file are now listed in the workspace.

What does it mean to have data loaded?
- Loading means that you've copied the variables in the .Rdata file into your R workspace
- You can now use these variables for your analysis
- Deleting or changing variables in the workspace does not change the contents of the .Rdata file, i.e. R doesn't do autosave or anything of the sort
- This is a good thing!
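A small sketch of that "copy, not link" behaviour, using a temporary file so it does not touch any real data. The variable x and the file name are made up for illustration; save(), load() and new.env() are standard base R.

```r
# Create a throwaway .Rdata file containing one variable.
rdata_file <- tempfile(fileext = ".Rdata")
x <- c(1, 2, 3)
save(x, file = rdata_file)
rm(x)

# load() copies the stored variables into the workspace and
# returns their names.
loaded_names <- load(rdata_file)

# Changing the workspace copy...
x <- x * 10

# ...does not change the file: load it again into a separate
# environment and compare.
on_disk <- new.env()
load(rdata_file, envir = on_disk)

print(x)          # the modified workspace copy
print(on_disk$x)  # the values still stored in the file
```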

Try it yourself (Exercise 1.3.3)

Saving a workspace file

Suppose you've done some work and you want to save the workspace... (I must have done something, there's all this new stuff in the workspace!)
- The save button is your friend
- Again, it opens a system-specific dialog
- Browse, type a filename, and click save
Now the file is saved
save.image("~/Work/Research/Rbook/workshop_dsto/datasets/toydata_modified.rdata.rdata")
Again, the actual save command shows up in the R console
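The same save can be typed directly. This sketch writes to a temporary folder rather than the workshop path, and the variable score is invented for the example. save.image() stores the whole workspace; save() is the base R alternative when you only want selected variables.

```r
# Some "new stuff" in the workspace (illustrative values).
score <- c(4, 8, 15)

# Save the entire workspace to a file in a temporary folder.
out_file <- file.path(tempdir(), "toydata_modified.Rdata")
save.image(file = out_file)

# Or save just the variables you name.
score_file <- file.path(tempdir(), "score_only.Rdata")
save(score, file = score_file)

saved_ok <- file.exists(out_file) && file.exists(score_file)
print(saved_ok)
```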

Try it yourself (Exercise 1.3.4)

Importing data from text files (CSV)

CSV is a standard universal format
- The raw data is just a plain text file: CSV stands for comma separated value
- CSV files are often opened by spreadsheets, and produce tabular data
- In R, a CSV file is imported as a data frame
> expt
   id age gender treatment hormone happy  sad
1   1  25   male   control     6.7  2.00 6.12
2   2  24   male     drug1    38.5  3.36 3.53
3   3  25   male     drug2    25.0  3.40 4.82
4   4  28   male   control    98.4  5.69 0.34
5   5  23   male     drug1    42.4  4.56 4.48
6   6  28   male     drug2    20.3  2.89 4.57
7   7  25 female   control    18.5  3.18 4.82
8   8  29 female     drug1    65.2  4.78 2.24
9   9  21 female     drug2    56.4  4.51 2.64
10 10  26 female   control    55.7  3.90 2.71
11 11  19 female     drug1    41.9  2.83 2.94
12 12  30 female     drug2    54.1  3.45 1.87

Importing CSV data using Rstudio
Click on this... And this... Once again, there's a standard dialog box that you can use to find and open the desired file...

This pops up:
- The raw data
- The assumptions that R is using when it imports the data
- What the data frame will look like if you import using these settings
- The name of the data frame to create
Import when ready...

Rstudio opens a tab showing you the contents of the data frame you just imported
> toydata <- read.csv("~/Work/Research/Rbook/workshop_dsto/datasets/toydata.csv")
> View(toydata)
These are the actual R commands that Rstudio used to import the data

Try it yourself (Exercise 1.3.5)
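To see what read.csv() does without the workshop files, here is a self-contained sketch. It writes the first three rows of the expt data shown earlier to a temporary CSV file (the temporary file is an assumption made for the example) and imports it as a data frame.

```r
# Write a tiny CSV file: a header line, then one line per row.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "id,age,gender,treatment,hormone,happy,sad",
  "1,25,male,control,6.7,2.00,6.12",
  "2,24,male,drug1,38.5,3.36,3.53",
  "3,25,male,drug2,25.0,3.40,4.82"
), csv_file)

# Import it: the header line becomes the column names.
expt <- read.csv(csv_file)

# Inspect the structure of the resulting data frame.
str(expt)
```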

Scripts: A great tool for storing the commands for your analyses

Background
What do we know how to do?
- Load data from .Rdata files and .csv files
- Type commands to get R to make output
- Save data / R output to .Rdata files
- Install and load packages to extend R functionality
What's missing?
- How to save a collection of R commands to run later, i.e. scripts

Scripts
What is an R script?
- R scripts are text files, and have a .R extension
- They contain a sequence of R commands that R will execute when the script is sourced (i.e., run)
How do I use scripts?
- Type (or paste) R commands into the text file
- Save the script (usually in the same folder as the data)
- Use the source button to run it. Here's how...

Open an existing script? Click here
Create a new script... Click here, and here
Here's the new (empty) script
Type some R commands into the script:
- This tells you that the script has some unsaved changes
- These comments won't do anything, but it's a good idea to write extensive comments
- These are the actual R commands that will do something!
We should probably save the script! Click here
Hey look, another standard dialog box... Save it as something like myscript.R...
Now to run the script... Click here
- This is the R command that actually ran the script...
- These two lines were executed first... and they created two variables
- This line comes next... and produced this output
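The screenshots are not part of this transcript, so here is a sketch of the same sequence in code. The script contents are invented to match the description (comments, two lines that create variables, one line that produces output), and the script is written to a temporary folder. source() is the base R function behind the source button.

```r
# Write a small script to disk (contents are illustrative).
script_file <- file.path(tempdir(), "myscript.R")
writeLines(c(
  "# Comments do nothing when the script runs, but they help the reader",
  "x <- 10",
  "y <- 20",
  "# This line produces output",
  "print(x + y)"
), script_file)

# Run the script: R executes the commands in order, top to bottom.
# source() also returns the value of the last command, invisibly.
out <- source(script_file)
```

Running it prints the sum once, just as if the three commands had been typed into the console one after another.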

Typical workflow
- Load data from CSV or Rdata file
- Play around with some analyses
- Copy some commands to a script (copy/paste helps, as does the history tab in Rstudio; often have separate scripts for different jobs)
- Save the script(s)
- Save the variables created (and the raw data) to new Rdata file(s)

Try it yourself (Exercise 1.3.6)

Getting around the file system (briefly)

The working directory
What is the working directory?
- Because R interacts with files on your computer, it thinks of itself as working in a particular location.
- Anytime you try to open a file without specifying a folder, R will look in the working directory.
> getwd()
[1] "/Users/dan"

Getting around the computer
R commands
- getwd(), setwd() etc.
- Powerful but tedious to use
Rstudio is cleaner
- Use the file panel to navigate graphically
- The "set as working directory" option is equivalent to the R command setwd()

Using the file panel in Rstudio
- Initially, the file panel probably looks like this: a list of files and folders
- Which folder are we currently looking at? (clickable links)
- This opens a Finder window (Mac) or Explorer window (Windows) which you can use to select a new folder to look at
- You can use the file panel to change the R working directory by clicking "more", and then "set as working directory"
- When you do, you'll see a setwd() command appear in the console, and the console panel itself will tell you what folder R is looking at...
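For completeness, a sketch of the command line version. It switches to R's temporary folder (an arbitrary choice for the example), shows that a file name given without a folder lands in the working directory, and then switches back. The file name note.txt is made up.

```r
# Remember where we started so we can go back afterwards.
old_wd <- getwd()

# Change the working directory to a temporary folder.
new_wd <- tempdir()
setwd(new_wd)

# No folder specified, so the file is created in the working directory.
writeLines("hello", "note.txt")
found <- file.exists(file.path(new_wd, "note.txt"))
print(found)

# Restore the original working directory.
setwd(old_wd)
```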

End of this section