An Introduction to Statistical Computing in R

1 An Introduction to Statistical Computing in R K2I Data Science Boot Camp - Day 1 AM Session May 15, 2017 Statistical Computing in R May 15, / 55

2 AM Session Outline
Intro to R Basics
Plotting in R
Data Manipulation

3 R Basics
Here we will give a quick overview of the R language and the RStudio IDE. Our emphasis will be on exploring the most used features of R, especially those used in later courses. This won't cover all the details, but it will cover the most important parts.

4 Working with RStudio
Before beginning with R, let's orient ourselves with RStudio.

5 Our initial view of RStudio is:
[Screenshot of the initial RStudio window]

6 Go to: File -> New File -> R Script. This gives:
[Screenshot of RStudio after opening a new R script]

8 Try It Out
Type the following into the console:
    ?lm
    ??linear
    plot(1:20, 1:20)

9 There are several useful shortcut keys in RStudio. A few popular ones:
Ctrl+Enter - when pressed in the editor, sends the current line to the console
Ctrl+1, Ctrl+2 - switch between the editor and the console
Ctrl+Shift+Enter - run the entire script in the console
Tab completion - this is perhaps the most used feature
For vim/emacs users, Tools -> Global Options -> Code -> Keybindings will give you your preferred bindings.

10 It's important to know our working directory. Given a file name, R will assume it is located in your current working directory. R will also save output to the working directory by default. It is important to set your working directory to the correct location or specify full path names.

11 Try out the following in the console window:
    getwd()
    list.files()
To change your working directory, go to: Session -> Set Working Directory -> Choose Directory
Alternatively:
    setwd("/path/to/directory")
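To see how relative file names follow the working directory, here is a minimal sketch. It assumes nothing beyond base R; the file name demo.txt is made up for illustration, and R's per-session temporary directory is used so nothing is written to your own folders.

```r
old.wd <- getwd()                  # remember where we started
setwd(tempdir())                   # switch to R's per-session temporary directory

# A bare file name is resolved against the current working directory
writeLines("hello", "demo.txt")    # "demo.txt" is a made-up name for this sketch
found <- "demo.txt" %in% list.files()
found

unlink("demo.txt")                 # clean up the illustrative file
setwd(old.wd)                      # go back to the original working directory
```

Saving the old directory and restoring it afterwards keeps the rest of a session unaffected.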

12 Reading, Writing, Saving, and Loading
Here we'll look at bringing data into R and getting it out.
We'll also see how to save R objects and environments.

13 Reading In Data
read.table
read.csv
read.fwf
Check out the options for each:
    ?read.table

14 Syntax
    ?read.table
    ?read.csv
    read.table("/path/to/your/file.ext", header = TRUE, sep = ",", stringsAsFactors = FALSE)

15 Most Common Options
sep tells how fields/variables are separated. Common values are "," (comma), " " (single space), and "\t" (tab escape character).
stringsAsFactors tells whether to treat non-numeric values as factor/categorical variables.
header tells whether the first line of the file has variable names.
na.strings tells how missing values are encoded in the file.
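As a small illustration of these four options working together, the sketch below writes a made-up tab-separated file to a temporary location and reads it back. The data, the column names, and the choice of "-" as the missing-value code are all assumptions for the example, not part of the course files.

```r
# Write a small illustrative file (made-up data) to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("name\tscore",
             "ann\t3.5",
             "bob\t-",
             "cy\t4"), tmp)

demo <- read.table(tmp,
                   header = TRUE,             # first line holds the variable names
                   sep = "\t",                # fields are separated by tabs
                   na.strings = "-",          # "-" encodes a missing value in this file
                   stringsAsFactors = FALSE)  # keep text columns as character
str(demo)
```

Without na.strings = "-", the score column would be read as text, because "-" is not a number.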

16 Standard Procedure
Open the file in a text editor.
Check items relevant to the options. Header? Separator type?
For big files, Linux tools are helpful:
    head -n10 BigFile.txt > OpenMe

17 Try It Out
Let's read the ReadMeInX.txt files into R. Try it on your own before looking at the answers on the next slides.
Example workflow:
1 Set your working directory to the directory containing the files.
2 Examine the files in a text editor to check for common options (header, separator, etc.)

18
    # read.table's default separator is fine for this one
    set0 <- read.table("ReadMeIn0.txt", header = TRUE)
    # specify a new separator
    set1 <- read.table("ReadMeIn1.txt", header = TRUE, sep = ",")
    # or use read.csv
    set1 <- read.csv("ReadMeIn1.txt", header = TRUE)

19
    # another change of separator
    set2 <- read.table("ReadMeIn2.txt", header = TRUE, sep = ";")
    # check for missing values
    set3 <- read.table("ReadMeIn3.txt", header = FALSE, sep = ",", na.strings = "")

20 Writing Data
write.table
write.csv

21 Syntax and Common Options
    ?write.csv
    write.csv(myrobject, file = "/path/to/save/spot/file.csv", row.names = FALSE)
Options are largely the same as their read counterparts.
row.names = FALSE is helpful to avoid having 1, 2, 3, ... as a variable/column.
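The effect of row.names = FALSE is easiest to see in a round trip. The sketch below uses a made-up data frame (my.data, with hypothetical columns id and label) and a temporary file, so it should run anywhere.

```r
# A small made-up data frame for illustration
my.data <- data.frame(id = 1:3,
                      label = c("a", "b", "c"),
                      stringsAsFactors = FALSE)

out.file <- tempfile(fileext = ".csv")

# Without row.names = FALSE the file would gain an extra, unnamed first column
write.csv(my.data, file = out.file, row.names = FALSE)

# Inspect the header line that was written
readLines(out.file)[1]

# Read it back and confirm the columns are unchanged
back <- read.csv(out.file, stringsAsFactors = FALSE)
names(back)
```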

22 Try It Out
Write out one of the files you imported. Try varying options like sep and quote. Note that write.csv ignores attempts to change sep (with a warning), so use write.table for that.

23 Saving Objects
saveRDS/readRDS are used to save (a compressed version of) individual R objects.
    # save our data set
    saveRDS(set1, file = "tstobj.rds")
    # get it back
    newtst <- readRDS("tstobj.rds")
    # can save any R object; try a vector
    my.vector <- c(1, 8, -100)
    saveRDS(my.vector, file = "justavector.rds")

24 Saving Environment
We can save all variables in the current R workspace with save.image.
We can load a saved workspace with load.
R will ask you to save your work when you exit.
    # save all our work
    save.image("allmywork.rdata")
    # reload it
    load("allmywork.rdata")
    # name given to the default save
    load(".RData")

25 The Basics of R
Let's do a whirlwind tour of R: its syntax and data structures.
This won't cover all the details, but it will cover the most important parts.

26 Basic R Data Types
    # numeric types: integer, double
    348
    # character
    "my string"
    # logical
    TRUE
    FALSE
    # arithmetic works as you'd expect
    2^4
    # so do logical operators and comparisons
    TRUE | FALSE
    1 + 7 != 7
    # other logical operators:
    # &, |, !
    # <, >, <=, >=, ==, !=

27 Data Types Cont.
    # variable assignment is done with the <- operator
    my.number <- 483
    # the '.' above does nothing; we could have done:
    # mynumber <- 483
    # instead
    # it's an R-ism to use .'s in variable names
    # typeof() tells us the type
    typeof(my.number)
    ## [1] "double"
    # we can convert between types
    my.int <- as.integer(my.number)
    typeof(my.int)
    ## [1] "integer"

28 R Data Structures - Vectors
    # the vector is the most important data structure
    # create it with c()
    my.vec <- c(1, 2, 67, -98)
    # get some properties
    str(my.vec)
    ## num [1:4] 1 2 67 -98
    length(my.vec)
    ## [1] 4
    # access elements with []
    my.vec[3]
    ## [1] 67
    my.vec[c(3, 4)]
    ## [1]  67 -98
    # can do assignment too
    my.vec[5] <

29 Vectors - Cont.
    # other ways to create vectors
    x <- 1:6
    y <- seq(7, 12, by = 1)
    # operations get recycled through the whole vector
    x + 1
    ## [1] 2 3 4 5 6 7
    x > 3
    ## [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
    # can do component-wise operations between vectors
    x * y
    ## [1]  7 16 27 40 55 72
    x / y
    ## [1] 0.1428571 0.2500000 0.3333333 0.4000000 0.4545455 0.5000000
    y %/% x
    ## [1] 7 4 3 2 2 2

30 Try It Out
    # Try to guess what the following lines will do
    # Will it run at all? If so, what will it give?
    # Think about it and run to confirm
    7 -> w
    w <- z < TRUE 0
    15 & 3
    my.vec[2:4]
    my.vec[-2]
    my.vec[c(TRUE, FALSE, FALSE, TRUE, FALSE)]
    my.vec[sum(c(TRUE, FALSE, FALSE, TRUE, TRUE))] <- TRUE
    my.vec[3] <- "I'm a string"
    as.numeric(my.vec)
    x[x > 3]
    x + c(1, 2)

31 Matrices
    # matrices are 2D vectors
    # create using matrix()
    my.matrix <- matrix(rnorm(20), nrow = 4, ncol = 5)
    # rnorm(20) draws 20 random samples from a N(0,1) distribution
    my.matrix
    ## (a 4 x 5 matrix of random values prints here)
    # note: matrices are filled by column
    # get details
    dim(my.matrix)
    ## [1] 4 5
    nrow(my.matrix)
    ## [1] 4
    ncol(my.matrix)
    ## [1] 5

32 Matrices - Cont.
    # indexing is similar to vectors, but with 2 dimensions
    # get the second row
    my.matrix[2, ]
    # get the first and last columns of row three
    my.matrix[3, c(1, 5)]
    # transposing is done with t()

33 Lists
    # lists are similar to vectors but can contain different types
    # create with list()
    my.list <- list("just a string", 44, my.matrix, c(TRUE, TRUE, FALSE))
    # access items via double brackets [[]]
    my.list[[4]]
    ## [1]  TRUE  TRUE FALSE
    # access multiple items
    my.list[1:2]
    ## [[1]]
    ## [1] "just a string"
    ##
    ## [[2]]
    ## [1] 44
    # list items can be named too
    named.list <- list(item1 = "my string", Item2 = my.list)
    # access to a named item is via the dollar sign operator
    # [[]] also works
    c(named.list$item1, named.list[[1]])
    ## [1] "my string" "my string"
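The difference between single and double brackets on a list is a frequent source of confusion, so here is a short sketch. It uses a made-up three-item list (short.list, without the matrix) so that it runs on its own.

```r
# A made-up list for illustration
short.list <- list("just a string", 44, c(TRUE, TRUE, FALSE))

one.item <- short.list[[2]]   # double brackets: the element itself
sub.list <- short.list[2]     # single brackets: a list of length one

class(one.item)
class(sub.list)

# Arithmetic works on the element, but not on the one-item list
one.item + 1
```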

34 Putting It Together
Let's practice with R data types by doing PCA on the iris data.
    data("iris")
    head(iris)
    str(iris)
Note that iris is a data.frame; this is simply a list.

35 PCA outline
Save the numeric columns of iris as a matrix. (Hint: ?as.matrix)
Center and scale the matrix. (Hint: ?scale)
Compute the correlation matrix
$R = \frac{1}{n-1} X^T X$
Here $X$ is our (centered and scaled) data matrix, $n$ is the number of rows/observations in our data, and $X^T$ is the transpose of $X$. (Hint: t(X) is the transpose operator and A %*% B performs matrix multiplication on the matrices A and B.)

36 PCA outline cont.
Obtain the two leading eigenvectors of the correlation matrix $R$. Denote these as $v_1$, $v_2$. (Hint: ?eigen)
Compute the first and second principal components via
$z_1 = X v_1$
$z_2 = X v_2$
Produce a scatter plot of $z_1$ vs $z_2$. (Hint: ?plot)
Take a few moments to try it yourself before looking at the answers on the next slides.

37 PCA from scratch
    data("iris")
    # get the numeric portions of the list and make a matrix
    X <- as.matrix(iris[1:4])
    # center and scale
    X <- scale(X, center = TRUE, scale = TRUE)
    # get the number of rows
    n <- nrow(X)
    # compute the correlation matrix
    R <- (1/(n-1)) * t(X) %*% X
    # perform the eigendecomposition
    Reig <- eigen(R)
    # get the eigenvectors
    Reig.vecs <- Reig$vectors
    # create the principal components
    pc1 <- X %*% Reig.vecs[, 1]
    pc2 <- X %*% Reig.vecs[, 2]

38 PCA from scratch cont.
    # compare to R's PCA function
    their.pcs <- prcomp(iris[1:4], center = TRUE, scale. = TRUE)
    head(their.pcs$x[, 1:2])
    ## (first six rows of PC1 and PC2 print here)
    # our result
    head(cbind(pc1, pc2))
    ## (first six rows of pc1 and pc2 print here)
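Rather than comparing the two printouts by eye, the agreement can be checked numerically. The following is a sketch under two assumptions: it recomputes everything from the built-in iris data so it runs on its own, and it compares absolute values, because eigenvectors are only determined up to sign and a column may come out flipped relative to prcomp. The names eig, pcs, and scores are chosen for this example.

```r
# Recompute the from-scratch pieces so this block is self-contained
X <- scale(as.matrix(iris[1:4]), center = TRUE, scale = TRUE)
R <- crossprod(X) / (nrow(X) - 1)     # crossprod(X) is t(X) %*% X
eig <- eigen(R)

pcs <- prcomp(iris[1:4], center = TRUE, scale. = TRUE)

# The eigenvalues of R are the variances of the principal components
values.match <- all.equal(eig$values, pcs$sdev^2)

# The scores agree up to the sign of each column
scores <- X %*% eig$vectors[, 1:2]
scores.match <- all.equal(abs(unname(scores)), abs(unname(pcs$x[, 1:2])))

values.match
scores.match
```

all.equal compares with a small numerical tolerance, which is appropriate here because eigen and prcomp use different algorithms internally.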

39 PCA from scratch cont.
    plot(pc1, pc2, col = iris$Species)
[Figure: scatter plot of pc2 against pc1, colored by species]

40 Factors
    # Factors are like vectors, but with predefined allowed values called levels
    # Factors are used to represent categorical variables in R
    # create a factor
    factor1 <- factor(c('Good', 'Bad', 'Ugly'))
    # find its levels
    levels(factor1)
    ## [1] "Bad"  "Good" "Ugly"
    # below gives a warning, but not an error
    factor1[4] <- 17
    ## Warning in `[<-.factor`(`*tmp*`, 4, value = 17): invalid factor level, NA generated
    # see what happened
    factor1
    ## [1] Good Bad  Ugly <NA>
    ## Levels: Bad Good Ugly
    factor1[4] <- 'Bad'
    # get the breakdown
    table(factor1)
    ## factor1
    ##  Bad Good Ugly
    ##    2    1    1

41 Note that in one of our previous examples, R filled in the improper factor value with NA.
NA is R's way of specifying missing data.
Note that missing data is handled differently than ordinary values, as we will see as we go along.
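A few lines make the "handled differently" point concrete. This is a minimal sketch with a made-up vector, ages; the functions used (is.na, mean with na.rm) are base R.

```r
# A made-up vector with one missing value
ages <- c(45, NA, 19)

# Test for missingness with is.na(); comparing with == NA only yields NA
missing <- is.na(ages)
missing
ages == NA

# Most summaries propagate NA unless told to drop it
mean(ages)                               # NA
mean.known <- mean(ages, na.rm = TRUE)   # (45 + 19) / 2 = 32
mean.known
```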

42 Questions
What will the following lines of code do?
    my.matrix[3:4, 1:2] <- c(4, 5)
    my.matrix[4, 5] <- 'string'
    mf.strings <- c('f', 'f', 'm', 'f')
    factor2 <- as.factor(mf.strings)
    c(factor1, factor2)
    factor1 == 'Ugly'
    my.list[[3]][2, ]
    sum(c(1, 2, 3, NA))
    sum(c(1, 2, 3, NA), na.rm = TRUE)

43 Data Frames
The data.frame is how R represents data sets. Data frames are simply lists, with a few additional restrictions.
    # create your own
    my.df <- data.frame(
      age = c(45, 27, 19, 59, 71, 13, 5),
      gender = factor(c('M', 'M', 'M', 'F', 'M', 'F', 'F'))
    )
    str(my.df)
    ## 'data.frame': 7 obs. of 2 variables:
    ## $ age : num 45 27 19 59 71 13 5
    ## $ gender: Factor w/ 2 levels "F","M": 2 2 2 1 2 1 1

44 Data Frames - Cont.
Individual variables can be accessed via the $ operator.
    my.df$age
    ## [1] 45 27 19 59 71 13  5
    summary(my.df$age)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    5.00   16.00   27.00   34.14   52.00   71.00
    table(my.df$gender)
    ##
    ## F M
    ## 3 4
    # data frames are really just lists
    my.df[[2]]
    ## [1] M M M F M F F
    ## Levels: F M

45 Data Frames - Cont.
    # data.frames can be subsetted like matrices
    my.df[1:3, c("age")]
    ## [1] 45 27 19
    # logical subsetting is especially useful for data.frames
    # get ages over 40
    age.logic <- my.df$age > 40
    # take a subset of these rows
    my.df[age.logic, ]
    ##   age gender
    ## 1  45      M
    ## 4  59      F
    ## 5  71      M
    # create a new variable age.sq
    my.df$age.sq <- my.df$age^2
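Logical conditions can also be combined with & before subsetting. The sketch below rebuilds the small data frame so it runs on its own; the names older.men and same.rows are chosen for this example, and subset() is shown as an equivalent base R shorthand.

```r
# Rebuild the example data frame so this block is self-contained
my.df <- data.frame(
  age = c(45, 27, 19, 59, 71, 13, 5),
  gender = factor(c("M", "M", "M", "F", "M", "F", "F"))
)

# Combine two conditions: older than 40 AND male
older.men <- my.df[my.df$age > 40 & my.df$gender == "M", ]
older.men

# subset() expresses the same selection without repeating my.df$
same.rows <- subset(my.df, age > 40 & gender == "M")
```

Rows 1 and 5 satisfy both conditions; row 4 is over 40 but is dropped by the gender condition.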

46 Try It Out
Let's use R's internal iris data set to practice with data frames.
    my.iris <- iris
    my.iris
1 Create two new variables, Length.Sum and Width.Sum, which are the sums of the Sepal and Petal lengths and widths, respectively.
2 Use subsetting and R's mean function to find the average Length.Sum of the setosa species.

47
    my.iris$Length.Sum <- my.iris$Sepal.Length + my.iris$Petal.Length
    my.iris$Width.Sum <- my.iris$Sepal.Width + my.iris$Petal.Width
    setosa.inds <- my.iris$Species == 'setosa'
    mean(my.iris[setosa.inds, ]$Length.Sum)
    ## [1] 6.468

48 Control Structures
R has all the typical control structures:
if-else statements
for loops
while loops

49 Syntax
    if(logical_expression){
      execute_code
    } else{
      execute_other_code
    }
    for(value in sequence){
      work_with_value
    }
    while(expression_is_true){
      execute_code
    }
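The templates above can be filled in with a concrete example. This is only an illustrative sketch; the variable names total and count are made up for it.

```r
# for + if-else: add up the even numbers from 1 to 10
total <- 0
for (value in 1:10) {
  if (value %% 2 == 0) {
    total <- total + value   # even: add it to the running total
  } else {
    next                     # odd: skip to the next value
  }
}
total   # 2 + 4 + 6 + 8 + 10 = 30

# while: find the smallest power of 2 that is at least 100
count <- 0
while (2^count < 100) {
  count <- count + 1
}
count   # 7, since 2^6 = 64 and 2^7 = 128
```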

50 Functions
Defining functions in R is easy.
    # use the function keyword with assignment <-
    my.mean <- function(input.vector){
      sum = 0
      for(val in input.vector) {
        sum = sum + val
      }
      # the value of the last expression gets returned
      # (invisibly here, because it is an assignment)
      return.me <- sum / length(input.vector)
    }
    my.mean(1:10)

51 Functions cont.
    my.mean <- function(input.vector){
      sum = 0
      for(val in input.vector) {
        sum = sum + val
      }
      # returns 1 now
      return.me <- sum / length(input.vector)
      1
    }
    my.mean(1:10)
    ## [1] 1
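Since a function returns the value of its last evaluated expression, a version that ends with the computed mean itself gives the intended result. The sketch below is one way to write it; the names my.mean2, my.mean3, and total are chosen for this example (total avoids reusing the name of the built-in sum function), and the explicit return() form is shown only as an equivalent alternative.

```r
# Last expression is the value we want, so it is what gets returned
my.mean2 <- function(input.vector) {
  total <- 0
  for (val in input.vector) {
    total <- total + val
  }
  total / length(input.vector)
}

# Equivalent version with an explicit return()
my.mean3 <- function(input.vector) {
  total <- 0
  for (val in input.vector) {
    total <- total + val
  }
  return(total / length(input.vector))
}

my.mean2(1:10)   # 55 / 10 = 5.5
my.mean3(1:10)   # 5.5
```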

52 Try It Out
Create a function my.summary which takes a vector, x, as input, calculates the mean, standard deviation, max, and min of x, and returns these in a list.
Try out R's internal functions mean, sd, max, and min.

53
    my.summary <- function(x) {
      list(
        mean = mean(x),
        sd = sd(x),
        max = max(x),
        min = min(x)
      )
    }

54 Try It Out cont.
Loop through the variables in my.iris, evaluating my.summary on each (provided the variable is numeric) and printing the maximum.
Hint: Use is.numeric to test each variable before applying my.summary.

55
    for(var in my.iris) {
      if(is.numeric(var)){
        tmp <- my.summary(var)
        print(tmp$max)
      }
    }
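As an alternative to the explicit loop, the same idea can be sketched with sapply, which applies a function to each column of a data frame. This is not part of the original exercise; it uses the built-in iris data rather than my.iris so that it runs on its own, and the names numeric.cols and col.max are chosen for this example.

```r
# Which columns are numeric? sapply applies is.numeric to each column
numeric.cols <- sapply(iris, is.numeric)
numeric.cols

# Maximum of each numeric column, returned as a named vector
col.max <- sapply(iris[numeric.cols], max)
col.max
```

The result keeps the column names, which makes the output easier to read than a sequence of unlabeled print calls.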


LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

Business Statistics: R tutorials

Business Statistics: R tutorials Business Statistics: R tutorials Jingyu He September 29, 2017 Install R and RStudio R is a free software environment for statistical computing and graphics. Download free R and RStudio for Windows/Mac:

More information
