Introduction to R. Dr. Emile R. Chimusa Department of Integrative Biomedical Sciences University of Cape Town. May 9, 2016


Contents

1 Getting started in R-RStudio
  1.1 Getting R and RStudio Started on your PC
  1.2 R Packages
  1.3 Key Things to Know About R
      Getting Help
  1.4 R as a Calculator
  1.5 Assignment, Object names and Basic data Types
2 Computing on data vector
3 Functions and Expressions
  3.1 Conditions Statements and Loops in R
4 Data Manipulations
  4.1 Load and Read R data
  4.2 Using Your Own Data
  4.3 Basic Data Operations
  4.4 Basic Operations with Matrices
5 Running scripts
6 Important R Tips
7 Tutorial

1 Getting started in R-RStudio

1.1 Getting R and RStudio Started on your PC

R is a system for statistical computation and graphics. RStudio is an alternative graphical interface to R. We use R-RStudio for several reasons:

(1) R is open-source and freely available for Mac, PC, and Linux machines.
(2) R is user-extensible, and user extensions can easily be made available to others.
(3) It is the package of choice for many statisticians and those who use statistics frequently.
(4) R is becoming very popular with statisticians and scientists, especially in certain subdisciplines, like genetics.
(5) It is gaining new features every day. New statistical methods are often available first in R.

You can download R freely (Windows, Linux or MacOS). Follow the instructions, and after a little patience you should be able to start R, after which a screen opens with the prompt >. It is also possible to download RStudio Server and set up your own server, or RStudio Desktop for stand-alone processing. Once you have logged in to an RStudio server, you will see something like the figure below.

Notice that RStudio divides its world into four panels. Several of the panels are further subdivided into multiple tabs. RStudio offers the user some control over which panels are located where and which tabs are in which panels, so your initial configuration might not be exactly like the one illustrated here. The console panel is where we type commands that R will execute.

1.2 R Packages

The functionality of R is organized in so-called packages, and R provides many more features through a large number of add-on packages. To use a package, it must be installed (one time) and loaded (each session). A number of packages are already available in RStudio. The Packages tab in RStudio shows the list of installed packages and indicates which of these are loaded. Alternatively, use the function library() to see which packages are currently installed on your system.

You can install other packages by clicking on the Install Package button in RStudio and following the directions. To download a specific package, you can use the following command.

> rep <-"
> #install.packages(c("TeachingDemos"), repos=rep, dependencies=TRUE)

The Packages button at cran.r-project.org shows that R has a huge number of packages available for a wide range of statistical procedures.

1.3 Key Things to Know About R

(1) R is case-sensitive. If you mis-capitalize something in R, it won't do what you want.

(2) Functions in R use the following syntax:

> functionname( argument1, argument2,... )

• The arguments are always surrounded by (round) parentheses and separated by commas. Some functions (like data()) have no required arguments, but you still need the parentheses.
• If you type a function name without the parentheses, you will see the code for that function (this probably isn't what you want at this point).

(3) TAB completion and arrows can improve typing speed and accuracy. If you begin a command and hit the TAB key, RStudio will show you a list of possible ways to complete the command. If you hit TAB after the opening parenthesis of a function, it will show you the list of arguments it expects. The up and down arrows can be used to retrieve past commands.

(4) If you see a + prompt, it means R is waiting for more input. Often this means that you have forgotten a closing parenthesis or made some other syntax error. If you have messed up and just want to get back to the normal prompt, hit the escape key and start the command afresh.

In this course, we will often use packages from Bioconductor, a very useful open-source software project for the analysis and comprehension of genomic data. All these packages are assumed to be already installed, following the preamble material. To follow the course it is essential to install Bioconductor on your PC or network.
Bioconductor is primarily based on R and can be installed as follows.

> source("
> biocLite()

Then, to download the ALL package from a repository to your system, load it, and make the ALL data (Chiaretti et al., 2004) available for use, you can use the following:

> biocLite("ALL")
> library(ALL)
> data(ALL)
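The "install once, load each session" rule can be sketched as a small check before loading. This is only an illustration: the helper name load_if_installed and the made-up package name are assumptions, not part of R or Bioconductor. The commented lines at the end mention BiocManager, which is the installer used by recent Bioconductor releases in place of biocLite().

```r
# Hypothetical helper (not part of R or Bioconductor): attach a package
# only if it is already installed, and report what happened.
load_if_installed <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)  # load for this session
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; install it first.")
    FALSE
  }
}

ok_stats <- load_if_installed("stats")                 # ships with R
ok_fake  <- load_if_installed("notARealPackage12345")  # made-up name

# Installation itself is done once and needs internet access:
# install.packages("TeachingDemos")
# install.packages("BiocManager"); BiocManager::install("ALL")
```

Keeping installation commented out and separate from loading mirrors the advice in the text: scripts should load packages, not reinstall them on every run.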

Getting Help

If something doesn't go quite right, or if you can't remember something, it's good to know where to turn for help. In addition to asking your friends and neighbors, you can use the R help system. To get help on a specific function or data set, simply precede its name with a ?:

> ?plot

If you don't know the exact name of a function, you can give part of the name and R will find all functions that match. Quotation marks are mandatory here.

> apropos('hist') # must include quotes

If the above fails, you can do a broader search using help.search(), which will find matches not only in the names of functions and data sets, but also in their documentation.

> ??histogram # any of these will work
> ??"histogram"
> ??'histogram'
> help.search('histogram')

In addition, to obtain an overview of the content of a package, use ls("package:stats") or library(help = "stats").

1.4 R as a Calculator

R can be used as a calculator. Try typing the following commands in the console panel.

> 15.3 * 23.4
[1] 358.02
> sqrt(16)
[1] 4

You can save values to named variables for later reuse.

> my_product = 15.3 * 23.4 # save result
> my_product # show the result
[1] 358.02
> product <- 15.3 * 23.4 # <- is assignment operator, same as =
> product
[1] 358.02

> 15.3 * 23.4 -> newproduct # -> assigns to the right
> newproduct
[1] 358.02

Once variables are defined, they can be referenced with other operators and functions.

> 0.5 * product # half of the product
[1] 179.01
> log(product) # (natural) log of the product
> log10(product) # base 10 log of the product
> log(product, base=2) # base 2 log of the product

The semi-colon can be used to place multiple commands on one line. One frequent use of this is to save and print a value all in one go:

> 15.3 * 23.4 -> product; product # save result and show it
[1] 358.02

1.5 Assignment, Object names and Basic data Types

It is often convenient to assign numbers and values to variables (objects) to be used later. The proper way to assign values to a variable is with the <- operator (with a space on either side). The = symbol works too, but the R masters recommend reserving = for specifying arguments to functions.

> x <- 7*41/pi # don't see the calculated value
> x # take a look

When choosing a variable name you can use letters, numbers, dots, or underscore characters. You cannot use mathematical operators, and a leading dot may not be followed by a number. Examples of valid names are: x, x2, z.value, and z_hat.

Objects can be of many types, modes, and classes. At this level, it is not necessary to investigate all of the intricacies of each type, but there are some with which you need to become familiar:

(1) integer: the values 0, ±1, ±2,...; these are represented exactly by R.
(2) double: real numbers (rational and irrational); these numbers are not represented exactly (save integers or fractions with a denominator that is a power of 2).
(3) character: elements that are wrapped in pairs of " or ';
(4) logical: includes TRUE, FALSE, and NA (which are reserved words); NA stands for "not available", i.e., a missing value.

You can determine an object's type with the typeof function. In addition to the above, there is the complex data type:

> sqrt(-1) # isn't defined
> sqrt(-1+0i) # is defined
> sqrt(as.complex(-1)) # same thing
> (0 + 1i)^2 # should be -1
> typeof((0 + 1i)^2)

Note that you can just type (1i)^2 to get the same answer. NaN stands for "not a number"; it is represented internally as double.

2 Computing on data vector

So far we have been manipulating vectors of length 1. Now let us move to vectors with multiple entries. A data vector is simply a collection of numbers obtained as outcomes from measurements. If you would like to enter the data 74, 31, 95, 61, 76, 34, 23, 54, 96 into R, you may create a data vector with the c function (which is short for concatenate).

> my_vector <- c(74,31,95,61,76,34,23,54,96)
> my_student <- c("emile","eric","peter","anna")

This can be illustrated by a simple example on expression values of a gene. Suppose that gene expression values 1, 1.5, and 1.25 from the persons Eric, Peter, and Anna are available. To store these in a vector we use the concatenate command c(), as follows.

> gene1 <- c(1.00,1.50,1.25)
> gene_person <- c("Eric", 'Peter', "Anna")

Now we have created the object gene1 containing three gene expression values. To compute the sum, mean, and standard deviation of the gene expression values, we use the corresponding built-in functions (covered later on).
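As a quick check of the basic types and of the gene expression vector described above, the following minimal sketch calls typeof() on one value of each type and then applies sum(), mean() and sd() to gene1. Attaching the person names directly to the vector is an illustrative choice that the original example does not make.

```r
# typeof() reports the basic storage type of an object
types <- c(
  integer   = typeof(5L),          # the L suffix marks an integer literal
  double    = typeof(5),
  character = typeof("five"),
  logical   = typeof(TRUE),
  complex   = typeof((0 + 1i)^2)
)
print(types)

# Gene expression values from the text; the names are added here
# only for readability (an assumption, not in the original example)
gene1 <- c(Eric = 1.00, Peter = 1.50, Anna = 1.25)

gene_sum  <- sum(gene1)   # 3.75
gene_mean <- mean(gene1)  # 1.25
gene_sd   <- sd(gene1)    # 0.25 (sample standard deviation)
```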
In order to compute so-called quantiles of distributions (see later on) or plots of functions (see next sections), we may need to generate sequences of numbers. The easiest way to construct a sequence of numbers is by

> 1:10

 [1]  1  2  3  4  5  6  7  8  9 10

This sequence can also be produced by the function seq, which allows for various step sizes to be chosen. For example, we may want to generate numbers between zero and one with step size equal to 0.1.

> seq(0,1,0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> seq(from=1, to = 5)
[1] 1 2 3 4 5
> x <- seq(from = 2, by = -0.1, length.out = 4)

Indexing data vectors

Sometimes we do not want the whole vector, but just a piece of it. We can access parts of a vector with the [] operator. Observe (with x redefined as our data vector):

> x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)
> x[1]
[1] 74
> x[2:4]
[1] 31 95 61
> x[c(1, 3, 4, 8)]
[1] 74 95 61 54

In addition, the vector LETTERS has the 26 letters of the English alphabet in uppercase, and letters has all of them in lowercase.

> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> LETTERS[1:5]
[1] "A" "B" "C" "D" "E"

We can use the minus sign to specify those elements that we do not want.

> x[-c(1, 3, 4, 8)]
[1] 31 76 34 23 96

> letters[-(6:24)]
[1] "a" "b" "c" "d" "e" "y" "z"

Another type of sequence is called a factor. A factor is designed to indicate the experimental condition of a measurement, or the group to which a patient (observation) belongs. When, for example, for each of 7 experimental conditions there are measurements from 6 patients, the corresponding factor can be generated as follows.

> factor <- gl(7,6)

The 7 conditions are often called the levels of a factor. Each of these levels has 6 repeats, corresponding to the number of observations (patients) within each level (type of disease). We shall further illustrate the idea of a factor soon, because it is very useful for purposes of visualization.

3 Functions and Expressions

A function takes arguments as input and returns an object as output. There are functions to do all sorts of things. We show some examples below.

> x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)
> sum(x)
[1] 544
> length(x)
[1] 9
> min(x) # max(x)
[1] 23
> mean(x) # sample mean
[1] 60.44444
> sd(x) # sample standard deviation
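The factor described in this section (7 conditions with 6 patients each) can be inspected with a few of the functions just introduced. In this minimal sketch the variable name condition is used instead of factor, so that the object does not share a name with the built-in factor() function, and the labels in the second example are hypothetical.

```r
# Seven conditions (levels), six patients (replicates) per condition
condition <- gl(7, 6)

n_obs    <- length(condition)   # 42 observations in total
n_levels <- nlevels(condition)  # 7 levels
counts   <- table(condition)    # 6 observations in each level
print(counts)

# Hypothetical labels, only to show the `labels` argument of gl()
treatment <- gl(2, 3, labels = c("control", "treated"))
print(treatment)
```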

> plot(x,x)

[Figure: scatter plot of x against x]

If you type the name of a function without any parentheses or arguments, then, if you are lucky, the code for the entire function will be printed. For instance, suppose that we would like to see how the intersect function works:

> intersect
function (x, y)
{
    y <- as.vector(y)
    unique(y[match(as.vector(x), y, 0L)])
}
<bytecode: 0x12947e8>
<environment: namespace:base>

You can extend the R functions and language by writing your own functions. Below is the syntax for designing your own functions in R:

namefunction <- function(args) {... code... }

Examples of your own functions in R:

> #Example 1 of functions:
> y <- c(3.1,10.5,14,30,15,19)
> x <- c(4,12,12,20,16,22)
> z <- cbind(x,y)
> circle.area <- function(radius) {
+ area <- pi*radius^2
+ return(area)
+ }
> circle.area(4) # calling or using your function
[1] 50.26548
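Two properties of user-written functions are worth seeing next to Example 1: arithmetic inside a function is vectorized, and arguments can have default values. The sketch below reuses circle.area from the example; ring.area is a hypothetical extension added purely for illustration.

```r
# Same function as in Example 1, written without the intermediate variable
circle.area <- function(radius) {
  pi * radius^2
}

# Arithmetic is vectorized, so one call handles several radii
areas <- circle.area(c(1, 2, 4))

# Hypothetical extension: a second argument with a default value
ring.area <- function(outer, inner = 0) {
  stopifnot(all(outer >= inner))           # basic input check
  circle.area(outer) - circle.area(inner)
}

full <- ring.area(4)             # inner defaults to 0: same as circle.area(4)
ring <- ring.area(4, inner = 2)  # area between radius 2 and radius 4
```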

> #Example 2 of functions:
> mystudy <- function(x){
+ par(mfrow=c(3,1))
+ hist(x[,1])
+ hist(x[,2])
+ plot(x[,1],x[,2])
+ par(mfrow=c(1,1))
+ apply(x,2,summary)
+ }
> mystudy(z)

[Output: summary table with rows Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for the columns x and y]

Figure 1: Multi-plots in one figure (histogram of x[, 1], histogram of x[, 2], and scatter plot of x[, 2] against x[, 1]).
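The last line of mystudy relies on apply(x, 2, summary). As a minimal sketch of what that call returns, using the same x, y and z as in Example 1, the code below leaves out the plotting so that it can run in a non-interactive session.

```r
x <- c(4, 12, 12, 20, 16, 22)
y <- c(3.1, 10.5, 14, 30, 15, 19)
z <- cbind(x, y)

# MARGIN = 2 applies summary() to each column; the result is a matrix
# with one column per variable and one row per summary statistic
col_summary <- apply(z, 2, summary)
print(col_summary)

# The same idea with other functions: column means and row sums
col_means <- apply(z, 2, mean)
row_sums  <- apply(z, 1, sum)
```

Because apply() is the last expression evaluated inside mystudy, its result is also the return value of that function.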

3.1 Conditions Statements and Loops in R

The if condition and statement syntax:

if (...condition...) {...code 1... } else {...code 2... }

The while loop syntax:

while (...condition...) {...code...}

The for loop syntax:

for (index in range of indices) {...code...}

Example 1: If conditions

> x <- 10
> y <- 2
> if (y > 1){
+ x <- 2*x
+ y <- 2*y
+ } else{
+ x < x <-2*x
+ }
> x
[1] 20
> y
[1] 4

Example 2: The for loop

> count <- c(0,0,0,0)
> n <- c(2,4,6,4)
> for(i in 1:length(n)){
+ count <- c(count,rep(i,n[i]))
+ }
> count

 [1] 0 0 0 0 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4

Example 3: The for and while loops

> for (i in 1:10){print(i)}
> n <- 10
> while (n > 0) {
+ cat(n, "is greater than 0\n")
+ n <- n-1
+ }

4 Data Manipulations

4.1 Load and Read R data

Data analysis involves a large amount of manipulation and cleaning to facilitate downstream analysis. This section covers basic data manipulation using R default functions. Many packages contain data sets. You can see a list of all data sets in all loaded packages using

> data()

You can use data sets by simply typing their names. But if you have already used that name for something, or need to refresh the data after making some changes you no longer want, you can explicitly load the data using the data() function with the name of the data set you want.

> data(iris)

Data sets are usually stored in a special structure called a data frame. Data frames have a 2-dimensional structure.

(1) Rows correspond to observational units (people, animals, plants, or other objects we are collecting data about).
(2) Columns correspond to variables (measurements collected on each observational unit).

R comes with some data sets ready to be used. For example, the iris data frame contains 5 variables measured for each of 150 iris plants (the observational units). The iris data set is included with the default R installation and located in a package called datasets, which is always available. There are several ways we can get some idea about what is in the iris data frame.

> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

In interactive mode, you can also try

> View(iris)
> ?iris # to get the documentation for the data set

Access to an individual variable in a data frame uses the $ operator in the following syntax: dataframe$variable or with(dataframe, variable). For example, either of

> iris$Sepal.Length[1:10]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

or

> with(iris, Sepal.Length)[1:10]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

The attach() function in R can be used to make objects within data frames accessible with fewer keystrokes, but we strongly discourage its use, as it often leads to name conflicts.
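The ways of reaching a variable without attach() can be compared directly on the iris data. The sketch below is one possible illustration: the [[ ]] form and the use of tapply() for a per-species mean are additions that this section does not itself introduce at this point.

```r
data(iris)  # iris ships with R in the datasets package

# Three equivalent ways to pull out one variable
a  <- iris$Sepal.Length
b  <- iris[["Sepal.Length"]]
c3 <- with(iris, Sepal.Length)

# with() is handy when an expression mentions several variables
ratio <- with(iris, Sepal.Length / Sepal.Width)

# Per-species mean without attach(): tapply() groups values by a factor
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(species_means)
```

Every name in these expressions is resolved inside the data frame that is written out explicitly, which is what avoids the name conflicts that attach() can cause.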

4.2 Using Your Own Data

RStudio will help you import your own data. To do so, use the Import Dataset button in the Workspace tab. You can load data from text files, from the web, or from Google spreadsheets.

Using read.csv() and read.table()

If you are not using RStudio, or if you want to automate the loading of data from files instead of using the RStudio menus, you can read files using read.csv() or read.table() (for white-space-delimited files). Now we can load the data file class_data.csv. It contains 50 observations (individuals) and 13 variables: years, birthmonth, gender, siblings, height, handspan, footlength, breath, armcross, tongue, dice, beans and handed.

> mytable <- read.csv("class_data.csv", header = TRUE)
> head(mytable)
  years birthmonth gender ... armcross ...
1    26  September      M ...    right
2    28     August      F ...    right
3    26    January      M ...    right
4    28        May      M ...     left
5    25                 M ...     left
6    30      April      M ...    right

Each of these functions also accepts a URL in place of a file name, which provides an easy way to distribute data via the Internet:

> web.data <-'
> births <- read.table(web.data, header=TRUE)
> head(births) # number of live births in the US each day of 1978
  date births datenum dayofyear
  ...

The mosaic package includes a function called read.file() that uses slightly different default settings and infers whether it should use read.csv(), read.table(), or load() based on the file

name. load() is used for opening files that store R objects in native format.

Using RStudio Server menus

(1) Get the file onto the server. Upload (in the Files tab) your csv file to the server, where you can create folders and store files in your personal account.
(2) Load the data from the server into your R session. Now import from a text file in the Workspace tab.

In either case, be sure to do the following:

1. Choose good variable names.
2. Put your variable names in the first row.
3. Use each subsequent row for one observational unit.
4. Give the resulting data frame a good name.

To write a data frame back to a file, use write.csv() or write.table():

write.csv(mydata, file = "MyData.csv", row.names=FALSE)
write.table(mydata, file = "MyData.csv", row.names=FALSE, na="", col.names=FALSE, sep=",")

where mydata should be an R data frame object.

4.3 Basic Data Operations

R has a number of default functions to deal with variables. Below are some of them:

(1) rbind: combines rows of data.
(2) merge: match-merges two data frames.
(3) dimnames: lists or assigns names of data frames.
(4) cbind: combines columns of data.
(5) sapply: applies a function to elements of a list.
(6) tapply: applies a function to each cell of a ragged array.
(7) factor: creates a categorical variable with value labels if desired.
(8) table: creates a frequency table.
(9) head: displays the first n observations.
(10) colMeans: column means.
(11) colSums: column sums.

(12) rowSums: row sums.
(13) length: counts the elements of an object such as a list or vector.
(14) names: lists all variables of a data frame.

Now let us use the data set class_data.csv to illustrate basic data manipulation in R. Let us load it again:

> gl <- read.csv("class_data.csv",header=TRUE)
> attach(gl)
> names(gl)
 [1] "years"      "birthmonth" "gender"     "siblings"   "height"
 [6] "handspan"   "footlength" "breath"     "armcross"   "tongue"
[11] "dice"       "beans"      "handed"

To remove rows with missing data from gl, use complete.cases() or na.omit():

> gl <- gl[complete.cases(gl), ] # or gl <- na.omit(gl)

Because attach() makes the columns available as they were when it was called, the filtered data frame is referenced explicitly with gl$ below. To keep only the observations from gl where the siblings score is 5 or higher, we can do the following:

> gl_sub <- gl[gl$siblings >= 5, ]
> nrow(gl_sub);nrow(gl)
[1] 17
[1] 44

To separate the data frame gl into two groups, females whose tongue value is yes and males whose dice value is 5 or greater:

> gl_female <- gl[gl$gender == "F" & gl$tongue == "yes", ]
> gl_male <- gl[gl$gender == "M" & gl$dice >= 5, ]
> length(gl_male);length(gl_female)
[1] 13
[1] 13

Note that length() applied to a data frame returns the number of columns (here the 13 variables); use nrow() to count rows. We use the rbind function to stack data, that is, to combine the rows of gl_male and gl_female, as follows:

> gl_mf <- rbind(gl_female,gl_male)
> nrow(gl_mf)
[1] 22
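Since class_data.csv is not distributed with these notes, the same steps (write and read a CSV file, drop incomplete rows, filter rows, stack with rbind) can be tried on a small made-up data frame. Everything in the sketch below, including the variable names and values, is invented for illustration and is not the course data.

```r
# Toy data standing in for class_data.csv (values are made up)
toy <- data.frame(
  gender   = c("F", "M", "F", "M", "M"),
  siblings = c(2, 5, NA, 6, 1),
  tongue   = c("yes", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Round trip through a CSV file in a temporary location
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)
toy2 <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

complete <- toy2[complete.cases(toy2), ]        # drop the row containing NA
many_sib <- complete[complete$siblings >= 5, ]  # logical row filter
females  <- complete[complete$gender == "F", ]
stacked  <- rbind(females, many_sib)            # stack the two row subsets

row_counts <- c(nrow(toy2), nrow(complete), nrow(many_sib), nrow(stacked))
print(row_counts)
```

The filters refer to columns as complete$siblings and complete$gender rather than through attach(), so they always match the rows of the data frame being subset.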

Let us keep only the variables years, birthmonth, footlength and breath from the gl data frame.

> gl <- read.csv("class_data.csv",header=TRUE)
> gl1.kept <- gl[, c(1, 2, 7, 8)]
> names(gl1.kept)
[1] "years"      "birthmonth" "footlength" "breath"

Keeping only the variables years, gender, siblings, handspan, armcross and tongue from the gl data frame:

> gl2.kept <- gl[, c(1,3,4,6,9,10)]
> names(gl2.kept)
[1] "years"    "gender"   "siblings" "handspan" "armcross" "tongue"

Next we merge two data frames (gl1.kept and gl2.kept) on a variable (or a list of variables). We use the variable years, which has the same name in both data sets. Specifying TRUE in the all argument indicates that we want to keep all the observations from each data set, rather than only the observations that appear in both data sets.

> merge_gl.kept <- merge(gl1.kept, gl2.kept, by="years", all=TRUE)
> names(merge_gl.kept)
[1] "years"      "birthmonth" "footlength" "breath"     "gender"
[6] "siblings"   "handspan"   "armcross"   "tongue"

Note:

1. To drop some variables from a data frame, use negative column indices with -c().
2. If the variable that we are merging on has different names in each data frame, we can use the by.x and by.y arguments. In the by.x argument we list the name of the variable(s) in the data frame listed first in the merge function, and in the by.y argument we name the variable(s) in the data frame listed second.

> gl2.kept <- gl2.kept[,2:4]
> merge_gl.kept2 <- merge(gl1.kept, gl2.kept, by.x="years", by.y="gender", all=TRUE)
> names(merge_gl.kept2)
[1] "years"      "birthmonth" "footlength" "breath"     "siblings"
[6] "handspan"
> nrow(merge_gl.kept2)

[1] 100

Let us look at a different way to grab a portion of a data frame and print it, using the R package xtable, which can also write a LaTeX file:

> detach() # This make us unable to access gl data frame using its variable names. W
> library(xtable)
> data1 = gl[,1:4]
> s = summary(data1) # basic summary of the data
> tab = xtable(s, caption = "My Tables", align =c(" c", " c", " c", " c", " c ")) #
> print(tab, file = "assign.tex", append = TRUE, table.placement = "h", caption.placeme

Let us re-attach the gl data frame, look at the frequencies of the armcross variable, and plot the number of births per month. We will explore more visualization in the next chapter.

> attach(gl) # so can use names directly
> tabarmcross = table(armcross) # create frequency table of values
> par(mfrow=c(1,3))
> pie(tabarmcross,main="Arm on top when crossing")
> barplot(tabarmcross,main="Arm on top when crossing")
> # turn birthmonth into 'ordered factor' called month
> month=factor(birthmonth,levels=c("January","February","March","April","May","June"
> boxplot(as.vector(table(month)))

[Figure: pie chart and bar plot, both titled "Arm on top when crossing", with categories left and right, followed by a box plot of the monthly counts]

Let us plot footlength against height, differentiating females and males by color,

> plotcolours=c(1,2)[factor(gender)] # chooses 1 or 2 according to gender
> plot(height,footlength,pch=16,col=plotcolours,cex=1.5) # big coloured blobs

[Figure: scatter plot of footlength against height, coloured by gender]

4.4 Basic Operations with Matrices

Importantly, we can also convert a data frame to a matrix using the R function as.matrix(), as follows:

> data_matrix <- as.matrix(gl[,4:7])

To illustrate matrix operations, we will use the variable footlength from the gl data frame to create a 7x7 matrix (a matrix of 7 rows and 7 columns).

> length(footlength)
[1] 50
> x <- footlength[1:49]
> x <- matrix(data=x,nrow=7,ncol=7)
> dimnames(x) <- list(c("r1","r2","r3","r4","r5","r6","r7"),c("a","b","c","d","e","f","g"))

> apply(x,1,sum) # sum across the 1st dimension, namely rows (r1 to r7)
> apply(x,2,sum) # sum across the 2nd dimension, columns (a to g)
> apply(x,1,min) # minimum of each row

Basic Linear Algebra:

> gl <- read.csv("class_data.csv",header=TRUE)
> attach(gl)
> x <- footlength[1:49]
> x <- matrix(data=x,nrow=7,ncol=7)
> t(x) # transpose of a matrix
> diag(x) # diagonal elements of a matrix
> sum(diag(x)) # trace of a matrix
> x %*% x # matrix product
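Because the printed results for the footlength matrix depend on the course data, the same operations are easier to verify by hand on a tiny matrix. The 2 x 2 symmetric matrix below is made up for illustration; det() and eigen() are included because the notes use them on the 7 x 7 matrix.

```r
# A small 2 x 2 matrix; matrix() fills column by column by default
m <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))

row_sums <- apply(m, 1, sum)   # r1 = 3, r2 = 4
col_sums <- apply(m, 2, sum)   # a = 3,  b = 4
tm       <- t(m)               # transpose (m is symmetric, so values match)
tr       <- sum(diag(m))       # trace: 2 + 3 = 5
d        <- det(m)             # determinant: 2*3 - 1*1 = 5
m2       <- m %*% m            # matrix product, not element-wise
ev       <- eigen(m)$values    # eigenvalues (real, since m is symmetric)
```

Two standard identities give a quick sanity check on such output: the eigenvalues sum to the trace and multiply to the determinant.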

> det(x) # determinant of a matrix
> eigen(x) # eigenvalues and eigenvectors

[Output: a list with the components $values (7 complex eigenvalues) and $vectors (a 7 x 7 matrix of complex eigenvectors)]

5 Running scripts

It is very convenient to use a plain text editor like Notepad, gedit, Kate, Emacs, or WinEdt to write several consecutive R commands as separate lines. Such files are called scripts, and your script must have the extension .R or .r. The command lines can be executed by simply copying and pasting them into the command line of R. Another possibility is to execute a script from a file with source("my_script.R"). As an illustration of the latter, we can load a script file called Example1.R.

6 Important R Tips

(1) It is unnecessary to retype commands repeatedly, since R remembers what you have recently entered on the command line.
(2) To cycle through the previous commands, just press the up arrow key. More generally, the command history() will show a whole list of recently entered commands.

(3) To find out which variables are in the current work environment, use the commands objects() or ls(). These list all available objects in the workspace. If you wish to remove one or more variables, use remove(var1, var2, var3), or more simply rm(var1, var2, var3). To remove all objects, use rm(list = ls()).
(4) scan is useful when you have a long list of numbers (separated by spaces or on different lines) already typed somewhere else, say in a text file. To enter all the data in one fell swoop, first highlight and copy the list of numbers to the Clipboard with Edit, Copy (or by right-clicking and selecting Copy). Next type the x <- scan() command in the R console, and paste the numbers at the 1: prompt with Edit, Paste. All of the numbers will automatically be entered into the vector x.
(5) Press Ctrl+L to clear the screen.
(6) When exiting R, the user is given the option to save the workspace. I recommend that beginners DO NOT save the workspace when quitting. If Yes is selected, then all of the objects and data currently in R's memory are saved in a file called .RData, located in the working directory. This file is then automatically loaded the next time R starts (in which case R will say [Previously saved workspace restored]). This is a valuable feature for experienced users of R, but I find that it causes more trouble than it saves with beginners.

7 Tutorial

0. What is the meaning of the following abbreviations: rm, sum, prod, seq, sd, nrow, grep, apply, gl, library, source, setwd, history, str.

1. Reading data into R
(a) Use the file Women.txt from the course website and read this into R using read.table(), calling the new R object women.
(b) What is the class and dimension of the object women?

2. Matrix manipulations
(a) A new woman joined the study. She is 66 inches tall, weighs 165 lbs and is 34 years old.
Use rbind to append a row containing her information to women.
(b) How many women have a weight under 140?
(c) What is the average height of women who weigh between 135 and 145 pounds? (Hint: first select the data and then find the mean. See the section in lecture 1 on Boolean terms and subsetting.)

(d) Get help on the command colnames.
(e) Change the rownames of women to the letters of the alphabet, e.g. A, B, C, D etc.
(f) There is a correction for the woman in row D: her age should be 39. Change the age in row D to 39.
(g) Sort the matrix women by weight and store the result in newwomen.

3. Matrix manipulations: using apply, loops, and writing an R function
(a) Use apply to generate a summary report, with the mean, median and sd of height, weight and age. (Hint: use the apply function to get the mean, median and sd of the columns, and use rbind to create a matrix with rownames mean, median and sd.)
(b) Write a function to calculate BMI. The function should have 2 inputs, weight (lb) and height (in), and should return one value, BMI. The formula for BMI is:

$\mathrm{bmi} = \dfrac{\text{weight (lb)}}{[\text{height (in)}]^2} \times 703$

So, for example, if weight = 150 lbs and height = 65 in, the BMI is $(150 / 65^2) \times 703$, which is about 24.96. The input to your new function bmi should be

> bmi(weight=150, height=65)
[1] 24.95858

(c) Do the women have a BMI within the recommended (normal) range for their height? (Hint: create women$bmi <- bmi(women$weight, women$height) and then test whether the values of women$bmi are within the normal range.)
(d) Create a data.frame of 5 columns, called df1, which contains 100 random numbers drawn from the normal distribution with a mean of 8.2.
(e) Write a function, called cumsumfn, to print the cumulative sum of the row means of this data.frame. Hint: create a new function. Within it, first use apply to get the row means (rmeans). Then write a for loop which iterates over rmeans to add them to the cumulative sum.
(f) Is your output equal to cumsum(rowMeans(df1))?

4. Construct a factor.
Construct factors that correspond to the following settings.
(a) An experiment with two conditions each with four measurements.
(b) Five conditions each with three measurements.
(c) Three conditions each with five measurements.


More information

Lab 1: Getting started with R and RStudio Questions? or

Lab 1: Getting started with R and RStudio Questions? or Lab 1: Getting started with R and RStudio Questions? david.montwe@ualberta.ca or isaacren@ualberta.ca 1. Installing R and RStudio To install R, go to https://cran.r-project.org/ and click on the Download

More information

STAT 113: R/RStudio Intro

STAT 113: R/RStudio Intro STAT 113: R/RStudio Intro Colin Reimer Dawson Last Revised September 1, 2017 1 Starting R/RStudio There are two ways you can run the software we will be using for labs, R and RStudio. Option 1 is to log

More information

Introduction to R 21/11/2016

Introduction to R 21/11/2016 Introduction to R 21/11/2016 C3BI Vincent Guillemot & Anne Biton R: presentation and installation Where? https://cran.r-project.org/ How to install and use it? Follow the steps: you don t need advanced

More information

Reading and wri+ng data

Reading and wri+ng data An introduc+on to Reading and wri+ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 4 Course outline Review Data types and structures Reading data How should data look

More information

R: BASICS. Andrea Passarella. (plus some additions by Salvatore Ruggieri)

R: BASICS. Andrea Passarella. (plus some additions by Salvatore Ruggieri) R: BASICS Andrea Passarella (plus some additions by Salvatore Ruggieri) BASIC CONCEPTS R is an interpreted scripting language Types of interactions Console based Input commands into the console Examine

More information

Module 1: Introduction RStudio

Module 1: Introduction RStudio Module 1: Introduction RStudio Contents Page(s) Installing R and RStudio Software for Social Network Analysis 1-2 Introduction to R Language/ Syntax 3 Welcome to RStudio 4-14 A. The 4 Panes 5 B. Calculator

More information

A brief introduction to R

A brief introduction to R A brief introduction to R Cavan Reilly September 29, 2017 Table of contents Background R objects Operations on objects Factors Input and Output Figures Missing Data Random Numbers Control structures Background

More information

Introduction to MATLAB

Introduction to MATLAB Introduction to MATLAB Introduction: MATLAB is a powerful high level scripting language that is optimized for mathematical analysis, simulation, and visualization. You can interactively solve problems

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

An Introduction to R- Programming

An Introduction to R- Programming An Introduction to R- Programming Hadeel Alkofide, Msc, PhD NOT a biostatistician or R expert just simply an R user Some slides were adapted from lectures by Angie Mae Rodday MSc, PhD at Tufts University

More information

Computer lab 2 Course: Introduction to R for Biologists

Computer lab 2 Course: Introduction to R for Biologists Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient

More information

No Name What it does? 1 attach Attach your data frame to your working environment. 2 boxplot Creates a boxplot.

No Name What it does? 1 attach Attach your data frame to your working environment. 2 boxplot Creates a boxplot. No Name What it does? 1 attach Attach your data frame to your working environment. 2 boxplot Creates a boxplot. 3 confint A metafor package function that gives you the confidence intervals of effect sizes.

More information

Tutorial (Unix Version)

Tutorial (Unix Version) Tutorial (Unix Version) S.f.Statistik, ETHZ February 26, 2010 Introduction This tutorial will give you some basic knowledge about working with R. It will also help you to familiarize with an environment

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Contents 1 Suggested ahead activities 1 2 Introduction to R 2 2.1 Learning Objectives......................................... 2 3 Starting

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 2: Software Introduction Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University jacoby@msu.edu Getting Started with R What is R? A tiny R session

More information

STAT 540 Computing in Statistics

STAT 540 Computing in Statistics STAT 540 Computing in Statistics Introduces programming skills in two important statistical computer languages/packages. 30-40% R and 60-70% SAS Examples of Programming Skills: 1. Importing Data from External

More information

R Basics / Course Business

R Basics / Course Business R Basics / Course Business We ll be using a sample dataset in class today: CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats

More information

Stat 579: Objects in R Vectors

Stat 579: Objects in R Vectors Stat 579: Objects in R Vectors Ranjan Maitra 2220 Snedecor Hall Department of Statistics Iowa State University. Phone: 515-294-7757 maitra@iastate.edu, 1/23 Logical Vectors I R allows manipulation of logical

More information

Introduction to R: Using R for statistics and data analysis

Introduction to R: Using R for statistics and data analysis Why use R? Introduction to R: Using R for statistics and data analysis George W Bell, Ph.D. BaRC Hot Topics November 2014 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/

More information

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R MBV4410/9410 Fall 2018 Bioinformatics for Molecular Biology Introduction to R Outline Introduce R Basic operations RStudio Bioconductor? Goal of the lecture Introduce you to R Show how to run R, basic

More information

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources R Outline Description/History Objects/Language Description Commonly Used Basic Functions Basic Stats and distributions I/O Plotting Programming More Specific Functionality Further Resources www.r-project.org

More information

R syntax guide. Richard Gonzalez Psychology 613. August 27, 2015

R syntax guide. Richard Gonzalez Psychology 613. August 27, 2015 R syntax guide Richard Gonzalez Psychology 613 August 27, 2015 This handout will help you get started with R syntax. There are obviously many details that I cannot cover in these short notes but these

More information

Introduction to R Software

Introduction to R Software 1. Introduction R is a free software environment for statistical computing and graphics. It is almost perfectly compatible with S-plus. The only thing you need to do is download the software from the internet

More information

Introduction to R. Daniel Berglund. 9 November 2017

Introduction to R. Daniel Berglund. 9 November 2017 Introduction to R Daniel Berglund 9 November 2017 1 / 15 R R is available at the KTH computers If you want to install it yourself it is available at https://cran.r-project.org/ Rstudio an IDE for R is

More information

Using R for statistics and data analysis

Using R for statistics and data analysis Introduction ti to R: Using R for statistics and data analysis BaRC Hot Topics October 2011 George Bell, Ph.D. http://iona.wi.mit.edu/bio/education/r2011/ Why use R? To perform inferential statistics (e.g.,

More information

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics Introduction to S-Plus 1 Input: Data files For rectangular data files (n rows,

More information

Week 1: Introduction to R, part 1

Week 1: Introduction to R, part 1 Week 1: Introduction to R, part 1 Goals Learning how to start with R and RStudio Use the command line Use functions in R Learning the Tools What is R? What is RStudio? Getting started R is a computer program

More information

Introduction to MATLAB. Simon O Keefe Non-Standard Computation Group

Introduction to MATLAB. Simon O Keefe Non-Standard Computation Group Introduction to MATLAB Simon O Keefe Non-Standard Computation Group sok@cs.york.ac.uk Content n An introduction to MATLAB n The MATLAB interfaces n Variables, vectors and matrices n Using operators n Using

More information

IN-CLASS EXERCISE: INTRODUCTION TO R

IN-CLASS EXERCISE: INTRODUCTION TO R NAVAL POSTGRADUATE SCHOOL IN-CLASS EXERCISE: INTRODUCTION TO R Survey Research Methods Short Course Marine Corps Combat Development Command Quantico, Virginia May 2013 In-class Exercise: Introduction to

More information

Introduction to R Commander

Introduction to R Commander Introduction to R Commander 1. Get R and Rcmdr to run 2. Familiarize yourself with Rcmdr 3. Look over Rcmdr metadata (Fox, 2005) 4. Start doing stats / plots with Rcmdr Tasks 1. Clear Workspace and History.

More information

Introduction to R: Using R for statistics and data analysis

Introduction to R: Using R for statistics and data analysis Why use R? Introduction to R: Using R for statistics and data analysis George W Bell, Ph.D. BaRC Hot Topics November 2015 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/

More information

You will have to download all of the data used from the internet before R can access the data.

You will have to download all of the data used from the internet before R can access the data. 0. Downloading Data You will have to download all of the data used from the internet before R can access the data. If the file accessed via a link, then right click on the file name and save it to a directory

More information

Why use R? Getting started. Why not use R? Introduction to R: Log into tak. Start R R or. It s hard to use at first

Why use R? Getting started. Why not use R? Introduction to R: Log into tak. Start R R or. It s hard to use at first Why use R? Introduction to R: Using R for statistics ti ti and data analysis BaRC Hot Topics October 2011 George Bell, Ph.D. http://iona.wi.mit.edu/bio/education/r2011/ To perform inferential statistics

More information

SISG/SISMID Module 3

SISG/SISMID Module 3 SISG/SISMID Module 3 Introduction to R Ken Rice Tim Thornton University of Washington Seattle, July 2018 Introduction: Course Aims This is a first course in R. We aim to cover; Reading in, summarizing

More information

Why use R? Getting started. Why not use R? Introduction to R: It s hard to use at first. To perform inferential statistics (e.g., use a statistical

Why use R? Getting started. Why not use R? Introduction to R: It s hard to use at first. To perform inferential statistics (e.g., use a statistical Why use R? Introduction to R: Using R for statistics ti ti and data analysis BaRC Hot Topics November 2013 George W. Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ To perform inferential

More information

STAT 213: R/RStudio Intro

STAT 213: R/RStudio Intro STAT 213: R/RStudio Intro Colin Reimer Dawson Last Revised February 10, 2016 1 Starting R/RStudio Skip to the section below that is relevant to your choice of implementation. Installing R and RStudio Locally

More information

Tutorial for the R Statistical Package

Tutorial for the R Statistical Package Tutorial for the R Statistical Package University of Colorado Denver Stephanie Santorico Mark Shin Contents 1 Basics 2 2 Importing Data 10 3 Basic Analysis 14 4 Plotting 22 5 Installing Packages 29 This

More information

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University While your data tables or spreadsheets may look good to

More information

Reading data into R. 1. Data in human readable form, which can be inspected with a text editor.

Reading data into R. 1. Data in human readable form, which can be inspected with a text editor. Reading data into R There is a famous, but apocryphal, story about Mrs Beeton, the 19th century cook and writer, which says that she began her recipe for rabbit stew with the instruction First catch your

More information

Dr. Barbara Morgan Quantitative Methods

Dr. Barbara Morgan Quantitative Methods Dr. Barbara Morgan Quantitative Methods 195.650 Basic Stata This is a brief guide to using the most basic operations in Stata. Stata also has an on-line tutorial. At the initial prompt type tutorial. In

More information

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Data Mining - Data Dr. Jean-Michel RICHER 2018 jean-michel.richer@univ-angers.fr Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Outline 1. Introduction 2. Data preprocessing 3. CPA with R 4. Exercise

More information

Statistical Bioinformatics (Biomedical Big Data) Notes 2: Installing and Using R

Statistical Bioinformatics (Biomedical Big Data) Notes 2: Installing and Using R Statistical Bioinformatics (Biomedical Big Data) Notes 2: Installing and Using R In this course we will be using R (for Windows) for most of our work. These notes are to help students install R and then

More information

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012 An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences Scott C Merrill September 5 th, 2012 Chapter 2 Additional help tools Last week you asked about getting help on packages.

More information

This document is designed to get you started with using R

This document is designed to get you started with using R An Introduction to R This document is designed to get you started with using R We will learn about what R is and its advantages over other statistics packages the basics of R plotting data and graphs What

More information

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N. LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I Part Two Introduction to R Programming RStudio November 2016 Written by N.Nilgün Çokça Introduction to R Programming 5 Installing R & RStudio 5 The R Studio

More information

Introduction to R. Stat Statistical Computing - Summer Dr. Junvie Pailden. July 5, Southern Illinois University Edwardsville

Introduction to R. Stat Statistical Computing - Summer Dr. Junvie Pailden. July 5, Southern Illinois University Edwardsville Introduction to R Stat 575 - Statistical Computing - Summer 2016 Dr. Junvie Pailden Southern Illinois University Edwardsville July 5, 2016 Why R R offers a powerful and appealing interactive environment

More information

Author: Leonore Findsen, Qi Wang, Sarah H. Sellke, Jeremy Troisi

Author: Leonore Findsen, Qi Wang, Sarah H. Sellke, Jeremy Troisi 0. Downloading Data from the Book Website 1. Go to http://bcs.whfreeman.com/ips8e 2. Click on Data Sets 3. Click on Data Sets: PC Text 4. Click on Click here to download. 5. Right Click PC Text and choose

More information

BGGN 213 Working with R packages Barry Grant

BGGN 213 Working with R packages Barry Grant BGGN 213 Working with R packages Barry Grant http://thegrantlab.org/bggn213 Recap From Last Time: Why it is important to visualize data during exploratory data analysis. Discussed data visualization best

More information

Chapter 3: The IF Function and Table Lookup

Chapter 3: The IF Function and Table Lookup Chapter 3: The IF Function and Table Lookup Objectives This chapter focuses on the use of IF and LOOKUP functions, while continuing to introduce other functions as well. Here is a partial list of what

More information

Lab 1: Introduction, Plotting, Data manipulation

Lab 1: Introduction, Plotting, Data manipulation Linear Statistical Models, R-tutorial Fall 2009 Lab 1: Introduction, Plotting, Data manipulation If you have never used Splus or R before, check out these texts and help pages; http://cran.r-project.org/doc/manuals/r-intro.html,

More information

Copyright 2018 by KNIME Press

Copyright 2018 by KNIME Press 2 Copyright 2018 by KNIME Press All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval

More information

Outline. Mixed models in R using the lme4 package Part 1: Introduction to R. Following the operations on the slides

Outline. Mixed models in R using the lme4 package Part 1: Introduction to R. Following the operations on the slides Outline Mixed models in R using the lme4 package Part 1: Introduction to R Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009, Rennes, France

More information

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Contents 1 Introduction to Using Excel Spreadsheets 2 1.1 A Serious Note About Data Security.................................... 2 1.2

More information

Basics of R. > x=2 (or x<-2) > y=x+3 (or y<-x+3)

Basics of R. > x=2 (or x<-2) > y=x+3 (or y<-x+3) Basics of R 1. Arithmetic Operators > 2+2 > sqrt(2) # (2) >2^2 > sin(pi) # sin(π) >(1-2)*3 > exp(1) # e 1 >1-2*3 > log(10) # This is a short form of the full command, log(10, base=e). (Note) For log 10

More information

Bjørn Helge Mevik Research Computing Services, USIT, UiO

Bjørn Helge Mevik Research Computing Services, USIT, UiO 23.11.2011 1 Introduction to R and Bioconductor: Computer Lab Bjørn Helge Mevik (b.h.mevik@usit.uio.no), Research Computing Services, USIT, UiO (based on original by Antonio Mora, biotek) Exercise 1. Fundamentals

More information

DOING MORE WITH EXCEL: MICROSOFT OFFICE 2013

DOING MORE WITH EXCEL: MICROSOFT OFFICE 2013 DOING MORE WITH EXCEL: MICROSOFT OFFICE 2013 GETTING STARTED PAGE 02 Prerequisites What You Will Learn MORE TASKS IN MICROSOFT EXCEL PAGE 03 Cutting, Copying, and Pasting Data Basic Formulas Filling Data

More information

Chapter 7. The Data Frame

Chapter 7. The Data Frame Chapter 7. The Data Frame The R equivalent of the spreadsheet. I. Introduction Most analytical work involves importing data from outside of R and carrying out various manipulations, tests, and visualizations.

More information

USE IBM IN-DATABASE ANALYTICS WITH R

USE IBM IN-DATABASE ANALYTICS WITH R USE IBM IN-DATABASE ANALYTICS WITH R M. WURST, C. BLAHA, A. ECKERT, IBM GERMANY RESEARCH AND DEVELOPMENT Introduction To process data, most native R functions require that the data first is extracted from

More information

Statistics for Biologists: Practicals

Statistics for Biologists: Practicals Statistics for Biologists: Practicals Peter Stoll University of Basel HS 2012 Peter Stoll (University of Basel) Statistics for Biologists: Practicals HS 2012 1 / 22 Outline Getting started Essentials of

More information

Using Microsoft Excel

Using Microsoft Excel Using Microsoft Excel Introduction This handout briefly outlines most of the basic uses and functions of Excel that we will be using in this course. Although Excel may be used for performing statistical

More information

R package

R package R package www.r-project.org Download choose the R version for your OS install R for the first time Download R 3 run R MAGDA MIELCZAREK 2 help help( nameofthefunction )? nameofthefunction args(nameofthefunction)

More information

command.name(measurement, grouping, argument1=true, argument2=3, argument3= word, argument4=c( A, B, C ))

command.name(measurement, grouping, argument1=true, argument2=3, argument3= word, argument4=c( A, B, C )) Tutorial 3: Data Manipulation Anatomy of an R Command Every command has a unique name. These names are specific to the program and case-sensitive. In the example below, command.name is the name of the

More information

Graphing Calculator How To Packet

Graphing Calculator How To Packet Graphing Calculator How To Packet The following outlines some of the basic features of your TI Graphing Calculator. The graphing calculator is a useful tool that will be used extensively in this class

More information

6 Subscripting. 6.1 Basics of Subscripting. 6.2 Numeric Subscripts. 6.3 Character Subscripts

6 Subscripting. 6.1 Basics of Subscripting. 6.2 Numeric Subscripts. 6.3 Character Subscripts 6 Subscripting 6.1 Basics of Subscripting For objects that contain more than one element (vectors, matrices, arrays, data frames, and lists), subscripting is used to access some or all of those elements.

More information

Instruction: Download and Install R and RStudio

Instruction: Download and Install R and RStudio 1 Instruction: Download and Install R and RStudio We will use a free statistical package R, and a free version of RStudio. Please refer to the following two steps to download both R and RStudio on your

More information

EXCEL BASICS: MICROSOFT OFFICE 2010

EXCEL BASICS: MICROSOFT OFFICE 2010 EXCEL BASICS: MICROSOFT OFFICE 2010 GETTING STARTED PAGE 02 Prerequisites What You Will Learn USING MICROSOFT EXCEL PAGE 03 Opening Microsoft Excel Microsoft Excel Features Keyboard Review Pointer Shapes

More information

EXCEL 2003 DISCLAIMER:

EXCEL 2003 DISCLAIMER: EXCEL 2003 DISCLAIMER: This reference guide is meant for experienced Microsoft Excel users. It provides a list of quick tips and shortcuts for familiar features. This guide does NOT replace training or

More information

Applied Calculus. Lab 1: An Introduction to R

Applied Calculus. Lab 1: An Introduction to R 1 Math 131/135/194, Fall 2004 Applied Calculus Profs. Kaplan & Flath Macalester College Lab 1: An Introduction to R Goal of this lab To begin to see how to use R. What is R? R is a computer package for

More information

1 Introduction to Using Excel Spreadsheets

1 Introduction to Using Excel Spreadsheets Survey of Math: Excel Spreadsheet Guide (for Excel 2007) Page 1 of 6 1 Introduction to Using Excel Spreadsheets This section of the guide is based on the file (a faux grade sheet created for messing with)

More information

Getting Started in R

Getting Started in R Getting Started in R Giles Hooker May 28, 2007 1 Overview R is a free alternative to Splus: a nice environment for data analysis and graphical exploration. It uses the objectoriented paradigm to implement

More information

EXCEL BASICS: MICROSOFT OFFICE 2007

EXCEL BASICS: MICROSOFT OFFICE 2007 EXCEL BASICS: MICROSOFT OFFICE 2007 GETTING STARTED PAGE 02 Prerequisites What You Will Learn USING MICROSOFT EXCEL PAGE 03 Opening Microsoft Excel Microsoft Excel Features Keyboard Review Pointer Shapes

More information

University of Wollongong School of Mathematics and Applied Statistics. STAT231 Probability and Random Variables Introductory Laboratory

University of Wollongong School of Mathematics and Applied Statistics. STAT231 Probability and Random Variables Introductory Laboratory 1 R and RStudio University of Wollongong School of Mathematics and Applied Statistics STAT231 Probability and Random Variables 2014 Introductory Laboratory RStudio is a powerful statistical analysis package.

More information

Introduction into R. A Short Overview. Thomas Girke. December 8, Introduction into R Slide 1/21

Introduction into R. A Short Overview. Thomas Girke. December 8, Introduction into R Slide 1/21 Introduction into R A Short Overview Thomas Girke December 8, 212 Introduction into R Slide 1/21 Introduction Look and Feel of the R Environment R Library Depositories Installation Getting Around Basic

More information

Why must we use computers in stats? Who wants to find the mean of these numbers (100) by hand?

Why must we use computers in stats? Who wants to find the mean of these numbers (100) by hand? Introductory Statistics Lectures Introduction to R Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the author 2009 (Compile

More information

Business Statistics: R tutorials

Business Statistics: R tutorials Business Statistics: R tutorials Jingyu He September 29, 2017 Install R and RStudio R is a free software environment for statistical computing and graphics. Download free R and RStudio for Windows/Mac:

More information

Tutorial (Unix Version)

Tutorial (Unix Version) Tutorial (Unix Version) S.f.Statistik, ETHZ April 11, 2011 Introduction This tutorial will give you some basic knowledge about working with R. It will also help you to familiarize with an environment to

More information

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler JMP in a nutshell 1 HR, 17 Apr 2018 The software JMP Pro 14 is installed on the Macs of the Phonetics Institute. Private versions can be bought from

More information

Introduction to R statistical environment

Introduction to R statistical environment Introduction to R statistical environment R Nano Course Series Aishwarya Gogate Computational Biologist I Green Center for Reproductive Biology Sciences History of R R is a free software environment for

More information

Introduction to R. base -> R win32.exe (this will change depending on the latest version)

Introduction to R. base -> R win32.exe (this will change depending on the latest version) Dr Raffaella Calabrese, Essex Business School 1. GETTING STARTED Introduction to R R is a powerful environment for statistical computing which runs on several platforms. R is available free of charge.

More information

Bioinformatics Workshop - NM-AIST

Bioinformatics Workshop - NM-AIST Bioinformatics Workshop - NM-AIST Day 2 Introduction to R Thomas Girke July 24, 212 Bioinformatics Workshop - NM-AIST Slide 1/21 Introduction Look and Feel of the R Environment R Library Depositories Installation

More information

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center What is R? R is a statistical computing environment with graphics capabilites It is fully scriptable

More information

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Command Line and Python Introduction Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Today Assignment #1! Computer architecture Basic command line skills Python fundamentals

More information

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows Oxford Spring School, April 2013 Effective Presentation ti Monday morning lecture: Crash Course in R Robert Andersen Department of Sociology University of Toronto And Dave Armstrong Department of Political

More information

Introduction to R. Course in Practical Analysis of Microarray Data Computational Exercises

Introduction to R. Course in Practical Analysis of Microarray Data Computational Exercises Introduction to R Course in Practical Analysis of Microarray Data Computational Exercises 2010 March 22-26, Technischen Universität München Amin Moghaddasi, Kurt Fellenberg 1. Installing R. Check whether

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

3. Data Tables & Data Management

3. Data Tables & Data Management 3. Data Tables & Data Management In this lab, we will learn how to create and manage data tables for analysis. We work with a very simple example, so it is easy to see what the code does. In your own projects

More information

Learning Worksheet Fundamentals

Learning Worksheet Fundamentals 1.1 LESSON 1 Learning Worksheet Fundamentals After completing this lesson, you will be able to: Create a workbook. Create a workbook from a template. Understand Microsoft Excel window elements. Select

More information

Chapter 1: An Overview of MATLAB

Chapter 1: An Overview of MATLAB Chapter 1: An Overview of MATLAB MATLAB is: A high-level language and interactive environment for numerical computation, visualization, and programming MATLAB can: Be used as a calculator, easily create

More information

MATLAB COURSE FALL 2004 SESSION 1 GETTING STARTED. Christian Daude 1

MATLAB COURSE FALL 2004 SESSION 1 GETTING STARTED. Christian Daude 1 MATLAB COURSE FALL 2004 SESSION 1 GETTING STARTED Christian Daude 1 Introduction MATLAB is a software package designed to handle a broad range of mathematical needs one may encounter when doing scientific

More information

Customization Manager

Customization Manager Customization Manager Release 2015 Disclaimer This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references, may change without

More information