R for Wildlife Ecologists (Quick Reference Guide)


Bret Collier, Institute of Renewable Natural Resources, Texas A&M University, College Station, Texas 77845; 979/595/50706

Contents

1 Course Introduction
2 Starting in R
  2.1 R Basics
  2.2 Simple Programming
  2.3 R Objects
  2.4 Classes and Modes
3 R Project and Data Management
  3.1 Working directories
  3.2 Importing and exporting data
  3.3 Creation of, types, and working with data: a super short primer
  3.4 Basic Mathematical/Operators
4 R Creating Graphics
  4.1 Scatterplots
  4.2 Other Simple plots
5 Statistical Models with R
  5.1 Contingency Tables
  5.2 Linear Regression
  5.3 Generalized Linear Models
6 Writing Functions in R
  6.1 Functions
7 Wildlife-Specific Methods
  7.1 Capture-Recapture Analysis
  7.2 Distance Sampling
  7.3 Spatial Models
Contact

Contact after 1 March: School of Renewable Natural Resources, Louisiana State University, bcollier.work@gmail.com or 979/595/5076

8 Literature To Look At! Here is a pretty short list of good books to get on your shelves, R packages that I use regularly, and a few websites that will make your life easier.

1 Course Introduction

First, since you are reading this you have taken the first steps towards freeing yourselves from the forced servitude of point-and-click statistical programs that control your data's structure and drive your data analyses. No longer will you be told what numbers will be available for you to interpret or what statistical tests you should use. Rather, we are going to move, together today, into computing on the data and questions of interest, where you decide how to develop, manipulate, examine, and interpret statistical results. Our philosophy for today is simple: use R to become better analysts, as described by Dr. Harrell:

library(fortunes)
fortune("good data analyst")

Can one be a good data analyst without being a half-good programmer? The short answer to that is, 'No.' The long answer to that is, 'No.'
   -- Frank Harrell
      1999 S-PLUS User Conference, New Orleans (October 1999)

Additionally, for today's work, we are going with the motto that GUIs normally make it simple to accomplish simple actions and impossible to accomplish complex actions. (I read this somewhere but cannot remember who said it; if you do, let me know.) Thus, today is going to be all about programming, get ready! Since everyone is a scientist here, you have probably realized that you are going to need at least a basic understanding of R. But, that's ok, because understanding R will benefit you long-term. The good thing(s) about R, to list a few, include:

1. There is a wealth of online documentation related to the use of R. Just look at the R homepage (https://www.r-project.org/) for a host of useful links.

2. There are huge numbers of freely available R packages that can be used to perform specific analyses, and you can develop packages that archive your data and code so that other folks can see/use it just as easily (https://cran.r-project.org/web/packages/).

3. Because R is a flexible environment, there are entire fields of study (e.g., Analysis of Spatial Data) for which a wide range of approaches has been developed to conduct various analyses (some of which can be seen on CRAN). Additionally, Springer has an entire series called Use R! consisting of books published on various R-related statistical topics.

4. R is not just for analysis, but merges seamlessly into the writing of theses & dissertations, books, articles, presentations, course notes, etc. Our course notes for today were written entirely in LYX (pronounced 'Licks') using the R package knitr to 'knit' the R code and the text (ported via TeX/LaTeX, which is pronounced 'Tech' and 'La Tech'; note that all pronunciation is open to interpretation depending on whether you are an American English or English English speaker, it seems) and is entirely reproducible on your computers. This integration of R into dynamic document presentation is the foundation of literate programming and is well grounded in the process of reproducible research.

5. R is libre (open source) and gratis (freeware) software (https://www.gnu.org/licenses/gpl.html); think: freedom of speech (libre) and free as in beer (gratis).

Now, the R downsides that I want to put right out front for you:

1. R is a programming environment. If you are not used to developing programming code, or just don't have any experience programming, then the learning curve will be steep initially (but we will solve some of that today).

2. Because R is a programming environment, it will not 'do' things for you that you are used to having done for you by various programs. If you want R to do something with your data, you have to tell it to, and you have to know what the outcome should look like so you can ensure that what you told R to give you and what R gave you are the same thing.

Throughout this document, I am using notes that I have pulled together, presented elsewhere to other audiences, borrowed from friends, etc. So, I am probably not giving enough credit where it is due (I am admitting to plagiarism right here), but since these are notes and I want them to be as comprehensive as possible while focused on the issues I think you all need to know, too bad. The following is repeated from the quick start guide I sent out earlier, but I wanted it in here as well just for consistency. Your first stop(s) (preferably before we meet) are listed below; these are the main ones, but there are tons of other sites you can frequent if you're interested and do a little searching:

R Project website: https://www.r-project.org/
R FAQ (general/OS-specific FAQs on here)
R Manuals
CRAN (Comprehensive R Archive Network): https://cran.r-project.org/
R Search
Texas A&M University has worked up some R videos that are interesting: http://dist.stat.tamu.edu/pub/rvideos/

2 Starting in R

2.1 R Basics

First and foremost, these are not notes on how to do your particular kind of statistics. I won't be teaching statistics; I will be talking more about programming in the R language. Thus, you will see more text when I am talking about how R works, and less when I am applying R to a specific instance (e.g., to linear regression). With that in mind, we will not cover every bit of code/text in this document during the workshop; rather, I wanted this reference to be useful to you in the future, but I will highlight the immediately relevant parts as we work through the code examples for today. Working with a language like R requires 2 basic things: time and interaction; thus, you have to practice. So, your first goal will be to stop using whatever program you have been using for data management, manipulation, and analysis. For some of you, this is Excel, which is bad, because Excel sucks, makes bad graphs, and gives wrong answers, and, well, it sucks and you should not use it for anything other than simplifying data to .csv, cause it sucks. Excel SUCKS, DON'T USE IT!

fortune("microsoft excel")

Friends don't let friends use Excel for statistics!
   -- Jonathan D. Cryer (about problems with using Microsoft Excel for statistics)
      JSM 2001, Atlanta (August 2001)

I should note that, as I started to work up these course notes, I am probably starting at a level that is very basic to many of you. The reason for this is multi-fold: 1) I don't know what level of experience you all have, so I figure it is best to begin at the beginning, 2) a thorough understanding of the basics is important, as you will waste considerably more time with data formatting and getting data into R early in your career than you will actually running any analyses, and 3) I want the basics to be included so someone could run through everything without help. However, these notes are by no stretch of the imagination anything near comprehensive.

2.2 Simple Programming

Lets start with the most basic: R as a really nice calculator. For instance, if I need to add, I can add:

2 + 2
[1] 4

Amazing, right? I can, if I want, create a sequence of numbers going one direction or the other.

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
10:1
 [1] 10  9  8  7  6  5  4  3  2  1

Yep, that's cool. I can create a plot (more on this later).

hist(rnorm(100), xlab = "", las = 1, main = "Yay, a plot!")

[Figure: histogram of 100 standard normal draws, titled "Yay, a plot!"; y-axis "Frequency"]

Wow, addition, number sequences, plots, all in little code snippets that are completely reproducible. What is this magic elixir you are showing us... well, I can write a function that tells you...

my.function = function(x) {
    ifelse(x > 1, "Bud Light is not good beer", "R is like good beer")
}
my.function(1)
[1] "R is like good beer"
my.function(2)
[1] "Bud Light is not good beer"

2.3 R Objects

Everything in R is an object, and each object has a set of attributes associated with it that describe the object's contents and how it can and should be used. Frequently, when working in R, several calculations may be dependent upon each other. Thus, you will want to save those results for future use by assigning them to an object. In R, the usual assignment operator is <- (e.g., x <- 1, so 1 is assigned to x, or 'x gets 1'). You are probably wondering why you cannot use an = sign (e.g., x = 1, so x is set equal to 1). Based on my work, they are fairly interchangeable, although I have run into some situations where <- was required when I was doing some simulations that required a for() loop. Most folks use <-, and unless you are writing functions (where you have to use argument = value) or the boolean equality operator (==), you can use either, but I prefer to just use an = sign. So, using the assignment operator, you can assign objects any name you choose. Object names can be upper- or lower-case letters, numbers, underscores (_), or periods (.).

Good programming practices are to give objects names that begin with a letter, not a number or a period. Also, realize that R is case sensitive (I have made this mistake many times). For instance, as I show below, I assigned a lower-case x to be 2 + 2 and an upper-case X to be 3 + 3, then printed both, and R will tell you whether or not x and X are equal (note this is one of those places where the = sign would cause problems: if I had not used the negation character (!), as in typing x = X, then x would get X, i.e., be set equal to 3 + 3).

x = 2 + 2
X = 3 + 3
x
[1] 4
X
[1] 6
x != X
[1] TRUE

Important to note here, there are certain letters, words, etc. that are used in R that you should not use. For instance, never, ever, ever, EVER call your datasets data; quoting B. Ripley from an R fortune, You would not call your dog, dog, would you? data() is actually an R function for loading datasets, so changing what it means is kind of a problem. Another example is c(), which is a function that concatenates data together into a vector and allows you to give the vector a name and operate on it as shown below; thus, changing what it does is probably not a good idea.

x <- c(2, 4, 6, 8)
x
[1] 2 4 6 8
x * 2
[1]  4  8 12 16

Finding out what you can or cannot label things is sometimes a trial and error process, but, when in doubt, use an underscore or a period in your object names as that reduces the chance of mis-naming something. And, when in doubt, you can always type the name into R and see if it is already used, such as:

c
function (..., recursive = FALSE)  .Primitive("c")

2.4 Classes and Modes

This part will be kind of painful, but it's important, so read it once and then move on. Since we now know how R uses the assignment operator to specify an object, we need to consider that each R object has attributes associated with it. Attributes describe the contents of the R

object and how the object can and should be used. Probably the most important attributes of an R object are the class and the mode of the object. There are several functions to evaluate the structure of your data, mainly mode, class, and str (which, as you see below, will tell you the mode and the value(s) of the object, which is very useful when dealing with data frames or lists).

x <- 4
x
[1] 4
mode(x)
[1] "numeric"
class(x)
[1] "numeric"
str(x)
 num 4

Thus, you can see that a number gets mode 'numeric' and class 'numeric'. Additionally, there are several other modes, mainly complex and raw, none of which you should expect to see with any frequency in your work unless you really get into the programming end of things. There is also a logical mode, which has values TRUE and FALSE (never T or F), and a character mode, which is a character string (specified with quotation marks).

mode(TRUE)
[1] "logical"
class(TRUE)
[1] "logical"
mode("Lyla")
[1] "character"
class("Lyla")
[1] "character"

Finally, you can verify whether an object has a particular mode or is a member of a particular class using one of several R functions that test if an object is of a specific type, where you can see that x is in fact numeric (TRUE) and not factor (FALSE). Some of these predicate functions are: is.numeric, is.factor, is.list, and so on.

x <- 2
mode(x)
[1] "numeric"

is.numeric(x)
[1] TRUE
is.factor(x)
[1] FALSE
mode(as.integer(2))
[1] "numeric"
class(as.integer(2))
[1] "integer"

But, you can also have mixed vectors of numeric and character values, which R will convert entirely to character values; you can change character values back to numeric using as.numeric.

test2 <- c("A", 2, "C")
test2
[1] "A" "2" "C"
class(test2)
[1] "character"
as.numeric(c("1", "2", "3"))
[1] 1 2 3

Finally, in addition to having classes and modes, vectors have a length attribute, which you can get using the length function.

length(c("A", 2, "C"))
[1] 3

Now, before we finish, we need to real quick touch on factors and how they are stored in R, as factor objects are of numeric mode, but with a class attribute such that character labels are displayed even though the storage mode is numeric. For example, stealing some notes from my friend Jeff, see the below. What is actually being stored in my.factor is a numeric vector c(2, 1, 3), because the levels are alphabetical; hence B is second so by default it gets a 2, A is first so it gets a 1, and so on.

my.factor <- c("B", "A", "C")
my.factor <- factor(my.factor)
mode(my.factor)
[1] "numeric"
class(my.factor)
[1] "factor"

my.factor
[1] B A C
Levels: A B C
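A classic trap follows from this storage scheme: calling as.numeric() directly on a factor returns the underlying level codes, not the original values. A minimal sketch of the safe round trip (my addition; the example factor is made up):

num.factor <- factor(c("10", "20", "30"))  # numbers stored as a factor
as.numeric(num.factor)                     # level codes: 1 2 3
as.numeric(as.character(num.factor))       # the actual values: 10 20 30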

3 R Project and Data Management

Here I want to talk a bit about managing R projects, importing and managing data, and manipulation of datasets in R. So, first of all, this will be a ridiculously short primer on the topic, as there are entire books written on data manipulation in R (e.g., Spector's Data Manipulation with R book) and there are quite a few really great packages for working with various types of data (if you have not ever heard of Hadley, then see his R packages for data manipulation). I will highlight a few of the base R methods, point out a few functions I find extremely useful, and then you're off on your own to go forth and prosper.

3.1 Working directories

So, now you have R installed and started on your computer. One thing that some folks find handy is to set a working directory, or a place where a particular project will be housed. You don't have to use a working directory, but it can be helpful to set one for projects in R that involve more than 30 seconds of thought. In simple terms, a working directory is exactly that: a directory where all the work on a particular project will be conducted, where your R session information will be saved, where R will look for any files or source functions you want to use when you are working, and where any output you create and write from R will go. There are several ways to set a working directory; for example, in Windows you could open R and go to File > Change dir and set the working directory to any location (for instance, you can create a folder called RCourse and put it in the Documents section of your computer). However, when I use working directories I tend to set the working directory specific to each analysis project that I conduct using setwd(), using the PATH format for each of the 3 standard operating systems (these are based off of my various work machines I use for R package builds and computer programming stuff; your paths will be different). Note that the slashes are forward (/) not back (\) slashes in the PATH name:

Linux: setwd("/home/bret/bretresearch/workshops/txtws_rworkshop/")
Windows: setwd("c:/users/bret.collier/documents/workshops/txtws_rworkshop/")
Mac: setwd("/users/bretcollier/bretresearch/workshops/txtws_rworkshop/")

There are several nice things about working directories, but the main one is that after you set a working directory, when you need to load data into your workspace, or save data or graphs, having the working directory set saves lots of time. For example, you could write a code snippet where you define where you want R to go look for the data you are interested in analyzing:

example.data <- read.csv("F:/Rio209.csv", header = TRUE)
head(example.data)
  ID   Lat   Lon   Date
  ...

Which shows R where the data file you want to load is, tells R to go out and load it, and then gives that data file the R object name example.data. If you type something wrong (which you will), you will get this:

bad.data <- read.csv("F:/RRio209.csv", header = TRUE)
Warning: cannot open file 'F:/RRio209.csv': No such file or directory
Error: cannot open the connection
head(bad.data)
Error: object 'bad.data' not found

However, if you are importing multiple datasets, or planning on exporting multiple datasets or graphics, then perhaps a better option is below, where you set a working directory first; then R knows where to go to look for your data, and where to put anything you output.

setwd("F:/")
same.data <- read.csv("Rio209.csv", header = TRUE)
head(same.data)
  ID   Lat   Lon   Date
  ...

Working directories have some downfalls, in that if you are sourcing in from various workspaces, or if all your R work is housed in a single workspace to simplify project management and package development (like mine is; ask if you want to see my setup), then using setwd() can be a pain. And yes, I know I just showed you how to load data and that was supposed to come later; don't freak out, it was just an example.

3.2 Importing and exporting data

Probably the simplest method for loading a small (or large) dataset when all the data is of the same mode is to use the brilliantly named set of read.foo functions, where 'foo' stands in for the format (.txt, .csv, etc.). So, just as an example, using the Rio209.csv file above, you can read it into your R session in a variety of ways. First, you can just read it straight in:

example.data <- read.csv("F:/Rio209.csv", header = TRUE)
head(example.data)
  ID   Lat   Lon   Date
  ...

You can identify a working directory and read it in from there:

setwd("F:/")
same.data <- read.csv("Rio209.csv", header = TRUE)
head(same.data)
  ID   Lat   Lon   Date
  ...

Ok, you're probably thinking: how in the heck do I know what the function is to read data in? Well, R has a nice little help operator, the question mark, which, when typed into the console in front of a function name, will open the help files for the R function of interest. For instance, ?read.table will open the help files for the read.table() function, and all the other read.foo() functions available in base R. For those of you who work with databases on a regular basis, there is an R package, RODBC, that is extremely useful for opening connections with various ODBC database structures and importing tables of data, either as is or using SQL queries to specify exactly what is needed.
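Exporting works the same way in reverse, via the matching write functions; a minimal sketch (my addition; the file name turkeys_out.csv is made up for illustration):

write.csv(same.data, "F:/turkeys_out.csv", row.names = FALSE)
# or, with the working directory already set, just give the file name:
write.csv(same.data, "turkeys_out.csv", row.names = FALSE)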

3.3 Creation of, types, and working with data: a super short primer

Vectors

We all know you really cannot do too much fancy mathematics on a scalar (a vector with 1 value), so we need to look into the wide variety of other methods for working with data in R. Now, we are going to start stepping into the creation and manipulation of several different types of data within R. You will see a wide variety of things coming up here: creation of data using random number generators, sequences of data, combinations of numeric and factor data, creation and manipulation of vectors and matrices, and operations on those vectors and matrices. This is probably what you were all more interested in, as I will start outlining some specific R functions for doing specific tasks. First, remember that you can create a simple vector as:

c.data <- c(10, 21, 13, 34, 25)
c.data
[1] 10 21 13 34 25

Ok, so now we have a vector called c.data in our workspace. R excels at vectorized operations, so we can do vectorized arithmetic on it, or write some code to estimate summary statistics for the data in the c.data vector:

c.data/2
[1]  5.0 10.5  6.5 17.0 12.5
xbar <- sum(c.data)/length(c.data)
xbar
[1] 20.6
std.dev <- sqrt(sum((c.data - xbar)^2)/(length(c.data) - 1))
std.dev
[1] 9.607289

Or, since R is a statistical program, we could just use the R internal functions for mean and standard deviation to get the same answers:

mean(c.data)
[1] 20.6
sd(c.data)
[1] 9.607289
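R has no built-in standard error function, so this is a natural spot to roll a quick one of your own; a minimal sketch (my addition), reusing c.data:

se <- function(x) sd(x)/sqrt(length(x))  # standard error of the mean
se(c.data)
[1] 4.29651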

Back to vectors for the time being. There are lots of other ways to create vector data using functions that create sequences of data. For example, we can use the colon (:) sequence operator or the sequence and repeat functions (seq, rep). For numeric arguments, a:b will generate a sequence of ordered data from a to b. If a and b are integers, then so is the sequence; if not, the values are of type double.

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
7:11
[1]  7  8  9 10 11
2.5:5
[1] 2.5 3.5 4.5
5:-5
 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5

The colon (:) operator cannot be used with letters (e.g., A:F will not get you a vector of a, b, c, d, e, f), as R will expect the values to be named objects. So, you would typically work with sequences of letters and numbers by combining them (either brute force or via the interaction function) as factors:

num.factor <- factor(1:4)
alpha.factor <- factor(c("a", "b", "c", "d"))
num.factor:alpha.factor
[1] 1:a 2:b 3:c 4:d
16 Levels: 1:a 1:b 1:c 1:d 2:a 2:b 2:c 2:d 3:a 3:b 3:c 3:d 4:a 4:b ... 4:d
interaction(alpha.factor, num.factor)
[1] a.1 b.2 c.3 d.4
16 Levels: a.1 b.1 c.1 d.1 a.2 b.2 c.2 d.2 a.3 b.3 c.3 d.3 a.4 b.4 ... d.4

The colon operator is a simple method for vector creation, but we could also use the seq (sequence) function, which can be used with numerics, dates, and times, and we can make the sequences change by values other than +1 or -1:

seq(from = 2, to = 6, by = 1)
[1] 2 3 4 5 6
seq(2, 6, 1)
[1] 2 3 4 5 6
seq(2, 6, 0.5)
[1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
seq(-2, 2, 0.5)
[1] -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0
seq(from = as.Date("2010-04-02"), to = as.Date("2010-04-30"), by = 5)
[1] "2010-04-02" "2010-04-07" "2010-04-12" "2010-04-17" "2010-04-22"
[6] "2010-04-27"

In addition, there is a general rep (repeat) function that can be used to generate repeated sequences of vectors combining data of any mode. Underlying rep are arguments called 'each' and 'times', both of which affect how your data are put into the sequence:

rep(1:3, each = 3)
[1] 1 1 1 2 2 2 3 3 3
rep(1:3, times = 3)
[1] 1 2 3 1 2 3 1 2 3
rep(alpha.factor, each = 2)
[1] a a b b c c d d
Levels: a b c d
rep(alpha.factor, times = 2)
[1] a b c d a b c d
Levels: a b c d

Because R is so handy, we can actually nest various functions to create data sequences:

rep(seq(1, 4, 1), each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
rep(rep(c(1, 2), 2), each = 3)
 [1] 1 1 1 2 2 2 1 1 1 2 2 2
# or alternatively
rep(c(rep(1, 3), rep(2, 3)), 2)
 [1] 1 1 1 2 2 2 1 1 1 2 2 2

Now, one of the things we often want to do is look at a specific value in a vector. Luckily, values in your vector (or whatever data object you are using) are indexed by R, so we can extract a subset of a vector simply and efficiently, which not surprisingly is called subscripting. Subscripting can be inclusive (what to include) or exclusive (what to exclude), and the syntax can use names, numeric, or logical subscripts. So, as an example, consider the sequence of data from 5 to 50 by 5's, and suppose our interest is in extracting the 7th element.

sub.seq <- seq(5, 50, 5)
sub.seq
 [1]  5 10 15 20 25 30 35 40 45 50
sub.seq[7]
[1] 35

We could be interested in every value except for the 4th value, which we want to exclude:

sub.seq[-4]
[1]  5 10 15 25 30 35 40 45 50

You can extract or exclude more than one element:

sub.seq[c(2, 3, 6)]
[1] 10 15 30
sub.seq[-(3:5)]
[1]  5 10 30 35 40 45 50

In addition to regular subscripting, logical subscripting is a powerful method of subsetting data. Remember, there are quite a few logical operators (<, >, <=, >=; less than, greater than, less than or equal to, greater than or equal to) you have seen before. Equality uses a double = (==) and exclusion of equality uses the not (!) operator. A logical operation compares two objects using an operator and returns a logical vector (i.e., the vector will consist of TRUE or FALSE values):

sub.seq
 [1]  5 10 15 20 25 30 35 40 45 50
sub.seq > 30
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
sub.seq < 30
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
sub.seq == 25
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
sub.seq != 25
 [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
sub.seq == 26
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Logical operators can be combined with other operators like & (and), | (or), and ! (not):

lt.50 <- sub.seq < 50
gt.20 <- sub.seq > 20
lt.50 & gt.20
 [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
!lt.50 | !gt.20
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE

and you can subscript with logicals as well:

sub.seq[lt.50 & gt.20]
[1] 25 30 35 40 45
sub.seq[!lt.50 | !gt.20]
[1]  5 10 15 20 50

Matrices

A matrix is a two-dimensional array of vectors, viewed as row vectors or column vectors, where each vector is of the same length and mode. Matrices are used for many research purposes in statistics, and typically consist of numeric variables. Matrices have 2 dimensions: the number of rows and the number of columns. One convenient way to create matrices is through the matrix function, or by using the diag() function to create a diagonal matrix. For example, consider a simple vector x that we want to put into a matrix with 4 rows and 2 columns:

x <- 1:8
dim(x) <- c(4, 2)
x
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

Of course, we filled that matrix with data (vector x), but we could have just as easily created a matrix with all the same values, or created the matrix above using the matrix function:

my.matrix <- matrix(1, nrow = 4, ncol = 2)
my.matrix
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1
[4,]    1    1
my.matrix2 <- matrix(1:8, nrow = 4, byrow = TRUE)
my.matrix2
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8

You can look at the dimensions of your matrix using some of the internal functions in R, and you can alter the dimensions of a matrix as long as the overall size is the same:

ncol(my.matrix)
[1] 2
nrow(my.matrix)
[1] 4
dim(my.matrix)
[1] 4 2
dim(my.matrix) = c(2, 4)
my.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1

and we can create diagonal matrices:

my.matrix <- diag(1, nrow = 4, ncol = 4)
my.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1
diag(my.matrix) <- 1:4
my.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    2    0    0
[3,]    0    0    3    0
[4,]    0    0    0    4

Obviously you can specify in what order you want the cells of a matrix filled:

matrix(1:16, nrow = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
matrix(1:16, nrow = 4, byrow = TRUE)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16

Now, we are not going to get into matrix algebra here, although the commands are readily available for things like multiplication (%*%), transpose (t()), and inverse (solve()); see the short sketch at the end of this section. But, we are going to talk about subscripting and some basic mathematics you can do on matrices. Multi-dimensional objects like matrices require a different approach to subsetting than vectors, as there is the option of an empty subscript or a null dimension. Consider the below matrix:

set.seed(10)
mat <- matrix(rpois(16, 4), nrow = 4)
mat
     [,1] [,2] [,3] [,4]
 ...

which has 4 rows and 4 columns. If we were interested in the element in the matrix that was in the 3rd row and the 3rd column, then we would extract that element (5), or, if

we wanted to extract an entire row, we could identify rows 1 through 4 of the first column as below:

mat[3, 3]
[1] 5
mat[1:4, 1]
[1] ...

However, matrix subscripting has been made much simpler: because of the null dimension, we can extract entire rows and columns simply and efficiently. The trick is to use a comma (,). Thus, for accessing entire rows and/or columns, you can just leave out the subscript for the dimension you are not interested in. Remember, R will return these values as a vector, not a matrix, so, if you want the information you extract to remain a matrix, you need to add drop = FALSE to your code (which means you could actually subscript from that matrix as well):

mat[, 1]
[1] ...
mat[2, ]
[1] ...
smaller.mat <- mat[1, , drop = FALSE]
smaller.mat
     [,1] [,2] [,3] [,4]
[1,]    4  ...
smaller.mat[1]
[1] 4

Remember our earlier discussion on logical subscripting? It works here too:

mat > 3
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE  TRUE FALSE
[2,] FALSE FALSE FALSE  TRUE
[3,] FALSE FALSE  TRUE FALSE
[4,]  TRUE FALSE  TRUE FALSE
mat[mat > 3]
[1] ...
mat[mat > 3] = -22
mat
     [,1] [,2] [,3] [,4]
 ...
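Since the matrix algebra operators were only name-dropped above, here is the promised minimal sketch (my addition; the small matrix A is made up for illustration):

A <- matrix(c(2, 0, 1, 3), nrow = 2)  # a small invertible matrix
b <- matrix(1:2, nrow = 2)
A %*% b        # matrix multiplication
t(A)           # transpose
solve(A)       # inverse
solve(A) %*% A # recovers the 2 x 2 identity matrix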

Dataframes

Dataframes are the typical structures folks use to store data for analysis (most of you would call a dataframe a spreadsheet). They are similar to matrices in that dataframes have column vectors of the same length (same number of rows), but different in that dataframes can have column vectors of different modes. Most data in ecology are of mixed modes, consisting of some combination of numeric, character, or factor information. So, it benefits us to learn how R treats data and what options there are for managing data. First, just so it is easiest, I am going to use a data file that currently resides in R called iris, which is the famous Anderson/Fisher iris measurement data (cm) for 50 flowers from each of 3 species of iris. There are a number of datasets provided with the base distribution of R, which you can see by typing data() into the R console. iris is actually an internal dataset that is distributed with R, so we will just load it from within R below. For this part, sometimes we just want to look over the data that we imported into R. Luckily, there are a couple of simple ways to look at the data file, or parts of it, within R. Using the iris data, we can extract relevant information on the dataframe using functions such as str and names, for instance:

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"
summary(iris)
  Sepal.Length   Sepal.Width    Petal.Length   Petal.Width
 Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1
 1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3
 Median :5.80   Median :3.00   Median :4.35   Median :1.3
 Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2
 3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8
 Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5
       Species
 setosa    :50
 versicolor:50
 virginica :50

But, we could also be interested in working with specific columns within a dataframe. There are a couple of ways to access and manipulate/summarize specific columns of a dataframe in R. First, you can extract from a specific column by using the $, such as (just showing the first 10 records for simplicity):

iris$Sepal.Length[1:10]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

You can summarize those columns individually if you so choose:

mean(iris$Sepal.Length)
[1] 5.843333
var(iris$Sepal.Length)
[1] 0.6856935
sd(iris$Sepal.Length)
[1] 0.8280661
summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900

Or, another option that some people like when working in R is to attach their data using the attach function (see ?attach). Then you can directly access your data based on the column names without identifying the dataframe. I tend not to do this, as I don't like having dataframes attached, especially if I am working with multiple frames with the same column names (e.g., GPS data from multiple critters that have the same data columns), but I will quickly do it once for this example:

attach(iris)
Sepal.Length[1:10]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9
mean(Sepal.Length)
[1] 5.843333
detach(iris)
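A middle ground I find safer than attach() is with(), which evaluates an expression inside the dataframe without leaving anything attached afterwards; a quick sketch (my addition, base R):

with(iris, mean(Sepal.Length))
[1] 5.843333
with(iris, tapply(Sepal.Length, Species, mean))  # per-species means, as a preview of aggregation
    setosa versicolor  virginica
     5.006      5.936      6.588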

Lists

Lists are the most general structure in R and provide a way for the user to store a collection of data objects in one location, primarily because there is no limitation on the mode of the objects that a list may hold. Lists can have elements that contain any other object, such as a dataframe, a matrix, a vector, a scalar, etc. A list is a vector with mode list. But, lists are often weird for folks to understand, so as an example, first I am going to create a fairly simple list and do some subscripting and manipulation of that list; then, I will compile a more complicated list and show how to manipulate that one. First, consider a simple set of vectors:

x <- c(11, 34, 56, 17)
y <- c("Bret", "Reagan", "Kennedy", "Lyla")
z <- c(10)
x
[1] 11 34 56 17
y
[1] "Bret"    "Reagan"  "Kennedy" "Lyla"
z
[1] 10

Here, I am combining the vectors above into a list, which has a mode of 'list' and 3 uniquely named elements (list.a, list.b, list.c):

simple.list <- list(list.a = x, list.b = y, list.c = z)
mode(simple.list)
[1] "list"
simple.list
$list.a
[1] 11 34 56 17

$list.b
[1] "Bret"    "Reagan"  "Kennedy" "Lyla"

$list.c
[1] 10

Now, we can extract (via subscripting) elements from the list:

simple.list[1]
$list.a
[1] 11 34 56 17

simple.list[2]
$list.b
[1] "Bret"    "Reagan"  "Kennedy" "Lyla"

simple.list[3]
$list.c
[1] 10

Okay, so, based on what we know about R, we should be able to use an internal function like mean() on a list element and get the mean of the list.a portion, for instance.

mean(simple.list[1])
Warning: argument is not numeric or logical: returning NA
[1] NA

What, it gave us an NA? This is because simple.list[1] is actually a list containing the vector x, not the vector itself. So, to apply operations to elements of a list, you have to identify specifically the elements you want to analyze. In our (and most) situations, the elements of the list have been named, so you can access said elements using the name of the element with a dollar sign ($), like you would to extract columns from a dataframe (which makes sense, because, you may or may not have noticed, dataframes are lists where the list elements are the dataframe columns). Additionally, sometimes you want to access list elements via their index or a name; then you can use double brackets ([[ ]]) to subscript lists (this is especially important when writing functions that return lists as the function result).

mean(simple.list$list.a)
[1] 29.5
mean(simple.list[[1]])
[1] 29.5
mean(simple.list[["list.a"]])
[1] 29.5

Well, that is simple enough. Lets try a more complicated list just so we have an example in our notes. My little complicated list will be the c.data from earlier (a numeric vector of length 5), the iris dataframe, and the made-up character vector from earlier with my family's names in it (Bret, Reagan, Kennedy, Lyla):

complicated <- list(c.data = c.data, iris.data = iris, family = y)
str(complicated)
List of 3
 $ c.data   : num [1:5] 10 21 13 34 25
 $ iris.data:'data.frame':  150 obs. of  5 variables:
  ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ family   : chr [1:4] "Bret" "Reagan" "Kennedy" "Lyla"
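When a list gets large, it can help to limit how much str() prints; a small sketch (my addition) using the max.level argument:

str(complicated, max.level = 1)  # summarize only the top level of the list
List of 3
 $ c.data   : num [1:5] 10 21 13 34 25
 $ iris.data:'data.frame':  150 obs. of  5 variables:
 $ family   : chr [1:4] "Bret" "Reagan" "Kennedy" "Lyla"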

Now, lets assume I want to extract the first 10 rows of the list element iris.data and find the mean and variance of the iris dataframe column Sepal.Length. In addition, lets use the internal R function summary() to summarize the iris data for us as well, using a couple of different approaches.

complicated$iris.data[1:10, ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
mean(complicated$iris.data$Sepal.Length)
[1] 5.843333
mean(complicated[[2]]$Sepal.Length)
[1] 5.843333
var(complicated[[2]]$Sepal.Length)
[1] 0.6856935
summary(complicated$iris.data)
  Sepal.Length   Sepal.Width    Petal.Length   Petal.Width
 Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1
 1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3
 Median :5.80   Median :3.00   Median :4.35   Median :1.3
 Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2
 3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8
 Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5
       Species
 setosa    :50
 versicolor:50
 virginica :50
summary(complicated[[2]])
  Sepal.Length   Sepal.Width    Petal.Length   Petal.Width
 Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1
 1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3
 Median :5.80   Median :3.00   Median :4.35   Median :1.3
 Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2
 3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8
 Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5
       Species
 setosa    :50
 versicolor:50
 virginica :50

A few thoughts on data manipulation

We need to talk about summarizing and/or aggregating data, as this is probably something that, at one time or another, you will have to do. Now, the different ways you can summarize data are pretty much limited only by your imagination or programming skills, so it is a huge waste of effort to focus on all the different ways to aggregate data; I am just going to scratch the surface here to give you a general idea of what can be done. First, R has a variety of internal functions set up that allow for efficient summarization of the various data types we discussed earlier, things like mean, median, or range, so just to repeat those here using the iris data:

summary(iris)
  Sepal.Length   Sepal.Width    Petal.Length   Petal.Width
 Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1
 1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3
 Median :5.80   Median :3.00   Median :4.35   Median :1.3
 Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2
 3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8
 Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5
       Species
 setosa    :50
 versicolor:50
 virginica :50
mean(iris$Sepal.Length)
[1] 5.843333
median(iris$Sepal.Length)
[1] 5.8
range(iris$Sepal.Length)
[1] 4.3 7.9

Often our interest is in aggregating data, and there are a ton of ways to do that, including table or subset:

dogs <- c("Springer", "Bulldog", "Springer", "Mutt", "Chihuahua", "Bulldog")
dog.table <- table(dogs)
dog.table
dogs
  Bulldog Chihuahua      Mutt  Springer
        2         1         1         2
dog.table["Springer"]
Springer
       2
as.data.frame(dog.table)
       dogs Freq
1   Bulldog    2
2 Chihuahua    1
3      Mutt    1
4  Springer    2

subset(iris, iris$Sepal.Length > mean(iris$Sepal.Length))[1:10, ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
55          6.5         2.8          4.6         1.5 versicolor
57          6.3         3.3          4.7         1.6 versicolor
59          6.6         2.9          4.6         1.3 versicolor
62          5.9         3.0          4.2         1.5 versicolor
63          6.0         2.2          4.0         1.0 versicolor
64          6.1         2.9          4.7         1.4 versicolor
66          6.7         3.1          4.4         1.4 versicolor

Ok, getting a bit more complicated: this section is going to be about applying functions, either pre-defined or user-defined, to repeatedly conduct a set of calculations specific to different values of the data. Makes no sense, does it? Well, it will. As a real quick example, consider a simple matrix with 2 rows and 2 columns. Now, based on our previous examples, you would know how to create this matrix, perhaps using the matrix function. Now, matrices have dimensions, as you have often seen them described, such as 2x3, or 3x5, or 1x1 (which is a scalar, by the way). Now, in R, the dimensions of the matrix are referred to as margins, which will be important later. So, consider the following loop set up for getting the row sums and column sums from that matrix:

loop.matrix <- matrix(1:4, nrow = 2, ncol = 2)
loop.matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
row.sums <- vector("numeric", nrow(loop.matrix))
# Loop over the rows and sum the elements

for (i in 1:nrow(loop.matrix)) row.sums[i] = sum(loop.matrix[i, ])
row.sums
[1] 4 6
col.sums <- vector("numeric", ncol(loop.matrix))
# Loop over the columns and sum the elements
for (i in 1:ncol(loop.matrix)) col.sums[i] = sum(loop.matrix[, i])
col.sums
[1] 3 7

What do you know, we have written a short piece of code to estimate row and column sums from a matrix. But, come on, this is R; there has to be something better. Luckily, there is: the family of apply statements. Now, you can do ?apply to look at the specifics, but in a nutshell it is apply(your data, the margin you are interested in, the function you want to apply to that margin). Remember that in a matrix there are 2 margins: rows (margin = 1) and columns (margin = 2).

loop.matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
apply(loop.matrix, 1, sum)
[1] 4 6
apply(loop.matrix, 2, sum)
[1] 3 7

So, lets make up a little bit bigger matrix so we can mess with some data. Again, we just do a simple sum of the rows (margin = 1) and the columns (margin = 2):

big.matrix <- matrix(1:12, nrow = 3, ncol = 4)
big.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
apply(big.matrix, 1, sum)  # rows
[1] 22 26 30
apply(big.matrix, 2, sum)  # columns
[1]  6 15 24 33
apply(big.matrix, 1, mean)  # mean of rows
[1] 5.5 6.5 7.5
apply(big.matrix, 2, mean)  # mean of columns
[1]  2  5  8 11
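The third argument of apply() does not have to be a built-in like sum or mean; any function, including an anonymous one written on the spot, will do. A quick sketch (my addition), still using big.matrix:

apply(big.matrix, 2, function(x) max(x) - min(x))  # the range width of each column
[1] 2 2 2 2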

Note, however, that R also has pretty nice little functions for simple cases like sum, mean, etc. that will work in this case as well:

rowSums(big.matrix)
[1] 22 26 30
colSums(big.matrix)
[1]  6 15 24 33
rowMeans(big.matrix)
[1] 5.5 6.5 7.5
colMeans(big.matrix)
[1]  2  5  8 11

Also, note that if NA's exist, you can use na.rm in these apply functions:

big.matrix[2, 2] = NA
big.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2   NA    8   11
[3,]    3    6    9   12
apply(big.matrix, 1, sum)
[1] 22 NA 30
apply(big.matrix, 1, sum, na.rm = TRUE)
[1] 22 21 30
rowSums(big.matrix, na.rm = TRUE)
[1] 22 21 30

Most of the time, you will probably want the structure of the data you are looping over to be returned to you in the same form as your original data. If you have a list, then lapply is your friend. Making up a quick list of data and evaluating the list using lapply will return a list. Notice that c is a vector of character values, so when you try to take the mean, you should get an NA:

my.list <- list(a = 10:20, b = rnorm(10), c = c("a", "b", "A", "b", "A", "b", "A", "b"))
lapply(my.list, mean)
Warning: argument is not numeric or logical: returning NA
$a
[1] 15

$b
[1] ...

$c
[1] NA

If we don't want a list returned, we could use sapply, which would return a vector or a matrix:

sapply(my.list, mean)
Warning: argument is not numeric or logical: returning NA
  a   b   c
 15 ...  NA

Uses for the various apply statements are wide-ranging, so I am showing only quick examples here, as you will need to just go and play with them some to see what works best for you. Here is an example I use for estimating survival in a simulation model with demographic stochasticity (e.g., everyone survives based on a random draw from a binomial with probability equal to the user-defined survival estimate):

No.alive <- 100
low.survival <- 0.2
high.survival <- 0.7
low <- sapply(lapply(1, function(i) sample(x = c(1, 0), replace = T, size = No.alive,
    prob = c(low.survival, 1 - low.survival))), sum)
high <- sapply(lapply(1, function(i) sample(x = c(1, 0), replace = T, size = No.alive,
    prob = c(high.survival, 1 - high.survival))), sum)
low
[1] 21
high
[1] 73

There are some pretty useful internal R functions for aggregating data, such as, oh, I don't know, aggregate, which works pretty well, for instance, with the iris data:

aggregate(iris[, 1:4], list(species = iris[, 5]), mean)
     species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

3.4 Basic Mathematical/Operators

First, how do we use R as a calculator (and why are we doing this now and not at the beginning)? Since R is interactive, you want to use R to do some basic calculations so you get the hang of it, as the basic calculations are what build up to fairly complex calculations.

So, here are a few really quick examples showing how R can be used to get the result for any equation: type in the equation and it will return the result to you as shown below:

1 + 1
[1] 2
sqrt(8)
[1] 2.828427
exp(1)
[1] 2.718282

For each example, the result is a vector containing a single number. The [1] that you see before each value represents the fact that after R computes a result, it calls a generic (default) print function to display the contents of the vector. For example, you could call the print function explicitly:

print(1 + 1)
[1] 2
print(sqrt(8))
[1] 2.828427
print(sqrt(8), digits = 5)
[1] 2.8284
print(sqrt(8), digits = 10)
[1] 2.828427125

and, to be honest, you can probably get more precision than you would ever need:

print(sqrt(8), digits = 20)
[1] 2.8284271247461902909

R can do pretty much any basic mathematical operation you need.

2 + 2
[1] 4
4 - 2
[1] 2
2 * 2 * 2
[1] 8
2/2

[1] 1
sqrt(16)
[1] 4

In addition, R has a set of logical operators (?Logic) which can be used for a wide variety of manipulations. Consider the made-up data below for log.data and x:

log.data <- 1 + (x <- rpois(20, 1)/3)
x
 [1] ...
log.data
 [1] ...

You can do random number generation pretty simply and quickly (you will see more of this later on):

rnorm(10)
 [1] ...
rpois(10, 10)
 [1] ...

You can work with 'NA' values within your data in different ways. First, note that replacing the 3rd value in the log.data vector with an 'NA' means that using a simple function like 'mean' will return 'NA', because the log.data vector now contains missing values. When this occurs, R has some handy functions for handling data with missing values, usually via na.rm = TRUE or something like that:

log.data[3] = NA
mean(log.data)
[1] NA
mean(log.data, na.rm = TRUE)
[1] ...

Or, alternatively, you could do this and get the same answer:

log.data[3] = NA
newlog.data = na.omit(log.data)
mean(newlog.data)
[1] ...
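A related base tool worth knowing here is is.na(), which returns a logical vector you can use to locate or count the missing values before deciding how to handle them; a small sketch (my addition), on the same log.data vector:

is.na(log.data)        # TRUE at the missing position
which(is.na(log.data))
[1] 3
sum(is.na(log.data))   # how many values are missing
[1] 1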

And remember, R does arithmetic on vectors and matrices just fine:

c.data
[1] 10 21 13 34 25
2 * c.data
[1] 20 42 26 68 50
loop.matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
loop.matrix - 5
     [,1] [,2]
[1,]   -4   -2
[2,]   -3   -1

Date and time

This will be a pretty quick section, as there are quite a few different ways to deal with date-time classes, and for the most part when you deal with them it will be in a categorization or subsetting context. Dates in R are pretty simple to deal with, and there are a variety of options for working with them. First, and probably the simplest introduction, is to just create a date in some format and play with it. So, for example:

Sys.time()           # prints the current date and time
as.Date(Sys.time())  # prints the current date
as.Date("2010/04/02")
[1] "2010-04-02"

Now, whats nice about dates is you can manipulate them pretty easily by changing the format string to get them into the format you are needing. Note, very important: when you are re-formatting dates, you have to use the exact same description in the format() command, e.g., if your date has a comma after the day, your format string has to have a comma after the %d as well or you will get an NA (see the example below using September 1, 1973).

as.Date("2010-04-02", format = "%Y-%m-%d")
[1] "2010-04-02"
as.Date("April 2, 2010", format = "%B %d, %Y")

34 [1] " " as.date("2april10", format = "%d%b%y") [1] " " as.date("september 1, 1973", format = "%B %d %Y") [1] NA as.date("september 1, 1973", format = "%b %d, %Y") [1] " " Or, we can also tell R to get out the current time to play with you can do something like this to see the current time and assign it a name: Sys.time() [1] " :56:36 CST" system.time <- Sys.time() str(system.time) POSIXct[1:1], format: " :56:36" system.time [1] " :56:36 CST" So now we have a object called system.time that has a date-time combination. Also not that I used str to look at it and it was of class POSIX, which is a common format for date-time values. I tend to use POSIX classes more frequently than most other date functions (e.g., chron package) because it stores time to the nearest second. POSIX data's input format is year, then month, then day, a space, then time in hours:minutes:seconds. POSIX works similarly to other date functions for manipulating dates between formats as well: time.posix <- c(" :00:30") as.posixct(time.posix) [1] " :00:30 CDT" class.date <- strptime("2/april/2010:08:01:27", format = "%d/%b/%y:%h:%m:%s") str(class.date) POSIXlt[1:1], format: " :01:27" You can work with dates pretty easily. For example, we can create a time sequence of dates and then run some general R functions on them looking at date ranges, mean date, time between dates, etc: seq(as.date(" "), by = "days", length = 5) [1] " " " " " " " " " " 34

dates <- seq(as.Date("2010-04-02"), by = "days", length = 30)
mean(dates)
[1] "2010-04-16"
range(dates)
[1] "2010-04-02" "2010-05-01"
summary(dates)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"2010-04-02" "2010-04-09" "2010-04-16" "2010-04-16" "2010-04-23" "2010-05-01"
dates[8] - dates[1]
Time difference of 7 days
difftime(dates[8], dates[1], units = "hours")
Time difference of 168 hours
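Going the other direction, the same % codes work with format() to turn a date or date-time back into whatever text layout you need; a small sketch (my addition):

format(as.Date("2010-04-02"), "%B %d, %Y")
[1] "April 02, 2010"
format(as.POSIXct("2010-04-02 08:00:30"), "%d%b%y %H:%M")
[1] "02Apr10 08:00"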

4 R Creating Graphics

Few tools in ecology (or in any field, for that matter) are as powerful as a graphical representation of your data. We should use graphs as an analytical tool to assist with data visualization and analysis, but lots of times folks just use graphs to summarize statistical results from their analysis (e.g., showing means). I can talk on and on about graphs in R (see Collier 2008), but so as to not waste time: concisely, there are not too many graphs you cannot make in R, period. So, to show you some examples of graphs, I am going to create several datasets; some will be simple univariate simulations, some will be a bit more complicated dataframes. The reason I am using simulated data is two-fold: 1) using simulated data helps you to understand the structure of the data because you created it, so you know what it should look like and can use the tools we learned earlier to get other datasets into the correct format, and 2) it's fairly simple to simulate a wide range of data types quickly and efficiently, rather than try to load individual datasets, walk through all the manipulation in this document, and then do some examples (although that is on the horizon). One thing it is important to notice: lots of times you will see the same arguments show up in different plotting commands. That is because R uses the same graphical parameters, like col for defining the color you want, across many plotting functions. Remember this, its handy (see ?par for more details).

4.1 Scatterplots

So, lets start with the easy stuff, a scatterplot. In the simplest sense, we can create plots in R pretty quickly with short statements.

plot(rnorm(100, 10, 1), main = "A scatterplot")

[Figure: "A scatterplot"; y-axis "rnorm(100, 10, 1)", x-axis "Index"]

A few things to notice. First, this is just a very simple scatter of made-up points; there is no rhyme or reason to them. Second, there is really no relationship between the x-axis values and the y-axis values, as basically I just simulated 100 points from a normal

distribution with a mean of 10 and a standard deviation of 1 (?Normal), and the index (x-axis) is the order they were simulated in. Also, note that R provides some default axis labels: the y-axis is basically what was called with the plot command from above, and the x-axis value Index is just the order of simulation, as defined before. Nothing is really formatted, the axis labels are laying the wrong way on the y-axis, the font is weird, so on and so forth. Ok, but we can obviously do more with R than some dumb scatterplot. What if, for example, we have data where we actually have a good reason to label the axes correctly, such as data relating counts of the eyeworms in the eyes of Quaily Mc'OweMyEyesHurt[1] to the mass of Quaily Mc'OweMyEyesHurt (this is Texas and lots of people seem to care about eyeworm numbers right now, so its topical, but see the footnote...). Below I made up a completely ridiculous dataframe for example plotting purposes only:

set.seed(10002)
worms = round(rnorm(50, 66, 10), digits = 0)
presence = factor(round(rbinom(50, 1, 0.7), digits = 2))
mass = worms + 3 * (round(rnorm(50, 0.25, 0.2), digits = 2))
long = worms * (round(rnorm(50, 60, 15), digits = 2))
group <- factor(rep(1:5, 10))
quaily <- data.frame(worms, presence, mass, long, group)
attach(quaily)
The following objects are masked _by_ .GlobalEnv:
    group, long, mass, presence, worms
str(quaily)
'data.frame':   50 obs. of  5 variables:
 $ worms   : num  ...
 $ presence: Factor w/ 2 levels "0","1": ...
 $ mass    : num  ...
 $ long    : num  ...
 $ group   : Factor w/ 5 levels "1","2","3","4",..: ...
head(quaily)
  worms presence mass long group
  ...

So, we can see that our dataframe quaily has a couple of continuous variables, a couple of factor variables, and is all-around ridiculous. But, lets go ahead and plot some data anyway. So, what do we see when we look at this figure?

- The axis values look approximately correct (although notice that there are a few values on the graph >80, yet the x-axis only goes to 80), so we will probably want to adjust those;

[1] There is no such thing as a Quaily, and it obviously does not reference any species in Texas, and as far as I know, Mc'OweMyEyesHurt is not a real word.

- The numbers at each tick mark on the y-axis are parallel to the axis, which makes them harder to read;
- The graph is contained in a box; neither good nor bad, it's more of a preference thing;
- The labels for each axis are correct, but they do not convey much information;
- There is no figure title (not that it's needed).

plot(mass, worms)

[Figure: default scatterplot of worms against mass]

So, there are quite a few things we might want to change with this graph, correct? Well, lets change them. When you want to change things in your graph, ?par is your friend. ?par provides a detailed list of the many options for manipulating graphs in R. So, lets make it pretty:

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph", ylab = "Quaily Worms",
    xlab = "Quaily Fatness", pch = 19, col = "red", xlim = c(40, 90), ylim = c(40, 90))

[Figure: "This is a Wormy Quaily Graph"; y-axis "Quaily Worms", x-axis "Quaily Fatness"]

Wow, pretty. Amazingly, when you want to find a relationship, you can! For our example on quaily and eyeworms, it looks as if there is a positive relationship between worm numbers and quaily mass, so what if we wanted to add a plot of the linear regression line to this plot? Well, we could run the regression, fit the line, and, just for kicks, fit the vertical error distances as well.

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph", ylab = "Quaily Worms",
    xlab = "Quaily Fatness", pch = 19, col = "red", xlim = c(40, 90), ylim = c(40, 90))
quaily.reg <- lm(worms ~ mass)
abline(quaily.reg, col = "blue", lwd = 2)
fit.quaily <- fitted(quaily.reg)
segments(mass, fit.quaily, mass, worms, col = "blue")

[Figure: the same "Wormy Quaily Graph" with the fitted regression line and vertical error segments added]

So, you get the idea that there are all kinds of cool ways to manipulate data and make graphs. Below I will show a few examples of different types of plots that are typically used. I tried to keep most of these examples fairly consistent with what can be easily found in either the R help files for each plot type, or what you would find when you google 'R barplot', so that you will be able to find some additional examples later and match them to what we did in class.

4.2 Other Simple plots

So, barplots, the workhorse of wildlife ecologists (and often called histograms; don't do that). Using the mtcars dataset in base R, a quick barplot:

data(mtcars)
count = table(mtcars$gear)
barplot(count, main = "Example Barplot", xlab = "Gear number")

[Figure: "Example Barplot"; x-axis "Gear number"]

Wow, that is simple. How about this one, just a slightly different example.

# Grouped Bar Plot
counts <- table(mtcars$cyl, mtcars$gear)
barplot(counts, main = "Car Distribution by Gears and Cylinders", xlab = "Number of Cylinders",
    col = c("red", "yellow", "blue"), legend = rownames(counts), beside = TRUE, las = 1)

[Figure: "Car Distribution by Gears and Cylinders" grouped barplot]

What about confidence intervals? We need to do those, right? Here are a couple of different ways to add confidence intervals to a barplot, or just create confidence intervals (straight from the plotCI help file).

library(plotrix)
data(warpbreaks)
attach(warpbreaks)
err = 0.1
y = runif(10)
wmeans <- by(warpbreaks$breaks, warpbreaks$tension, mean)
wsd <- by(warpbreaks$breaks, warpbreaks$tension, sd)
# note that barplot() returns the midpoints of the bars, which plotCI uses as x-coordinates
plotCI(barplot(wmeans, col = "gray", ylim = c(0, max(wmeans + wsd))), wmeans, wsd, add = TRUE)

[Figure: barplot of mean warp breaks by tension (L, M, H) with standard deviation error bars]

# using labels instead of points
labs <- sample(LETTERS, replace = TRUE, size = 10)
plotCI(1:10, y, err, pch = NA, gap = 0.02, main = "plotCI with labels at points", las = 1)
text(1:10, y, labs)

[Figure: "plotCI with labels at points"; letters plotted at each point with error bars]

Now, there are tons of ways to do this; lots of R packages can be used to add confidence intervals, some more elegantly than others. But, its important to realize that you can do it for many different types of plots, for instance, a logistic regression:

set.seed(123)
mydata = data.frame(response = rbinom(100, 1, 0.5), Predictor = rnorm(100, 100, 50))
attach(mydata)
test.glm = glm(response ~ Predictor, family = "binomial")
predict.data = seq(4, 496, 4)
y = plogis(test.glm$coefficients[1] + test.glm$coefficients[2] * predict.data)
xy = data.frame(Predictor = predict.data)
yhat = predict(test.glm, xy, type = "link", se.fit = TRUE)
upperlogit = yhat$fit + 1.96 * yhat$se.fit
lowerlogit = yhat$fit - 1.96 * yhat$se.fit
ucl = plogis(upperlogit)
lcl = plogis(lowerlogit)
plot(predict.data, y, ylim = c(0, 1), type = "l", lwd = 2, ylab = "Prob(Success)",
    xlab = "Predictor Variable", xaxt = "n", las = 1)
axis(1)
lines(predict.data, ucl, lty = 2, lwd = 2)
lines(predict.data, lcl, lty = 2, lwd = 2)

[Figure: predicted probability of success across the predictor, with dashed 95% confidence bands; y-axis "Prob(Success)", x-axis "Predictor Variable"]

Another simple one is a dotchart:

y1 <- runif(2)
g <- c("0-50", "50-100")
dotchart(y1, g, pch = 20, xlim = c(0, 1))

which can be used to create some pretty elegant graphs rather quickly that show lots of data, for instance using the mtcars dataframe:

x <- mtcars[order(mtcars$mpg), ]  # sort by mpg
x$cyl <- factor(x$cyl)  # it must be a factor

x$color[x$cyl == 4] <- "red"
x$color[x$cyl == 6] <- "blue"
x$color[x$cyl == 8] <- "darkgreen"
dotchart(x$mpg, labels = row.names(x), cex = 0.7, groups = x$cyl,
         main = "Gas Mileage", xlab = "Miles Per Gallon", gcolor = "black",
         color = x$color)

[Figure: "Gas Mileage" dotchart, each car model's Miles Per Gallon, grouped and colored by cylinder count]

Again, there are many ways to create a graph; here are some examples using ggplot2 and the mtcars data.

# create factors with value labels
library(ggplot2)
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), labels = c("3gears",

46 "4gears", "5gears")) mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "Manual")) mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8), labels = c("4cyl", "6cyl", "8cyl")) # Scatterplot of mpg vs. hp for each combination of gears and cylinders in # each facet, transmittion type is represented by shape and color qplot(hp, mpg, data = mtcars, shape = am, color = am, facets = gear ~ cyl, size = I(3), xlab = "Horsepower", ylab = "Miles per Gallon") Miles per Gallon cyl 6cyl 8cyl 3gears 4gears 5gears Horsepower am Automatic Manual And another example on the same data qplot(mtcars$gear, mtcars$mpg, data = mtcars, geom = c("boxplot", "jitter"), fill = gear, main = "Mileage by Gear Number", xlab = "", ylab = "Miles per Gallon") 46

We can even plot spatial locations quickly and easily; for instance, here are some Texas turkey GPS locations (more on this later)...

suppressPackageStartupMessages(library(moveud))
data(rawturkey)
par(mfrow = c(2, 1))
plot(rawturkey$lon, rawturkey$lat, main = "Unedited Points", pch = 20,
     col = "red", xlab = "Longitude", ylab = "Latitude", las = 1,
     cex.axis = 0.7)
newrawturkey = rawturkey[rawturkey$lon < -98.1, ]
plot(newrawturkey$lon, newrawturkey$lat, main = "Edited Points", pch = 20,
     col = "red", xlab = "Longitude", ylab = "Latitude", las = 1,
     cex.axis = 0.7)

[Figure: two-panel plot of turkey GPS locations, "Unedited Points" and "Edited Points", Latitude vs. Longitude]

5 Statistical Models with R

5.1 Contingency Tables

Obviously, being able to enter data in contingency tables is a pretty useful skill. As a quick example, so you get a feel for it, let's create a quick 9 by 2 contingency table in R using some nest predator data from Dreibelbis et al. (2008). Note that two-way tables need to be matrix objects. Now, because we wanted the data entered column-wise, we used byrow=F (which would have been the default if we had not included the byrow=F). You can see what happens if you don't define byrow= by changing it from F to T. Ok, so now we have defined the object class.status, but it is just a matrix with no column or row headings. This is important, as many times you will want to add column and row headings to your data. Easiest way: use colnames or rownames.

class.status <- matrix(c(0, 0, 2, 4, 2, 0, 2, 1, 3, 1, 1, 1, 2, 7, 3, 0, 0,
    4), nrow = 9, byrow = F)
class.status

      [,1] [,2]
 [1,]    0    1
 [2,]    0    1
 [3,]    2    1
 [4,]    4    2
 [5,]    2    7
 [6,]    0    3
 [7,]    2    0
 [8,]    1    0
 [9,]    3    4

colnames(class.status) <- c("2006", "2007")
rownames(class.status) <- c("Nine-banded Armadillo", "Bobcat", "Feral hog",
    "Gray fox", "Common raccoon", "Common raven", "Striped skunk",
    "Texas rat snake", "Total multiple predator events")
class.status

                               2006 2007
Nine-banded Armadillo             0    1
Bobcat                            0    1
Feral hog                         2    1
Gray fox                          4    2
Common raccoon                    2    7
Common raven                      0    3
Striped skunk                     2    0
Texas rat snake                   1    0
Total multiple predator events    3    4

Often we will have data in some sort of a dataframe where we have 1 row for each data point in the dataset. So, let's try some examples using our earlier dataset called quaily. Now, since you have quaily loaded, let's play a bit with it by using the function table. Using table() we can look at the raw counts of the number of times parasites were present, a cross-tab of parasite presence by group, and we can even look at the proportion of each count that falls in each category using the function prop.table().
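Before we try those on quaily, one quick aside worth knowing: prop.table() also takes a margin argument when you want row or column proportions rather than overall cell proportions. A quick sketch on a built-in dataset (my example, not the quaily data):

tab <- table(mtcars$cyl, mtcars$gear)
prop.table(tab)              # each cell as a proportion of the grand total
prop.table(tab, margin = 1)  # proportions within each row
prop.table(tab, margin = 2)  # proportions within each column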

head(quaily)

  worms presence mass long group

table(quaily$presence)
table(quaily$presence, quaily$group)
table.quaily <- table(quaily$presence, quaily$group)
prop.table(table.quaily)

Tables can get extremely complicated really quickly, and R can make looking at data using tables pretty easy (e.g., see ftable or xtabs as other options for looking at tabular data). But, what if we are interested in conducting some statistical evaluations on data in tables? The list is endless of what you can do, but let's do an example of a test of independent proportions and a chi-square test on a 2 by 2 contingency table. Let's assume that our data consists of the number of juvenile and adult fish that successfully survived some experimental testing done over in the Biology (nerd's) building.

fish.not.dead <- c(10, 6)
fish.total.tested <- c(20, 21)
prop.test(fish.not.dead, fish.total.tested)

2-sample test for equality of proportions with continuity correction

data:  fish.not.dead out of fish.total.tested
X-squared = 1.179, df = 1, p-value =
alternative hypothesis: two.sided
95 percent confidence interval:
sample estimates:
prop 1 prop 2
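As an aside, with counts this small you might also want an exact test in your pocket. A minimal sketch using Fisher's exact test on the same fish data; the matrix is built as successes and failures, which parallels the chisq.test formatting described next (the fish.matrix name and construction are mine):

fish.matrix <- matrix(c(10, 6, 10, 15), nrow = 2)  # col 1 = survived, col 2 = died
fisher.test(fish.matrix)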

So these prop.test results indicate no difference between the proportions (see the p-values and such in the output). What about a χ² test? First, we have to turn our data into a matrix, as that is the required formatting (see the Arguments section under ?chisq.test). Also note that because we are running this using chisq.test, the second column of the table has to be the number of negative outcomes (failures: 10 & 15) as opposed to the totals (20 & 21) as given above. For a 2 × 2 table, the results using prop.test and chisq.test are equivalent.

chi.data <- matrix(c(10, 6, 10, 15), 2)
chi.data

     [,1] [,2]
[1,]   10   10
[2,]    6   15

chisq.test(chi.data)

Pearson's Chi-squared test with Yates' continuity correction

data:  chi.data
X-squared = 1.179, df = 1, p-value =

We can also do r × c contingency tables. Consider the data from class.status above.

class.status

                               2006 2007
Nine-banded Armadillo             0    1
Bobcat                            0    1
Feral hog                         2    1
Gray fox                          4    2
Common raccoon                    2    7
Common raven                      0    3
Striped skunk                     2    0
Texas rat snake                   1    0
Total multiple predator events    3    4

chisq.test(class.status)

Warning: Chi-squared approximation may be incorrect

Pearson's Chi-squared test

data:  class.status
X-squared = 11.43, df = 8, p-value =

chisq.test(class.status)$expected

Warning: Chi-squared approximation may be incorrect

Nine-banded Armadillo

Bobcat
Feral hog
Gray fox
Common raccoon
Common raven
Striped skunk
Texas rat snake
Total multiple predator events

Notice that chisq.test includes more information than is printed by default. Always remember this about R: you can see what is included in the function using some of the tricks from earlier. For example, you can use str to determine what is included in function chisq.test. But, if you want to know what is included in the information after calling chisq.test on our data, then you could use str and extract the contents of your function call; then, it is simply a matter of identifying what you are interested in extracting, and pulling it from the list identified above. For example, you can see that we have a list containing 9 different objects, all identified using the $ operator.

str(chisq.test)

function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)

str(chisq.test(class.status))

Warning: Chi-squared approximation may be incorrect

List of 9
 $ statistic: Named num
  ..- attr(*, "names")= chr "X-squared"
 $ parameter: Named int 8
  ..- attr(*, "names")= chr "df"
 $ p.value  : num
 $ method   : chr "Pearson's Chi-squared test"
 $ data.name: chr "class.status"
 $ observed : num [1:9, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...
  .. ..$ : chr [1:2] "2006" "2007"
 $ expected : num [1:9, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...
  .. ..$ : chr [1:2] "2006" "2007"
 $ residuals: num [1:9, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...
  .. ..$ : chr [1:2] "2006" "2007"
 $ stdres   : num [1:9, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:9] "Nine-banded Armadillo" "Bobcat" "Feral hog" "Gray fox" ...
  .. ..$ : chr [1:2] "2006" "2007"
 - attr(*, "class")= chr "htest"

chisq.test(class.status)$observed

Warning: Chi-squared approximation may be incorrect

                               2006 2007
Nine-banded Armadillo             0    1
Bobcat                            0    1
Feral hog                         2    1
Gray fox                          4    2
Common raccoon                    2    7
Common raven                      0    3
Striped skunk                     2    0
Texas rat snake                   1    0
Total multiple predator events    3    4

chisq.test(class.status)$expected

Warning: Chi-squared approximation may be incorrect

Nine-banded Armadillo
Bobcat
Feral hog
Gray fox
Common raccoon
Common raven
Striped skunk
Texas rat snake
Total multiple predator events

5.2 Linear Regression

The basic aim of this section is to get you comfortable with general approaches to regression analysis. The methods build on each other, but for the most part remain consistent. First, I will outline a simple linear regression with one response and one predictor variable, then discuss how this relates to analysis of variance. I will follow with multiple regression (2 or more predictor variables) and generalized linear models for binary and count data.

Linear regression, the workhorse of statistical methodology, is used to explain the relationship between 2 variables, primarily focused on how one variable impacts the level of another variable. Just because I want to see how to put a formula in LYX, here is the basic equation for linear regression:

y_i = \alpha + \beta x_i + \epsilon_i

You all have seen this, so we will not belabor the point. But, how do we do linear regression in R? We do it well...

lm(quaily$worms ~ quaily$mass)

Call:
lm(formula = quaily$worms ~ quaily$mass)

Coefficients:

(Intercept)  quaily$mass

Doesn't seem like much when you do it like that, does it? I mean, R pretty much just shows us the function call and the estimated beta coefficients. Is that all R did? Why are we here again? Now, R does other things, but remember earlier when I said R would not give you things, you had to ask for them? Well, now it's time to learn how to ask. First, you can use the summary function to extract a little bit more information (you can ignore the useFancyQuotes code; it is so I could output the summary in a pdf, something screwy with Sweave and R). So, what did we get using summary? Well, our call to lm() created a model object (just as the chi-square test we used earlier did) consisting of several parts. First, we have a repeat of the function call, then a summary of the distribution of the residuals, then the model coefficients are printed, followed by some various information on model fit.

options(useFancyQuotes = FALSE)
summary(lm(quaily$worms ~ quaily$mass))

Call:
lm(formula = quaily$worms ~ quaily$mass)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                    *
quaily$mass                              <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 48 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared:
F-statistic: 1.22e+04 on 1 and 48 DF, p-value: <2e-16

Pretty cool, huh. Now, what if we just wanted to extract the coefficients without all the other stuff? Remember earlier I told you we would be using some stuff later? Here goes: first you can use names to see what is contained within the model object.

example.regression <- lm(quaily$worms ~ quaily$mass)
names(example.regression)

 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

Notice there is one in there called coefficients, so we can probably get those out in a couple other ways:

example.regression$coefficients

(Intercept) quaily$mass

coef(example.regression)

(Intercept) quaily$mass

summary(example.regression)$coefficients

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                 e-02
quaily$mass                                 e-59

str(summary(example.regression)$coefficients)

 num [1:2, 1:4]
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "(Intercept)" "quaily$mass"
  ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"

summary(example.regression)$coefficients[2, ]

  Estimate Std. Error    t value   Pr(>|t|)
 1.010e                                e-59

Now, while you saw this example earlier, it is probably worthwhile to redo it here to show how you can also build plots based off of your linear regression analysis simply and efficiently.

plot(mass, worms, las = 1, main = "This is a Wormy Quaily Graph",
     ylab = "Quaily Worms", xlab = "Quaily Fatness", pch = 19, col = "red",
     xlim = c(40, 90), ylim = c(40, 90))
abline(example.regression, col = "blue", lwd = 2)
fit.quaily <- fitted(example.regression)
segments(mass, fit.quaily, mass, worms, col = "blue")

[Figure: the Wormy Quaily scatterplot again, with fitted line and residual segments]
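While we are extracting things from lm objects, confint() is another handy extractor: it returns Wald confidence intervals for the estimated coefficients. A quick sketch (my addition, using the same fitted model as above):

confint(example.regression)               # default 95% intervals
confint(example.regression, level = 0.9)  # or any level you like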

Ok, so what if we wanted to see the values used for developing this plot, or the residuals (the difference between observed and expected)?

fitted(example.regression)
resid(example.regression)
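And since we have the fitted values and residuals in hand, note that R's built-in regression diagnostics are a single plot() call away. A quick sketch (my addition; see ?plot.lm for the details):

par(mfrow = c(2, 2))
plot(example.regression)  # residuals vs. fitted, Q-Q, scale-location, leverage
par(mfrow = c(1, 1))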

You can obviously extend your linear regression to multiple predictor variables following the same approach as above for main effects models.

multi.regression <- lm(quaily$worms ~ quaily$mass + quaily$long)
summary(multi.regression)

Call:
lm(formula = quaily$worms ~ quaily$mass + quaily$long)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
quaily$mass                              <2e-16 ***
quaily$long
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 47 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared:
F-statistic: 6.02e+03 on 2 and 47 DF, p-value: <2e-16

multi2.regression <- lm(quaily$worms ~ quaily$mass * quaily$long)
summary(multi2.regression)

Call:
lm(formula = quaily$worms ~ quaily$mass * quaily$long)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)
quaily$mass                                          <2e-16 ***
quaily$long
quaily$mass:quaily$long
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 46 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared:
F-statistic: 3.97e+03 on 3 and 46 DF, p-value: <2e-16

You can do lots with basic regression; see ?lm for more details.

5.3 Generalized Linear Models

GLMs are characterized by the use of a 'link' function which provides the relationship

between the predictor variables and the expected value of the response variable. Probably the most common GLM is logistic regression, but a GLM with a normal link function would give the same results as linear regression modeling with lm. In R, there is a nice little function called glm for running generalized linear models. As an example, here is some bird count data we can use for doing some logistic regression analysis.

bird.data <- read.table("f:/bretresearch/workshops/txtws_rworkshop/birddata.txt",
    header = TRUE, colClasses = c("numeric", "numeric", "factor", "numeric"))
str(bird.data)

'data.frame': 154 obs. of 4 variables:
 $ present: num
 $ area   : num
 $ reg    : Factor w/ 4 levels "5","6","7","8":
 $ canopy : num

head(bird.data)

  present area reg canopy

So, we have a simple dataset for some bird surveys on which presence or absence was measured, and we want to see if presence/absence is influenced by either the area of habitat (in hectares), the region of the state (factor variable with 4 levels), or the percentage of canopy cover (range from 0-100). Now, glm has a trick to it you have to remember, although if you do ?glm you would see it in the help file. When you specify glm, you have to define a value for 'family', which tells R which link function from the exponential family to use to relate the predictors to the response variable. Since we are dealing with binary data, we will use binomial to define family.

bird.model <- glm(bird.data$present ~ bird.data$area, family = "binomial")

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(bird.model)

Call:
glm(formula = bird.data$present ~ bird.data$area, family = "binomial")

Deviance Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)
bird.data$area                                    *

---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:     on 153 degrees of freedom
Residual deviance: on 152 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 8

Maybe we want to see how the predicted probability of presence changes over area. Looking at the summary above, we can see that the intercept and slope are both positive, so we would expect a positive impact of area on presence. We can show that several ways, but probably using a graphic would be best.

plot(bird.data$area, fitted(glm(bird.data$present ~ bird.data$area,
     family = "binomial")), xlab = "Area (ha)", ylab = "Probability present")

[Figure: fitted probability of presence plotted against Area (ha)]

But, this plot is really not that pretty, what with all the dots and stuff. What say we try another way to clean it up a bit. First, we do a bit of data manipulation so that we can use the predict function in R, which is a pretty useful little function. Now, if you look at the above figure you see that we are pretty much plotting the predicted response (presence probability) for each level of area for which we have data. But, what if we wanted to know what the prediction looked like for area sizes we did not observe? Well, that is pretty simple to do. First, we do a bit of data manipulation where we define a new variable for area (Area) which ranges from 0 to 10,000 (it could have been any value: 100, 10,000, etc.), and then we use predict to estimate the presence probability for each value of Area. I used head below to show the first 6 values of the predictions.

attach(bird.data)
bird.predict <- glm(present ~ area, family = "binomial")

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Area <- seq(0, 10000, 1)
new.area <- data.frame(area = Area)
predicted.bird <- predict(bird.predict, new.area, type = "resp")
head(predicted.bird)
plot(predicted.bird ~ Area, type = "l", las = 1, ylab = "Probability present")

[Figure: predicted probability of presence vs. Area, drawn as a smooth line]

Well, what do you know, a prettier graph. But, what would a journal editor say? Where are the confidence limits? This takes a bit of tweaking and there are several ways to do this, but here is the one I tend to like. First, you have to do a little data manipulation using predict. Notice that I changed the type to link; there is a reason. Above, when I used type="resp" I was predicting on the 'real' scale, or the actual predicted probabilities for each level of Area. But, when you build your confidence intervals based on the real-scale values, you can get estimates >1, which cannot happen. This is my crude hack that I use all the time to build confidence intervals when I am working on logistic regression models. So, first, I bring in the non-logit-transformed estimates for each value of Area, and I build a confidence interval for each level.

pred.cl <- predict(bird.predict, new.area, interval = c("confidence"),
    level = 0.95, type = "link", se.fit = TRUE)
uppercl <- pred.cl$fit + 1.96 * pred.cl$se.fit  # 95% limits on the logit scale
lowercl <- pred.cl$fit - 1.96 * pred.cl$se.fit

Now, plotting these is pretty simple using the lines function. Note I am extending the

y-axis here so that I can add data to the graph showing the spread of the sites with detections or not (shown using the points statement below; in green on the graphic).

plot(predicted.bird ~ Area, type = "l", las = 1, ylab = "Probability present",
     ylim = c(0, 1))
lines(plogis(uppercl), col = "blue")
lines(plogis(lowercl), col = "red")
points(area, present, col = "green")

[Figure: predicted probability of presence vs. Area with upper (blue) and lower (red) confidence bands and the raw presence/absence data in green]

In case you're wondering, the reason the confidence intervals are not symmetric around the line, like you are probably used to, is because they are built on the logit scale. But, wait a minute, what the heck is plogis()? It must be important, right? Yes, it is, as it keeps your values bounded between 0 and 1 (see ?plogis) and it's really useful. First, remember that a basic logistic regression looks like this:

\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}

so, if we have estimates for \beta_0 and \beta_1 then we can actually use plogis to predict each probability. For example,

coef(bird.predict)

(Intercept)        area

then we have an estimate for the intercept and the slope. Say our interest was in predicting the probability of presence given an area estimate of 10. Well, using the above logistic regression formula, it would look like

\frac{e^{\hat{\beta}_0 + \hat{\beta}_1 \cdot 10}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 \cdot 10}}

or, we can get at this a couple of ways,

plogis(coef(bird.predict)[1] + coef(bird.predict)[2] * 10)
plogis(summary(bird.predict)$coefficients[1] +
       summary(bird.predict)$coefficients[2] * 10)

Say we wanted predictions for area estimates of 25:50?

plogis(summary(bird.predict)$coefficients[1] +
       summary(bird.predict)$coefficients[2] * 25:50)

I find that plogis() is a generous friend and I use it every day!
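For the record, plogis() also has an inverse, qlogis(), which takes you from the probability scale back to the logit scale. A two-line sketch (my addition):

p <- plogis(1.5)  # logit value of 1.5 maps to a probability of about 0.82
qlogis(p)         # and back to the logit scale; returns 1.5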

6 Writing Functions in R

6.1 Functions

One of the basics of R is that users can contribute code to conduct various analyses. In R, the standard contribution is a function, or something that the end user applies to their data to get some result. For instance, consider the simple function below to add 2 user-supplied values together:

addtwo = function(a, b) {
    out = a + b
    return(out)
}
addtwo(2, 2)

[1] 4

addtwo(2, 4)

[1] 6

This function works with any 2 numerical values. Functions can be more complex, like the below function that creates summary output:

my.summary = function(x) {
    my.n = length(x)
    my.mean = mean(x, na.rm = TRUE)
    my.var = var(x, na.rm = TRUE)
    my.sd = sd(x, na.rm = TRUE)
    my.median = median(x, na.rm = TRUE)
    out = list(SampleSize = my.n, Mean = my.mean, Variance = my.var,
        StdDev = my.sd, Median = my.median)
    return(out)
}
sum.data = rnorm(10)
my.summary(sum.data)

$SampleSize
[1] 10

$Mean
$Variance
$StdDev
$Median

Functions can do a ton of work for you, so I am barely (and I mean that, barely) scratching the surface. If, for instance, you wanted to see what the function bbmm.polygon() from the

moveud package looks like (and I know exactly what it looks like 'cause I wrote it), then you can just type the name of the function into R and out it pops. bbmm.polygon creates the utilization distribution contours based on bbmm.contour from package BBMM and exports the created contour lines as a polygon shapefile for further analysis in ArcMap (or GIS program of choice). Effectively, it imports a dataframe, reprojects it to UTM, uses brownian.bridge() to create a BBMM and exports the contour lines via bbmm.contour, creates a raster, transforms that raster to a spatial polygon data frame, adds a couple of variables to the data frame, and writes the output to a shapefile appropriate for reading into ArcMap. Not too complicated...

bbmm.polygon

function (x, crs.current, crs.utm, lev, plot = FALSE, path, indid)
{
    coordinates(x) = ~Lon + Lat
    proj4string(x) = CRS(crs.current)
    x = data.frame(spTransform(x, CRS(crs.utm)))
    out.bbmm = brownian.bridge(x = x$Lon, y = x$Lat, time.lag = x$tl[-1],
        location.error = 15, cell.size = 20, max.lag = 180)
    contours = bbmm.contour(out.bbmm, levels = lev, locations = x,
        plot = plot)
    probs <- data.frame(x = out.bbmm$x, y = out.bbmm$y,
        z = out.bbmm$probability)
    out.raster <- rasterFromXYZ(probs, crs = CRS(crs.utm), digits = 5)
    raster.contour <- rasterToContour(out.raster, levels = contours$z)
    raster.contour <- spChFIDs(raster.contour, paste(lev, "% Contour Line",
        sep = ""))
    out = spTransform(raster.contour, CRS(crs.utm))
    out = SpatialLines2PolySet(out)
    out = PolySet2SpatialPolygons(out)
    out = as(out, "SpatialPolygonsDataFrame")
    out$udlevels = paste(rev(lev))
    out$bandid = paste(indid)
    setwd(path)
    writeOGR(obj = out, dsn = ".", layer = paste(indid),
        driver = "ESRI Shapefile")
}
<environment: namespace:moveud>

Or, if, for instance, you wanted to do a little simulation to look at the impacts of detection heterogeneity in deer spotlight survey count data and see how many times you would expect to overestimate, underestimate, or be correct (within 10% error bounds) about how many deer were near the road you were driving down (even though this is fraught with errors), then you could use:

deer.sim = function(survey, reps) {
    x = replicate(reps, {
        pr = rnorm(survey, , )  # per-deer detection probabilities
        pr[pr < 0] = 0
        x = survey/pr
        lower = x[x < mean(x) - 0.1 * mean(x)]
        upper = x[x > mean(x) + 0.1 * mean(x)]
        c(length(lower), length(upper))/length(x)
    })

    ml = mean(x[1, ])
    mu = mean(x[2, ])
    constant = 1 - ml - mu
    x = cbind(Decreased = ml, Constant = constant, Increased = mu,
        Count = survey)
    return(x)
}

And then we can use the function call to estimate how many times we might be too low, too high, or just right based on those numbers (although some would argue about it anyway because deer spotlight surveys are sacrosanct and inviolable in their eyes...).

deer.sim(100, 100)

     Decreased Constant Increased Count
[1,]

That about does it for functions; we could spend time on lexical scoping and such, but that is way beyond this class...
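Well, ok, one more function feature worth flagging since it comes up constantly: arguments can have default values, so users only override what they need to. A minimal sketch (mine, not from the course code):

ci.mean = function(x, level = 0.95) {
    # normal-approximation confidence interval for a mean
    se = sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))
    z = qnorm(1 - (1 - level)/2)
    mean(x, na.rm = TRUE) + c(-1, 1) * z * se
}
ci.mean(rnorm(50))       # uses the 0.95 default
ci.mean(rnorm(50), 0.9)  # overrides it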

7 Wildlife-Specific Methods

7.1 Capture-Recapture Analysis

The most comprehensive software package for analysis of capture-recapture data is the program MARK (White and Burnham 1999). While it is unparalleled in the range of models, quality of the user documentation, and active base of user-driven support, the interface for building models can be limiting for large data sets and complex models. While there is some capability for automatic model creation in MARK, most models are built manually with a graphical user interface to specify the parameter structures and design matrices. Manual model creation can be useful during the learning process, but eventually it becomes a time-consuming and sometimes frustrating exercise that may add an unnecessary source of error to the analysis. Finally, for those that analyze data from on-going monitoring programs, there is no way to extend the capture history in MARK, which necessitates manual recreation of all models as data from future sampling occasions are collected.

RMark is an R package that provides a formula-based interface for MARK. RMark has been available since 2005 and is on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). RMark contains functionality to build models for MARK from formulas, run the model with MARK, extract the output, and summarize and display the results with automatic labeling. RMark also has functions for model averaging, prediction, variance components, and exporting models back to the MARK interface. In addition, all of the tools in R are available, which enables a completely scripted analysis from data to results and inclusion into a document with Sweave (Leisch 2002) and LaTeX to create a reproducible manuscript such as this one. The report which represents the appropriate citation (effective 2013) for RMark is included in the workshop notes as well.

Here we are going to provide an overview of the RMark package and how it can be used to benefit MARK users. For more detailed documentation, refer to the online documentation and the help within the RMark package. And, just to be fair, a significant portion of these course notes came from various documents Jeff created while explaining or documenting RMark for teaching purposes, and to a lesser extent from some notes I have put together for students at A&M.

Background

RMark does not fit models to data; rather, RMark is an R package that was designed to provide an alternative user interface to MARK and its GUI. RMark uses the R language to construct models, create the input file (.inp), then call MARK, which fits the model(s) to the data, extracts the results from the output file created by MARK, and allows the user to manipulate (via R or some other program) the resultant model output. Thus, RMark is an R interface to MARK, not a stand-alone capture-recapture modeling environment. That said, if results you got using MARK do not match the results you got when you used RMark, then you have made a mistake in one or the other.

Where to find help?

Currently, or at least as best we can tell, MARK supports 140 different modeling options. At present, RMark does not fully replicate every option available in MARK. Although new models are added to RMark fairly regularly, not every model in MARK is available in RMark, and some things you can do in MARK, such as data bootstrapping or computing median c-hat values, are not available through the RMark interface. For a list of models available in RMark, you can use something like

system.file("MarkModels.pdf", package = "RMark")

which will provide you with a PATH statement telling you where you can access the pdf file containing the list of MARK models available in RMark, along with the appropriate code, parameter, and help file names (or, if you have a specific R_LIBS path where your R packages are installed locally, just go there and look in the RMark folder, where the file MarkModels.pdf will be found).

First, it is important to remember that RMark needs MARK, so without an understanding of MARK, you will be limited in your ability to use RMark. So, your first stop should always be the "MARKBOOK", authored/edited by Evan Cooch and Gary White, with contributions from a wide variety of others. The MARKBOOK is freely available (all pages of it) online. Unequivocally, this is the primary desk reference for capture-recapture modeling approaches supported by MARK (although you should never cite it in a manuscript; see the MARK FAQ). Details on RMark are found in Appendix C. Additionally, there is a very active community of ecologists who use MARK regularly that are willing to provide expertise to folks across a wide variety of capture-recapture modeling techniques, and an online forum (managed by Evan Cooch) is available at phidot.org. The user group of the phidot.org forum is typically extremely helpful, given you have read the MARKBOOK and have searched the archives. If you are not already a member, sign up. Finally, RMark operates just like any other R package; if you need the help/reference files for a particular function within RMark, you can access that function using the ? followed by the name of the function you are interested in (e.g., ?mark).

Advantages/Disadvantages

So, why would one want to use RMark as an interface to MARK rather than MARK's GUI? Reasons abound; some are valid, some are not, and lots of it is just individual point of view or project-specific needs. We think that there are some convincing reasons to use a scripted approach for your MARK analysis, but in the end it becomes a personal choice (one I think it is obvious that Jeff and I have already made). A few of the primary reasons we like to use RMark are (but not limited to):

1. RMark provides the user with the ability to automate analysis of monitoring data sets even as monitoring occasions are added. This is a significant benefit that RMark brings to MARK users, as scripted generation of PIMs and DMs allows you to create the script once; as new monitoring data are collected, typically no changes to the script are needed. You just re-run the script with the new data file.

2. Design matrix creation. RMark uses a formula-based approach, which is faster and typically less error-prone (although not entirely error-proof). Thus, there is less need to manually create the PIMs or DM. But, an understanding of what the DM should look like is still necessary.

3. PIM simplification. RMark automatically creates the simplest PIM structure for each model, as opposed to MARK, which uses the full DM even when reduced models are created. This will speed up model evaluation.

4. Collaborative development. MARK and RMark play well together, so you can move analyses back and forth fairly cleanly using functions such as export.mark() and convert.inp().

5. Entire analyses can be scripted. Although this is related to No. 1 above, the scripting of analyses and the ability to use some of the functionality that comes along with R for additional computational support, publication-quality graphing, among other things, is quite beneficial.

6. Reproducible analysis and documentation. Nearly all MARK analyses are reproducible so long as one keeps the .inp/.dbf/.fpt files and documents what was done. One thing that RMark excels at is that the documentation support capabilities for R are widely applicable for MARK analyses. Thus, complete data sets and analyses, with metadata and detailed documentation, can be developed as R packages, or data/code can be seamlessly integrated into LaTeX-style manuscripts and documents (although Evan does a pretty good job with the MARKBOOK). We find it really useful that the entirety of a dataset and analysis can be documented cleanly in one place (see ?dipper for an example). Obviously, good data management protocols for reproducible analyses using only MARK are equally good, so this is more of a personal preference.

Ok, so let's jump in with a quick example. As with most R packages, to access the functionality in RMark you type library(RMark) and R will respond with its appropriate version number and relevant information (I have it in .Rprofile on my system, so no output will be shown below when I do it). For a quick example, we will use the ubiquitous European dipper (Cinclus cinclus) capture-recapture data from many examples in the MARKBOOK and a variety of manuscripts (it is included as a datafile in RMark). For the dipper example, if we look at the structure of the dataset, we can see that it is a dataframe with 2 fields. The first field is the encounter history, which has a required column heading name of 'ch' and must be a character (chr) variable. The field label ch is required for all MARK analyses, and typically a field identifying the number of individuals with that specific encounter history (denoted 'freq') is included; any additional fields are optional. In this example, the field sex specifies group structure (e.g., whether an individual is male or female) and is identified as a factor variable (Factor) with values 1=Female and 2=Male, as ordering is alphabetic and ignores the ordering of the columns in the dipper.inp file, which we can see using levels(). Finally, we can run a simple CJS analysis using the default of constant survival and constant recapture probabilities for the dipper data using the simple code mark(dipper).

library(RMark)

This is RMark

data(dipper)
str(dipper)

'data.frame': 294 obs. of 2 variables:
 $ ch : chr " " " " " " " " ...
 $ sex: Factor w/ 2 levels "Female","Male":

levels(dipper$sex)

[1] "Female" "Male"

ex = mark(dipper)

Output summary for CJS model
Name : Phi(~1)p(~1)
Npar : 2
-2lnL:
AICc :

Beta
                estimate se lcl ucl
Phi:(Intercept)
p:(Intercept)

Real Parameter Phi

Real Parameter p

Importing and Manipulating Data

Now that we have RMark up and running (and we know that it works), the first thing we all want to do is load our data and do some analysis! RMark has several options/ways for one to create or load data for analysis in MARK. As most are familiar with the .inp file structure used by MARK, let's start with the approach that converts an encounter history .inp file to a dataframe for use in RMark. For this demonstration, we will use the dipper.inp file, which on my 64-bit system is located in C:\Program Files (x86)\mark\examples, and the RMark function convert.inp(). Conversion of a .inp file to a dataframe using convert.inp() requires that we specify the input file location and name, group and optional covariate names, and, if the .inp file has commented areas (/* and */ in MARK parlance), that we let

RMark know. So you don't have to go look (or you can look above): the structure of dipper.inp is pretty straightforward; the encounter history has 7 encounter occasions, does include the freq column giving the number of individuals with each specific encounter history, and has 2 groups (columns) representing either Male or Female (1 or 0). Because males are in the first column and females are in the second column, when we define group.df= that will be the order we use. So, converting the dipper.inp data would work as follows:

dipper.convert = convert.inp("C:/Program Files (x86)/mark/examples/dipper.inp",
    group.df = data.frame(sex = c("Male", "Female")))

When we look at the structure of the newly created file dipper.convert, we will see that it is now an R dataframe with 3 fields. The first field is the capture history (ch), which is a character value; the second field is the frequency variable (freq), or the number of individuals with that unique encounter history (a numeric value); and the third field is the grouping variable sex, which is a factor variable with 2 levels and can be shown using levels().

str(dipper.convert)

'data.frame': 294 obs. of 3 variables:
 $ ch  : chr " " " " " " " " ...
 $ freq: num
 $ sex : Factor w/ 2 levels "Female","Male":

levels(dipper.convert$sex)

[1] "Female" "Male"

Once your data is in R as a dataframe, there are some handy options for manipulating data that you can use via standard R functions. A simple example is to add a numeric column representing some covariate (weight is typically used) to the newly created dipper.convert dataframe.

dipper.convert$weight = rnorm(nrow(dipper.convert), mean = 11, sd = 3)
summary(dipper.convert$weight)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

Processing Data

Many of you will be familiar with the MARK model specification window, as it is where you identify the dataset you want to use for analysis, choose the model type specific for your analysis, and provide details on the various descriptors for your dataset such as the number of encounter occasions, name and number of groups, and individual covariates.

RMark (read: Jeff when he wrote it) takes care of some of these specifications, such as number of occasions, group labels, and individual covariate names (drawn from the input file column names), by setting these for you. However, some of the options, such as titles, number of mixtures, and time intervals, among others, are all argument options for the function process.data(), which takes the place of the model specification window from MARK. process.data() does exactly what it sounds like: it processes the specified input data file and creates an R list structure that includes the original dataframe, all the required attribute data, and what model the dataset should be analyzed with:

dipper.proc = process.data(dipper.convert, model = "CJS", groups = "sex",
    begin.time = 1980)
str(dipper.proc)

List of 15
 $ data           :'data.frame': 294 obs. of 5 variables:
  ..$ ch    : chr [1:294] " " " " " " " " ...
  ..$ freq  : num [1:294]
  ..$ sex   : Factor w/ 2 levels "Female","Male":
  ..$ weight: num [1:294]
  ..$ group : Factor w/ 2 levels "1","2":
 $ model          : chr "CJS"
 $ mixtures       : num 1
 $ freq           :'data.frame': 294 obs. of 2 variables:
  ..$ sexFemale: num [1:294]
  ..$ sexMale  : num [1:294]
 $ nocc           : num 7
 $ nocc.secondary : NULL
 $ time.intervals : num [1:6]
 $ begin.time     : num 1980
 $ age.unit       : num 1
 $ initial.ages   : num [1:2]
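So where do you go from a processed data list? The usual next step is make.design.data() to build the design data, and then mark() with a formula list for each parameter. A minimal sketch of a sex-dependent survival model, assuming dipper.proc exists as created above (the object names Phi.sex, p.dot, and sex.model are mine; the Phi/p parameter names are the standard RMark conventions for a CJS model):

dipper.ddl = make.design.data(dipper.proc)
Phi.sex = list(formula = ~sex)  # survival differs by sex
p.dot = list(formula = ~1)      # constant recapture probability
sex.model = mark(dipper.proc, dipper.ddl,
                 model.parameters = list(Phi = Phi.sex, p = p.dot))
summary(sex.model)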


More information

Formulas, LookUp Tables and PivotTables Prepared for Aero Controlex

Formulas, LookUp Tables and PivotTables Prepared for Aero Controlex Basic Topics: Formulas, LookUp Tables and PivotTables Prepared for Aero Controlex Review ribbon terminology such as tabs, groups and commands Navigate a worksheet, workbook, and multiple workbooks Prepare

More information

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center What is R? R is a statistical computing environment with graphics capabilites It is fully scriptable

More information

Matrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation

Matrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation Chapter 7 Introduction to Matrices This chapter introduces the theory and application of matrices. It is divided into two main sections. Section 7.1 discusses some of the basic properties and operations

More information

Monday. A few notes on homework I want ONE spreadsheet with TWO tabs

Monday. A few notes on homework I want ONE spreadsheet with TWO tabs CS 1251 Page 1 Monday Sunday, September 14, 2014 2:38 PM A few notes on homework I want ONE spreadsheet with TWO tabs What has passed before We ended last class with you creating a function called givemeseven()

More information

MITOCW ocw f99-lec07_300k

MITOCW ocw f99-lec07_300k MITOCW ocw-18.06-f99-lec07_300k OK, here's linear algebra lecture seven. I've been talking about vector spaces and specially the null space of a matrix and the column space of a matrix. What's in those

More information

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 04 Programs with IO and Loop We will now discuss the module 2,

More information

1 Lecture 5: Advanced Data Structures

1 Lecture 5: Advanced Data Structures L5 June 14, 2017 1 Lecture 5: Advanced Data Structures CSCI 1360E: Foundations for Informatics and Analytics 1.1 Overview and Objectives We ve covered list, tuples, sets, and dictionaries. These are the

More information

Slide 1 CS 170 Java Programming 1 Multidimensional Arrays Duration: 00:00:39 Advance mode: Auto

Slide 1 CS 170 Java Programming 1 Multidimensional Arrays Duration: 00:00:39 Advance mode: Auto CS 170 Java Programming 1 Working with Rows and Columns Slide 1 CS 170 Java Programming 1 Duration: 00:00:39 Create a multidimensional array with multiple brackets int[ ] d1 = new int[5]; int[ ][ ] d2;

More information

Excel Tips and FAQs - MS 2010

Excel Tips and FAQs - MS 2010 BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my

More information

6.001 Notes: Section 6.1

6.001 Notes: Section 6.1 6.001 Notes: Section 6.1 Slide 6.1.1 When we first starting talking about Scheme expressions, you may recall we said that (almost) every Scheme expression had three components, a syntax (legal ways of

More information

CIS4/681 { Articial Intelligence 2 > (insert-sort '( )) ( ) 2 More Complicated Recursion So far everything we have dened requires

CIS4/681 { Articial Intelligence 2 > (insert-sort '( )) ( ) 2 More Complicated Recursion So far everything we have dened requires 1 A couple of Functions 1 Let's take another example of a simple lisp function { one that does insertion sort. Let us assume that this sort function takes as input a list of numbers and sorts them in ascending

More information

Topic C. Communicating the Precision of Measured Numbers

Topic C. Communicating the Precision of Measured Numbers Topic C. Communicating the Precision of Measured Numbers C. page 1 of 14 Topic C. Communicating the Precision of Measured Numbers This topic includes Section 1. Reporting measurements Section 2. Rounding

More information

USE IBM IN-DATABASE ANALYTICS WITH R

USE IBM IN-DATABASE ANALYTICS WITH R USE IBM IN-DATABASE ANALYTICS WITH R M. WURST, C. BLAHA, A. ECKERT, IBM GERMANY RESEARCH AND DEVELOPMENT Introduction To process data, most native R functions require that the data first is extracted from

More information

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below. Graphing in Excel featuring Excel 2007 1 A spreadsheet can be a powerful tool for analyzing and graphing data, but it works completely differently from the graphing calculator that you re used to. If you

More information

A Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton

A Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton A Tour of Sweave Max Kuhn Pfizer Global R&D Non Clinical Statistics Groton March 14, 2011 Creating Data Analysis Reports For most projects where we need a written record of our work, creating the report

More information

STATISTICAL TECHNIQUES. Interpreting Basic Statistical Values

STATISTICAL TECHNIQUES. Interpreting Basic Statistical Values STATISTICAL TECHNIQUES Interpreting Basic Statistical Values INTERPRETING BASIC STATISTICAL VALUES Sample representative How would one represent the average or typical piece of information from a given

More information

Introduction to R. base -> R win32.exe (this will change depending on the latest version)

Introduction to R. base -> R win32.exe (this will change depending on the latest version) Dr Raffaella Calabrese, Essex Business School 1. GETTING STARTED Introduction to R R is a powerful environment for statistical computing which runs on several platforms. R is available free of charge.

More information

Memory Addressing, Binary, and Hexadecimal Review

Memory Addressing, Binary, and Hexadecimal Review C++ By A EXAMPLE Memory Addressing, Binary, and Hexadecimal Review You do not have to understand the concepts in this appendix to become well-versed in C++. You can master C++, however, only if you spend

More information

MITOCW watch?v=w_-sx4vr53m

MITOCW watch?v=w_-sx4vr53m MITOCW watch?v=w_-sx4vr53m The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To

More information

(Refer Slide Time: 1:27)

(Refer Slide Time: 1:27) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data

More information

OUTLINES. Variable names in MATLAB. Matrices, Vectors and Scalar. Entering a vector Colon operator ( : ) Mathematical operations on vectors.

OUTLINES. Variable names in MATLAB. Matrices, Vectors and Scalar. Entering a vector Colon operator ( : ) Mathematical operations on vectors. 1 LECTURE 3 OUTLINES Variable names in MATLAB Examples Matrices, Vectors and Scalar Scalar Vectors Entering a vector Colon operator ( : ) Mathematical operations on vectors examples 2 VARIABLE NAMES IN

More information

MITOCW watch?v=4dj1oguwtem

MITOCW watch?v=4dj1oguwtem MITOCW watch?v=4dj1oguwtem PROFESSOR: So it's time to examine uncountable sets. And that's what we're going to do in this segment. So Cantor's question was, are all sets the same size? And he gives a definitive

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides

Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides for both problems first, and let you guys code them

More information

Excel Basics Fall 2016

Excel Basics Fall 2016 If you have never worked with Excel, it can be a little confusing at first. When you open Excel, you are faced with various toolbars and menus and a big, empty grid. So what do you do with it? The great

More information

Introduction to R. Introduction to Econometrics W

Introduction to R. Introduction to Econometrics W Introduction to R Introduction to Econometrics W3412 Begin Download R from the Comprehensive R Archive Network (CRAN) by choosing a location close to you. Students are also recommended to download RStudio,

More information

Introduction to R. Nishant Gopalakrishnan, Martin Morgan January, Fred Hutchinson Cancer Research Center

Introduction to R. Nishant Gopalakrishnan, Martin Morgan January, Fred Hutchinson Cancer Research Center Introduction to R Nishant Gopalakrishnan, Martin Morgan Fred Hutchinson Cancer Research Center 19-21 January, 2011 Getting Started Atomic Data structures Creating vectors Subsetting vectors Factors Matrices

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction Why I Am Writing This: Why I am I writing a set of tutorials on compilers and how to build them? Well, the idea goes back several years ago when Rapid-Q, one of the best free BASIC

More information

ECE Lesson Plan - Class 1 Fall, 2001

ECE Lesson Plan - Class 1 Fall, 2001 ECE 201 - Lesson Plan - Class 1 Fall, 2001 Software Development Philosophy Matrix-based numeric computation - MATrix LABoratory High-level programming language - Programming data type specification not

More information

COPYRIGHTED MATERIAL. Starting Strong with Visual C# 2005 Express Edition

COPYRIGHTED MATERIAL. Starting Strong with Visual C# 2005 Express Edition 1 Starting Strong with Visual C# 2005 Express Edition Okay, so the title of this chapter may be a little over the top. But to be honest, the Visual C# 2005 Express Edition, from now on referred to as C#

More information

Computer lab 2 Course: Introduction to R for Biologists

Computer lab 2 Course: Introduction to R for Biologists Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient

More information

LOOPS. Repetition using the while statement

LOOPS. Repetition using the while statement 1 LOOPS Loops are an extremely useful feature in any programming language. They allow you to direct the computer to execute certain statements more than once. In Python, there are two kinds of loops: while

More information

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT NAVAL POSTGRADUATE SCHOOL LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT Statistics (OA3102) Lab #2: Sampling, Sampling Distributions, and the Central Limit Theorem Goal: Use R to demonstrate sampling

More information

These are notes for the third lecture; if statements and loops.

These are notes for the third lecture; if statements and loops. These are notes for the third lecture; if statements and loops. 1 Yeah, this is going to be the second slide in a lot of lectures. 2 - Dominant language for desktop application development - Most modern

More information

Learn a lot beyond the conventional VLOOKUP

Learn a lot beyond the conventional VLOOKUP The Ultimate Guide Learn a lot beyond the conventional VLOOKUP Hey there, Howdy? =IF ( you are first timer at Goodly, Then a very warm welcome here, Else for all my regular folks you know I love you :D

More information

C++ Reference NYU Digital Electronics Lab Fall 2016

C++ Reference NYU Digital Electronics Lab Fall 2016 C++ Reference NYU Digital Electronics Lab Fall 2016 Updated on August 24, 2016 This document outlines important information about the C++ programming language as it relates to NYU s Digital Electronics

More information

ECON 502 INTRODUCTION TO MATLAB Nov 9, 2007 TA: Murat Koyuncu

ECON 502 INTRODUCTION TO MATLAB Nov 9, 2007 TA: Murat Koyuncu ECON 502 INTRODUCTION TO MATLAB Nov 9, 2007 TA: Murat Koyuncu 0. What is MATLAB? 1 MATLAB stands for matrix laboratory and is one of the most popular software for numerical computation. MATLAB s basic

More information

This exam is worth 30 points, or 18.75% of your total course grade. The exam contains

This exam is worth 30 points, or 18.75% of your total course grade. The exam contains CS 60A Final May 16, 1992 Your name Discussion section number TA's name This exam is worth 30 points, or 18.75% of your total course grade. The exam contains six questions. This booklet contains eleven

More information

6.001 Notes: Section 7.1

6.001 Notes: Section 7.1 6.001 Notes: Section 7.1 Slide 7.1.1 In the past few lectures, we have seen a series of tools for helping us create procedures to compute a variety of computational processes. Before we move on to more

More information