Instructions and Result Summary


VU Biostatistics and Experimental Design PLA.216
Exercise 1: Introduction to R & Biostatistics

Name and Student ID: MAXIMILIANE MUSTERFRAU
Name and Student ID: JOHN DUMMY

Work in teams of 2 students only!

0. General Information (!! Read this section carefully !!)

Aim: This section provides general information about the use of specific programs and functions, installation instructions, and other tips and tricks. Please read it carefully.

RStudio: RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.

# The program is installed on your computer.
# Start the program.
# If you want to clean up everything, this is how you do it,
# but be careful: it deletes all objects in your workspace!
rm(list = ls())

R code: Some of the code snippets for this exercise are available in this document. Run each line step by step in RStudio to make sure you understand what it does, and answer the questions in this template.

Report template: You will write a report during this exercise and send it to us after the session. Save your report regularly. After the session, email it as a PDF to biostatistik@genome.tugraz.at. Do not forget to state your names in the report!

Install R packages: Packages are collections of R functions, data and compiled code in a well-defined format. The directory where packages are stored is called the library.

# install.packages("<nameofpackage>")
# install.packages("ggplot2")  # should be installed already

Install Bioconductor R packages:

source("
# biocLite("<nameofpackage>")

Load R packages:

# load the following packages
# library(<nameofpackage>)
library("ggplot2")

Working directory: Create a folder L:/Biostatistics/Ex1/ in your home directory before you run the code below. This is the folder where you will find your result files! Also save your R file here. The raw data file (CSV) is available at

You can either download the raw file to this folder or load it via its URL.

# set the working directory
# also save your R file here!
# your result files will be written to this directory!
setwd("L:/Biostatistics/Ex1/")

R help: Use the help to retrieve more information about a function, class or package. Use ? to access the help:

# ?<nameoffunction>

Save an image/figure in R: To save an image as a PDF, do the following:

# filename is the name (and path) of your file!
# pdf(file = filename)
# <create your plot here>
# dev.off()
# Or simply use Export > Save as PDF in the Plots pane.

Read your data table from disk: Use e.g. the read.table() function.

# ?read.table
# tab-separated
# read.table(file = "filenamehere.txt", sep = "\t")
# comma-separated
# read.csv(file = "filenamehere.csv")
# read.table(file = "filenamehere.txt", sep = ",")

Write your result table to disk: Use e.g. the write.table() function.

# ?write.table
# tab-separated
# write.table(dataframe, file = "filenamehere.txt", sep = "\t")
# comma-separated
# write.csv(dataframe, file = "filenamehere.csv")

Don't worry, we will help you through the exercise!
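A minimal sketch of a write/read round trip, assuming a small made-up data frame (the name df is hypothetical) and a temporary file from tempfile() instead of a file in L:/Biostatistics/Ex1/. It writes a tab-separated file without row names and reads it back. Note that read.table() assumes header = FALSE by default, unlike read.csv(), so the header has to be requested explicitly.

```r
# Hypothetical toy data frame (made-up values, not the msleep data)
df <- data.frame(name = c("a", "b", "c"),
                 sleep_total = c(12.1, 8.7, 10.1))

# Temporary file path; in the exercise you would use a file in your working directory
path <- tempfile(fileext = ".txt")

# Tab-separated, without the extra row-name column
write.table(df, file = path, sep = "\t", row.names = FALSE)

# read.table() defaults to header = FALSE, so ask for the header line explicitly
df_back <- read.table(file = path, sep = "\t", header = TRUE)

print(df_back)
```

Leaving out row.names = FALSE would add an unnamed first column with the row numbers, which then shifts the header by one column when the file is opened in a spreadsheet.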

1. Data frames in R

A data analysis usually begins with loading a table of data into R. Here we have a Comma-Separated Values (CSV) file. Excel sheets can be converted to CSV files, and CSV files can easily be read into R. More about this file format is available here: edoceo.com/utilitas/csv-file-format

Read CSV files into R

Let's start with a CSV file of mammalian sleep data. Read the CSV file msleep.csv into R using the function read.csv(). Call the data.frame tab. More information on the dataset:

tab <- read.csv("msleep.csv")  # have you set the working directory?

The variable tab has the class data.frame, which is R's name for a table of data.

class(tab)

Three useful things to know are: what does the top of the data frame look like, what are its dimensions, and what is its structure?

head(tab)
dim(tab)
str(tab)

Type ?read.csv and read the description of the arguments there. Note that the header was assumed to exist because of the default argument header = TRUE. If the CSV file did not have a header, the first line of data would be taken as the header. The fix for this would be to specify the argument header = FALSE.

The $ operator

We can get a column of a data frame by typing the name of the data frame followed by a $ symbol and the name of the column, with no spaces in between. First get the column names using colnames(tab) and then extract one of the columns. The column is returned as a vector (a vector of numbers for a numeric column such as sleep_total). Try autocompletion on the column name with the TAB key: type the name of the data frame and a $ symbol followed by the first few letters of the column, then hit TAB.
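To see what the header argument changes, here is a small sketch that reads a made-up two-row CSV passed inline through read.csv()'s text argument (so no file is needed) once with the default header = TRUE and once with header = FALSE. The values and the name csv_text are illustrative assumptions.

```r
# A tiny CSV given inline via the 'text' argument (made-up values)
csv_text <- "name,sleep_total
a,12.1
b,17.0"

# Default for read.csv(): header = TRUE, so the first line becomes the column names
with_header <- read.csv(text = csv_text)
dim(with_header)            # 2 2
with_header$sleep_total     # the $ operator returns the column as a numeric vector

# With header = FALSE the header line is read as an ordinary data row
no_header <- read.csv(text = csv_text, header = FALSE)
dim(no_header)              # 3 2
colnames(no_header)         # "V1" "V2"
```

In the second case the former header ends up in the first row, and the sleep_total values are no longer numeric because the column now also contains the text "sleep_total".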

What is the name of the first animal in the table? The name of the first animal is Cheetah.

Vectors can be combined using the function c(). For example, we can append a number, 12, to the sleep totals:

c(tab$sleep_total, 12)

The summary() function gives the summary statistics of a set of values.

summary(tab$sleep_total)

What is the 3rd quartile of the total sleep of all the animals? The 3rd quartile of the total sleep of the animals is hours.

Indexing and Subsetting

Subsetting a data frame to the first two rows:

tab[c(1, 2), ]

The rows where the total sleep is greater than 18 hours:

tab[tab$sleep_total > 18, ]

Subsetting a vector looks very similar, but we just remove the comma (because there are no columns now). The first two elements can be subset like so:

tab$sleep_total[c(1, 2)]

What is the average total sleep, using the function mean() and vector subsetting, for the animals with total sleep smaller than or equal to 10 hours? The average total sleep for these animals is 6.67 hours.

The function which() gives us the numeric indices of the elements that satisfy a logical condition:

which(tab$sleep_total > 18)
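A sketch of the mean-of-a-subset question and of which(), using a made-up five-row data frame (tab_toy is a hypothetical name) so the numbers can be checked by hand. It assumes the column has no NA values; with NAs, mean() returns NA unless na.rm = TRUE is given.

```r
# Hypothetical toy data (made-up values), shaped like the sleep table
tab_toy <- data.frame(name = c("a", "b", "c", "d", "e"),
                      sleep_total = c(12.1, 19.7, 8.0, 4.0, 18.5))

# Logical vector: TRUE where total sleep is at most 10 h
short <- tab_toy$sleep_total <= 10

# Vector subsetting with the logical vector, then the mean
mean(tab_toy$sleep_total[short])        # (8.0 + 4.0) / 2 = 6

# which() turns a logical vector into the positions of its TRUE elements
which(tab_toy$sleep_total > 18)         # 2 5

# Row subsetting of the data frame with the same kind of condition
tab_toy[tab_toy$sleep_total > 18, ]
```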

For example, let's say we want to get the first value where the total sleep was more than 18 hours. This combines three operations: which() gives the indices of the values with a total sleep of more than 18 hours; then, on the right side, we index this vector with [1] to get the first index. Then we index the original vector with that number. Take a while to look over this, and take the command apart to understand what is going on:

tab$sleep_total[which(tab$sleep_total > 18)[1]]

We can also combine two logical vectors and use which() to see the rows that satisfy both criteria. Logical conditions are combined using the ampersand symbol: & (logical AND).

What is the row number of the animal which has more than 18 hours of total sleep and less than 3 hours of REM sleep? The row number is 43.

Also try with | instead of & and explain the results. | is the logical symbol for OR, so the command returns all the animals that have either more than 18 h of total sleep OR less than 3 hours of REM sleep, or both.

The function subset() provides another way to subset vectors, matrices and data frames by a condition. The following line of code, for example, reduces the data frame to only 3 columns ("order", "name" and "sleep_total") and to the 22 rows where order is "Rodentia":

subset(tab, subset = tab$order == "Rodentia", select = c("order", "name", "sleep_total"))

Use the function subset() to obtain a new data frame which contains only the rows where order is "Primates", with the columns name, sleep_total and bodywt. How many rows does the new data frame contain? The subset contains 12 rows.

Now save the data frame you just created to a tab-separated file (file extension .txt) using the function write.table().
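The AND/OR comparison and subset() can be tried on a made-up four-row table (tab_toy is hypothetical). One row deliberately has an NA in sleep_rem, as the real table does, to show that which() silently skips positions where the condition is NA, and that subset() treats an NA condition as FALSE.

```r
# Hypothetical toy data (made-up values); sleep_rem has an NA like the real table
tab_toy <- data.frame(name = c("a", "b", "c", "d"),
                      order = c("Primates", "Rodentia", "Primates", "Chiroptera"),
                      sleep_total = c(9.0, 19.0, 10.0, 19.9),
                      sleep_rem = c(1.5, 4.0, NA, 2.0),
                      bodywt = c(5.0, 0.02, 60.0, 0.01))

# First value above 18 h: which() -> positions, [1] -> first position, then index
tab_toy$sleep_total[which(tab_toy$sleep_total > 18)[1]]   # 19

# AND: both conditions must hold
which(tab_toy$sleep_total > 18 & tab_toy$sleep_rem < 3)    # 4
# OR: at least one condition holds (row 3 gives NA and is skipped by which())
which(tab_toy$sleep_total > 18 | tab_toy$sleep_rem < 3)    # 1 2 4

# subset() evaluates the condition inside the data frame; NA conditions count as FALSE
prim <- subset(tab_toy, subset = order == "Primates",
               select = c("name", "sleep_total", "bodywt"))
nrow(prim)                                                 # 2
```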

Consult the help page of the function by typing ?write.table in the console. Set the parameters to avoid printing the row names. Also set the separator correctly. In Windows Explorer, navigate to the directory where you saved the .txt file and open it. Include the table here.

"name"	"sleep_total"	"bodywt"
"Owl monkey"
"Grivet"
"Patas monkey"
"Galago"
"Human"	8	62
"Mongoose lemur"
"Macaque"
"Slow loris"
"Chimpanzee"
"Baboon"
"Potto"
"Squirrel monkey"

2. Plotting

Regular plots

Let's go ahead and make a plot of the brain weight (brainwt) and the total sleep (sleep_total) to see what the data look like:

plot(tab$brainwt, tab$sleep_total)

Once more, with a logarithmic x-axis:

plot(tab$brainwt, tab$sleep_total, log = "x")
abline(h = 15)

Add axis labels (name & unit) to the plot. Add a title. Change one graphical parameter, e.g. the color (?plot, ?par). Add a horizontal line at y = 15 using the abline() function. Include your plot and R code here:

plot(tab$brainwt, tab$sleep_total, log = "x", col = "#587498",
     main = "Sleep vs. brain weight",
     ylab = "Total sleep [h]", xlab = "Brain weight [kg] (log scale)")
abline(h = 15)

Save your plot as PDF using the pdf() function. State your code here. Hint: See 0. General Information for how to do this!

pdf(file = "your_plot.pdf")
[...your code for plotting...]
dev.off()

ggplots (BONUS)

Let's try a different way to plot: ggplot2. There are many tutorials for ggplot2, e.g. In order to use ggplot2, we need to load the package. If the package cannot be loaded, you will have to install it first.

library(ggplot2)
# install.packages("ggplot2")  # install if you get an error

The first line of code below removes all rows with NA values in the brainwt column. Now let's plot the same as above. Every ggplot2 plot has a data layer, which defines the data set to plot (here tab3), and the basic mappings of data to aesthetic elements (aes(x, y)). On top of these mappings we add geometries: we would like to get a scatterplot (points), so we use geom_point().

tab3 <- tab[!is.na(tab$brainwt), ]
ggplot(tab3, aes(x = log(brainwt), y = sleep_total)) + geom_point()

Have a look at how the plot can be manipulated using ggplot2:

ggplot(tab3, aes(x = log(brainwt), y = sleep_total, color = vore)) + geom_point()
ggplot(tab3, aes(x = log(brainwt), y = sleep_total)) + geom_point(color = "#587498")
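The base-graphics plot and the pdf()/dev.off() steps from this section can be combined into one short script. This sketch assumes a made-up five-row table (tab_toy is hypothetical) and writes to a temporary file instead of your working directory. Because log = "x" only changes the axis scale, the tick labels still show kilograms, which is why the x label says "[kg] (log scale)" rather than "log(kg)".

```r
# Hypothetical toy data (made-up values); brainwt in kg, sleep_total in hours
tab_toy <- data.frame(brainwt = c(0.0003, 0.005, 0.07, 1.3, 5.7),
                      sleep_total = c(19.9, 14.4, 9.7, 8.0, 3.3))

# Temporary PDF path; in the exercise you would use a file in your working directory
pdf_path <- tempfile(fileext = ".pdf")

pdf(file = pdf_path)               # open the PDF device: plots now go to the file
plot(tab_toy$brainwt, tab_toy$sleep_total, log = "x", col = "#587498",
     main = "Sleep vs. brain weight",
     xlab = "Brain weight [kg] (log scale)", ylab = "Total sleep [h]")
abline(h = 15)                     # horizontal reference line at 15 h
dev.off()                          # close the device so the file is written

file.exists(pdf_path)
```

Forgetting dev.off() is a common reason for an empty or unreadable PDF: the file is only completed when the device is closed.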

g <- ggplot(tab3, aes(x = log(brainwt), y = sleep_total)) + geom_point()
g1 <- g + geom_smooth(); print(g1)
g2 <- g + geom_hline(yintercept = 10); print(g2)

Add axis labels (name & unit) to the plot. Add a title. Add a horizontal line at y = 15. Include your ggplot here (BONUS):

g <- ggplot(tab3, aes(x = log(brainwt), y = sleep_total)) +
  geom_point(color = "#587498") +
  labs(x = "log(brain weight [kg])", y = "Total sleep [h]"); print(g)
g <- g + ggtitle("Sleep vs. brain weight"); print(g)
g <- g + geom_hline(yintercept = 15); print(g)

3. For Loop

Simple Examples

A for loop can be used to iterate over the elements of a vector, the rows or columns of a matrix or data frame, the elements of a list, etc. In each iteration, a block of code is executed on the current element. Let's look at a really simple example:

for (i in 1:5) {
  print(paste("we are in the loop. Iteration #", i))
}

# another example
x <- c(3, 4, 5, 2); x
for (i in 1:length(x)) {
  y <- x[i] + 3
  print(paste("y is", y))
}

When iterating over the elements of a vector in a for loop, the expressions in the code block are evaluated once per iteration. This is rather inefficient (it can take very long), especially for a large number of elements (~ ). In R many functions are vectorized. Thanks to vectorization, we do not need a for loop to add 3 to each element of a vector. We can simply replace the for loop with x + 3:

y <- x + 3; y
cat(paste("y is", x + 3, "\n"))

Now we will use an if statement, a logical NOT (!) and next.

x <- c(3, 4, 5, 2); x
for (i in 1:length(x)) {
  y <- x[i] + 3
  if (!(y %% 2)) {
    next
  }
  print(paste("y is", y))
}

Try to understand what the line if (!(y %% 2)) does. Hint: modulus operator %%.
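As a hint for the exercise above, this sketch evaluates the two pieces of the condition separately on y <- x + 3 for the document's x <- c(3, 4, 5, 2). The remainder y %% 2 is 0 for even numbers and 1 for odd ones, and ! treats 0 as FALSE and any non-zero number as TRUE before negating it. The loop variable v is just an illustrative name.

```r
y <- c(6, 7, 8, 5)      # x + 3 for x <- c(3, 4, 5, 2)

y %% 2                  # remainder after division by 2: 0 1 0 1
!(y %% 2)               # 0 becomes TRUE, non-zero becomes FALSE: TRUE FALSE TRUE FALSE

# So if (!(y %% 2)) is TRUE exactly when y is even, and next then skips the print
for (v in y) {
  if (!(v %% 2)) {
    next
  }
  print(paste("y is", v))   # only reached for odd values: 7 and 5
}
```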

This could also be vectorized, e.g.

y <- (x + 3) %% 2
y <- x[y > 0] + 3
cat(paste("y is", y, "\n"))

Explain the effect of the next statement on the for loop in one sentence. With the next statement, the rest of the current iteration is skipped and the loop goes on to the next iteration.

Write a For Loop

Let's go back to our table. We can also iterate over the rows of our sleep dataset and subtract the REM sleep time from the total sleep time to obtain the non-REM sleep time. To do so, first create a new vector with length equal to the number of rows. This has to be done outside the for loop.

sleep_other <- numeric(nrow(tab))

Note: The function nrow(tab) returns the number of rows of tab. The function numeric() creates a vector of mode numeric with length equal to nrow(tab). The elements of sleep_other are initialized with zero by default.

Now iterate over the rows of the sleep dataset and store the difference between total sleep time and REM sleep time in the corresponding element of the vector.

for ( ) {
  <your code here>
}
str(sleep_other)

State your R code here.

for (i in 1:length(sleep_other)) {
  sleep_other[i] <- tab$sleep_total[i] - tab$sleep_rem[i]
}
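A sketch comparing the loop with the vectorized subtraction, assuming two made-up columns (the values are invented; the NAs mimic the missing REM values in the real table). Both versions give the same numbers, and any NA in either input gives an NA difference, which is what the NA count in the next question measures.

```r
# Hypothetical toy columns (made-up values); sleep_rem contains NAs like the real table
sleep_total <- c(12.1, 17.0, 14.4, 14.9, 4.0)
sleep_rem   <- c(NA,   1.8,  2.4,  2.3,  NA)

# Loop version: pre-allocate outside the loop, then fill element by element
sleep_other <- numeric(length(sleep_total))
for (i in seq_along(sleep_total)) {
  sleep_other[i] <- sleep_total[i] - sleep_rem[i]
}

# Vectorized version: one subtraction over whole vectors
sleep_other2 <- sleep_total - sleep_rem

identical(sleep_other, sleep_other2)   # TRUE
sum(is.na(sleep_other))                # 2, same count as length(which(is.na(sleep_other)))
```

seq_along() is used here instead of 1:length(); it behaves the same for non-empty vectors but gives an empty loop (rather than iterating over 1 and 0) when the vector has length zero.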

If either the total sleep time or the REM sleep time is not available (NA), the difference cannot be calculated and NA is returned. Use the following lines of nested functions to determine the number of NA values:

length(which(is.na(sleep_other)))
table(is.na(sleep_other))

How many values in sleep_other are NA? What does this mean? 22 values in sleep_other are NA. This means that for 22 animals either the total sleep or the REM sleep is not available, and therefore sleep_other cannot be calculated.

Again, thanks to vectorization we do not need a for loop to calculate the values in sleep_other. We can simply replace the for loop with

sleep_other2 <- tab$sleep_total - tab$sleep_rem

Double For Loops

Let's have a look at a double for loop now (BONUS):

# double for loop
y <- c(5, 6, 2)
for (i in 1:length(x)) {
  for (j in 1:length(y)) {
    z <- x[i] + y[j] + 3
    print(paste("z is", z))
  }
}
# here x + y + 3 does not work: it adds element-wise (recycling the shorter
# vector, with a warning) instead of forming all pairs of x[i] and y[j]

Create a 15 x 15 matrix. For each row and each column, assign the value of the matrix based on its position, using the product of the two indices. When the indices are equal, set the value to 1 using an if/else statement. Copy your code here. (BONUS)

mat <- matrix(nrow = 15, ncol = 15)
for (i in 1:dim(mat)[1]) {
  for (j in 1:dim(mat)[2]) {
    if (i == j) {
      mat[i, j] <- 1
    } else {
      mat[i, j] <- i * j
    }
  }
}
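Both double loops above can also be written without loops. This sketch uses base R's outer(), which applies a function to every pair of elements from two vectors (multiplication by default), and diag<- to overwrite the diagonal. It is one possible alternative, not the solution the exercise asks for, since the task explicitly requires if/else.

```r
# Loop version, as in the exercise: product of the indices, 1 on the diagonal
mat <- matrix(nrow = 15, ncol = 15)
for (i in 1:nrow(mat)) {
  for (j in 1:ncol(mat)) {
    if (i == j) {
      mat[i, j] <- 1
    } else {
      mat[i, j] <- i * j
    }
  }
}

# Vectorized alternative: outer() builds all pairwise products i * j at once
mat2 <- outer(1:15, 1:15)
diag(mat2) <- 1

all(mat == mat2)   # TRUE

# outer() also covers the double loop over x and y: all pairwise x[i] + y[j] + 3
x <- c(3, 4, 5, 2)
y <- c(5, 6, 2)
outer(x, y, "+") + 3   # 4 x 3 matrix, row i holds the z values for x[i]
```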

4. Dataframe Manipulations

Similar to the function c(), which concatenates the elements of vectors into a single vector, cbind() and rbind() can be used to combine vectors, matrices or data frames into a single matrix or data frame. To combine objects with rbind(), i.e. to increase the number of rows, they need to have the same number of columns. To combine objects with cbind(), i.e. to increase the number of columns, they need to have the same number of rows.

Add a new column to the sleep dataset containing the sleep hours other than REM (sleep_other) using cbind().

tab2 <- cbind(tab, sleep_other)

For reasons of clarity, we would like all columns containing sleep hours to appear next to each other. Thus we have to reorder the columns of the data frame.

tab2 <- tab2[, c("name", "genus", "vore", "order", "conservation", "sleep_total", "sleep_rem", "sleep_other", "sleep_cycle", "awake", "brainwt", "bodywt")]

Additionally, we want to reorder the rows of the data frame so that the animal with the longest sleep other than REM is listed at the top of the table and the one with the shortest at the bottom. Sort and print the numbers of sleeping hours other than REM, with the longest sleep at the top, using the sort() function (here you have a vector!).

sort(tab2$sleep_other, decreasing = TRUE)

How many hours (sleep_other) does the animal with the longest sleep other than REM sleep? It sleeps for 17.9 h (sleep_other).

Reorder the rows in the data frame to list the animals that sleep longest at the top of the table.

tab2[order(tab2$sleep_other, decreasing = TRUE), ]

Use, for example, the function head() to display the top rows of the data frame.
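A sketch of the cbind, column-reordering and sorting steps on a made-up three-row table (tab_toy is hypothetical and has only some of the real columns). It highlights two details: every column name in the reordering vector must be a quoted string, and sort() drops NA values by default, whereas order() keeps them and places them last.

```r
# Hypothetical toy table (made-up values) with a few of the real column names
tab_toy <- data.frame(name = c("a", "b", "c"),
                      sleep_total = c(12.1, 19.9, 9.1),
                      sleep_rem = c(NA, 2.0, 2.8),
                      awake = c(11.9, 4.1, 14.9))
sleep_other <- tab_toy$sleep_total - tab_toy$sleep_rem

# cbind() a vector with as many elements as the data frame has rows
tab2 <- cbind(tab_toy, sleep_other)

# Reorder columns by name: every name must be a quoted string
tab2 <- tab2[, c("name", "sleep_total", "sleep_rem", "sleep_other", "awake")]

# sort() returns the sorted values and drops NAs by default
sort(tab2$sleep_other, decreasing = TRUE)            # 17.9 6.3

# order() returns row positions; NAs are placed last by default
tab2[order(tab2$sleep_other, decreasing = TRUE), ]
```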

5. Useful Functions in R

split

split() is a function which takes a vector and splits it into a list by grouping the vector according to a factor. Let's use our mammal sleep data again to try this out. Split the total sleep column by the mammals' order (here, order means the taxonomic rank above family and below class).

s <- split(tab$sleep_total, tab$order); s

We can pull out a single vector from the list using the name of the order or its position in the list (note: this is the position of the level in the levels of the factor). Lists are indexed with double square brackets [[ ]] instead of single square brackets [ ]:

s[[17]]
s[["Rodentia"]]

How many hours do rodents sleep (total sleep) on average? They sleep for hours on average.

apply

The apply() family of functions is used to manipulate slices of data from matrices, arrays, lists and data frames in a repetitive way. lapply() and sapply() are useful for applying a function repeatedly to each element of a vector or list. lapply() returns a list, while sapply() tries to simplify the result, returning a vector if possible (i.e. if the function returns a single element for each element of the input).

Let's use lapply() to get the average total sleep for each order:

lapply(s, mean)

As you can see, a list is returned. Let's use sapply() instead:

sapply(s, mean)
# the above is equivalent to
sapply(s, function(x) { mean(x) })
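A small sketch of split() followed by lapply() and sapply(), assuming made-up values and a plain character vector for the orders (the names order_col and the values are illustrative). The list names are the sorted group levels, and indexing by name is case-sensitive, so s[["rodentia"]] would return NULL.

```r
# Hypothetical toy data (made-up values)
sleep_total <- c(12.0, 14.0, 9.0, 13.0, 8.0)
order_col   <- c("Rodentia", "Rodentia", "Primates", "Rodentia", "Primates")

# split() groups the values by level; the list names are the sorted levels
s <- split(sleep_total, order_col)
names(s)                   # "Primates" "Rodentia"

s[["Rodentia"]]            # 12 14 13
s[[2]]                     # the same element, picked by position
s[["rodentia"]]            # NULL: names are case-sensitive

lapply(s, mean)            # a list with one mean per group
sapply(s, mean)            # simplified to a named numeric vector: Primates 8.5, Rodentia 13
```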

Use lapply() or sapply() to answer the following question: What is the standard deviation of total hours of sleep for the order Primates? The standard deviation of total hours of sleep of primates is 2.21 hours.

Use sapply() to search through the list s and retrieve all indices where the value equals a given value. State your R code and the results here. (BONUS)

sapply(s, function(x) { which(x == 10.1) })

$Carnivora
[1] 3

$Erinaceomorpha
[1] 1

$Primates
[1] 7
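Why the answer above prints as a list rather than a vector can be seen on a made-up list s (the group names are real orders, the numbers are invented). sapply() can only simplify when every call returns the same number of values: sd() always returns one number, but which() can return zero, one or several positions. The exact comparison x == 10.1 is reliable here because the values come from the same literals; for computed values a tolerance-based comparison is safer.

```r
# Hypothetical toy list (made-up values)
s <- list(Carnivora = c(12.1, 8.7, 10.1),
          Primates  = c(17.0, 10.0, 9.1),
          Rodentia  = c(14.9, 10.1, 13.0, 10.1))

# One number per element -> sapply() simplifies to a named vector
sds <- sapply(s, sd)
sds["Primates"]

# which() returns zero, one or more positions per element -> the result stays a list
hits <- sapply(s, function(x) { which(x == 10.1) })
hits

# keep only the non-empty results
hits[lengths(hits) > 0]
```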

6. User-defined Functions

A Simple Example

One of the strengths of R is the ability to add your own functions. The syntax of a function in R looks like this:

# myfunction <- function(arg1, arg2, ...) {
#   statements
#   return(object)
# }

Now let's write our first function! First we have to define the function and give it a name. We will call it square.value(). Our function will simply compute the square of a given value.

# simple example
# define function
square.value <- function(x) {
  sqval <- x * x
  return(sqval)
}

Now that we have defined our function, we can call it with a value of our choosing as its argument.

# call function
square.value(4)

Note that our function is already vectorized:

x <- c(3, 4, 5, 2); x
square.value(x)

Write your own function which first takes the square root of a given value and then adds 10 to the result. State your R code here.

calculate.value <- function(x) {
  r <- sqrt(x)
  val <- r + 10
  return(val)
}
calculate.value(x)

Write your own function which first takes the mean of a given vector (arg1), then adds a given value (arg2) to the result, and finally takes the log2 of that result. State your R code here. (BONUS)

calculate.value <- function(x, y) {
  m <- mean(x)
  addval <- m + y
  endval <- log(addval, 2)
  return(endval)
}
calculate.value(x, 4)
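Both answers above reuse the name calculate.value, so the second definition silently replaces the first. This sketch gives them distinct, hypothetical names, uses R's implicit return of the last expression, and log2() (equivalent to log(x, 2)). The default y = 0 is an illustrative assumption, not part of the exercise.

```r
# Square root, then add 10 (vectorized because sqrt() and + are)
sqrt_plus10 <- function(x) {
  r <- sqrt(x)
  r + 10                      # the last expression is returned
}

# Mean of a vector, plus a value, then log2; 'y' defaults to 0 for illustration
mean_plus_log2 <- function(x, y = 0) {
  m <- mean(x)
  log2(m + y)
}

x <- c(3, 4, 5, 2)
sqrt_plus10(c(4, 9, 16))     # 12 13 14
mean_plus_log2(x, 4)         # mean(x) = 3.5, 3.5 + 4 = 7.5, log2(7.5) is about 2.907
```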


More information

Introduction to scientific programming in R

Introduction to scientific programming in R Introduction to scientific programming in R John M. Drake & Pejman Rohani 1 Introduction This course will use the R language programming environment for computer modeling. The purpose of this exercise

More information

Setup Mount the //geobase/geo4315 server and add a new Lab2 folder in your user folder.

Setup Mount the //geobase/geo4315 server and add a new Lab2 folder in your user folder. L AB 2 L AB M2 ICROSOFT E XCEL O FFICE W ORD, E XCEL & POWERP OINT XCEL & P For this lab, you will practice importing datasets into an Excel worksheet using different types of formatting. First, you will

More information

A brief introduction to R

A brief introduction to R A brief introduction to R Cavan Reilly September 29, 2017 Table of contents Background R objects Operations on objects Factors Input and Output Figures Missing Data Random Numbers Control structures Background

More information

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N. LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I Part Two Introduction to R Programming RStudio November 2016 Written by N.Nilgün Çokça Introduction to R Programming 5 Installing R & RStudio 5 The R Studio

More information

BGGN 213 Working with R packages Barry Grant

BGGN 213 Working with R packages Barry Grant BGGN 213 Working with R packages Barry Grant http://thegrantlab.org/bggn213 Recap From Last Time: Why it is important to visualize data during exploratory data analysis. Discussed data visualization best

More information

Introduction to R statistical environment

Introduction to R statistical environment Introduction to R statistical environment R Nano Course Series Aishwarya Gogate Computational Biologist I Green Center for Reproductive Biology Sciences History of R R is a free software environment for

More information

Tutorial (Unix Version)

Tutorial (Unix Version) Tutorial (Unix Version) S.f.Statistik, ETHZ February 26, 2010 Introduction This tutorial will give you some basic knowledge about working with R. It will also help you to familiarize with an environment

More information

Brief cheat sheet of major functions covered here. shoe<-c(8,7,8.5,6,10.5,11,7,6,12,10)

Brief cheat sheet of major functions covered here. shoe<-c(8,7,8.5,6,10.5,11,7,6,12,10) 1 Class 2. Handling data in R Creating, editing, reading, & exporting data frames; sorting, subsetting, combining Goals: (1) Creating matrices and dataframes: cbind and as.data.frame (2) Editing data:

More information

ADVANCED INQUIRIES IN ALBEDO: PART 2 EXCEL DATA PROCESSING INSTRUCTIONS

ADVANCED INQUIRIES IN ALBEDO: PART 2 EXCEL DATA PROCESSING INSTRUCTIONS ADVANCED INQUIRIES IN ALBEDO: PART 2 EXCEL DATA PROCESSING INSTRUCTIONS Once you have downloaded a MODIS subset, there are a few steps you must take before you begin analyzing the data. Directions for

More information

Reading data into R. 1. Data in human readable form, which can be inspected with a text editor.

Reading data into R. 1. Data in human readable form, which can be inspected with a text editor. Reading data into R There is a famous, but apocryphal, story about Mrs Beeton, the 19th century cook and writer, which says that she began her recipe for rabbit stew with the instruction First catch your

More information

the R environment The R language is an integrated suite of software facilities for:

the R environment The R language is an integrated suite of software facilities for: the R environment The R language is an integrated suite of software facilities for: Data Handling and storage Matrix Math: Manipulating matrices, vectors, and arrays Statistics: A large, integrated set

More information

R Primer for Introduction to Mathematical Statistics 8th Edition Joseph W. McKean

R Primer for Introduction to Mathematical Statistics 8th Edition Joseph W. McKean R Primer for Introduction to Mathematical Statistics 8th Edition Joseph W. McKean Copyright 2017 by Joseph W. McKean at Western Michigan University. All rights reserved. Reproduction or translation of

More information

Desktop Command window

Desktop Command window Chapter 1 Matlab Overview EGR1302 Desktop Command window Current Directory window Tb Tabs to toggle between Current Directory & Workspace Windows Command History window 1 Desktop Default appearance Command

More information

Introducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015

Introducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Introducion to R and parallel libraries Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Overview What is R R Console Input and Evaluation Data

More information

Introduction into R. A Short Overview. Thomas Girke. December 8, Introduction into R Slide 1/21

Introduction into R. A Short Overview. Thomas Girke. December 8, Introduction into R Slide 1/21 Introduction into R A Short Overview Thomas Girke December 8, 212 Introduction into R Slide 1/21 Introduction Look and Feel of the R Environment R Library Depositories Installation Getting Around Basic

More information

Stochastic Models. Introduction to R. Walt Pohl. February 28, Department of Business Administration

Stochastic Models. Introduction to R. Walt Pohl. February 28, Department of Business Administration Stochastic Models Introduction to R Walt Pohl Universität Zürich Department of Business Administration February 28, 2013 What is R? R is a freely-available general-purpose statistical package, developed

More information

PhotoSpread. Quick User s Manual. Stanford University

PhotoSpread. Quick User s Manual. Stanford University PS PhotoSpread Quick User s Manual Stanford University PhotoSpread Quick Introduction Guide 1.1 Introduction 1.2 Starting the application 1.3 The main application windows 1.4 Load Photos into a cell 1.5

More information

STA 248 S: Some R Basics

STA 248 S: Some R Basics STA 248 S: Some R Basics The real basics The R prompt > > # A comment in R. Data To make the variable x equal to 2 use > x x = 2 To make x a vector, use the function c() ( c for concatenate)

More information

Data Input/Output. Andrew Jaffe. January 4, 2016

Data Input/Output. Andrew Jaffe. January 4, 2016 Data Input/Output Andrew Jaffe January 4, 2016 Before we get Started: Working Directories R looks for files on your computer relative to the working directory It s always safer to set the working directory

More information

STAT 540: R: Sections Arithmetic in R. Will perform these on vectors, matrices, arrays as well as on ordinary numbers

STAT 540: R: Sections Arithmetic in R. Will perform these on vectors, matrices, arrays as well as on ordinary numbers Arithmetic in R R can be viewed as a very fancy calculator Can perform the ordinary mathematical operations: + - * / ˆ Will perform these on vectors, matrices, arrays as well as on ordinary numbers With

More information

Introduction to R 21/11/2016

Introduction to R 21/11/2016 Introduction to R 21/11/2016 C3BI Vincent Guillemot & Anne Biton R: presentation and installation Where? https://cran.r-project.org/ How to install and use it? Follow the steps: you don t need advanced

More information

Bioinformatics Workshop - NM-AIST

Bioinformatics Workshop - NM-AIST Bioinformatics Workshop - NM-AIST Day 2 Introduction to R Thomas Girke July 24, 212 Bioinformatics Workshop - NM-AIST Slide 1/21 Introduction Look and Feel of the R Environment R Library Depositories Installation

More information

Homework 1 Excel Basics

Homework 1 Excel Basics Homework 1 Excel Basics Excel is a software program that is used to organize information, perform calculations, and create visual displays of the information. When you start up Excel, you will see the

More information

Short Introduction to R

Short Introduction to R Short Introduction to R Paulino Pérez 1 José Crossa 2 1 ColPos-México 2 CIMMyT-México June, 2015. CIMMYT, México-SAGPDB Short Introduction to R 1/51 Contents 1 Introduction 2 Simple objects 3 User defined

More information

Lab 5, part b: Scatterplots and Correlation

Lab 5, part b: Scatterplots and Correlation Lab 5, part b: Scatterplots and Correlation Toews, Math 160, Fall 2014 November 21, 2014 Objectives: 1. Get more practice working with data frames 2. Start looking at relationships between two variables

More information

Introduction to MATLAB

Introduction to MATLAB to MATLAB Spring 2019 to MATLAB Spring 2019 1 / 39 The Basics What is MATLAB? MATLAB Short for Matrix Laboratory matrix data structures are at the heart of programming in MATLAB We will consider arrays

More information

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide Paper 809-2017 Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide ABSTRACT Marje Fecht, Prowerk Consulting Whether you have been programming in SAS for years, are new to

More information

Introduction to MATLAB

Introduction to MATLAB ELG 3125 - Lab 1 Introduction to MATLAB TA: Chao Wang (cwang103@site.uottawa.ca) 2008 Fall ELG 3125 Signal and System Analysis P. 1 Do You Speak MATLAB? MATLAB - The Language of Technical Computing ELG

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3A Visualizing Data By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to visualize data. If you intend to

More information

Ecffient calculations

Ecffient calculations Ecffient calculations Vectorized computations The efficiency of calculations depends on how you perform them. Vectorized calculations, for example, avoid going trough individual vector or matrix elements

More information

R (and S, and S-Plus, another program based on S) is an interactive, interpretive, function language.

R (and S, and S-Plus, another program based on S) is an interactive, interpretive, function language. R R (and S, and S-Plus, another program based on S) is an interactive, interpretive, function language. Available on Linux, Unix, Mac, and MS Windows systems. Documentation exists in several volumes, and

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

day one day four today day five day three Python for Biologists

day one day four today day five  day three Python for Biologists Overview day one today 0. introduction 1. text output and manipulation 2. reading and writing files 3. lists and loops 4. writing functions day three 5. conditional statements 6. dictionaries day four

More information

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows Oxford Spring School, April 2013 Effective Presentation ti Monday morning lecture: Crash Course in R Robert Andersen Department of Sociology University of Toronto And Dave Armstrong Department of Political

More information

IN-CLASS EXERCISE: INTRODUCTION TO R

IN-CLASS EXERCISE: INTRODUCTION TO R NAVAL POSTGRADUATE SCHOOL IN-CLASS EXERCISE: INTRODUCTION TO R Survey Research Methods Short Course Marine Corps Combat Development Command Quantico, Virginia May 2013 In-class Exercise: Introduction to

More information

BIO5312: R Session 1 An Introduction to R and Descriptive Statistics

BIO5312: R Session 1 An Introduction to R and Descriptive Statistics BIO5312: R Session 1 An Introduction to R and Descriptive Statistics Yujin Chung August 30th, 2016 Fall, 2016 Yujin Chung R Session 1 Fall, 2016 1/24 Introduction to R R software R is both open source

More information

The Average and SD in R

The Average and SD in R The Average and SD in R The Basics: mean() and sd() Calculating an average and standard deviation in R is straightforward. The mean() function calculates the average and the sd() function calculates the

More information

Facets and Continuous graphs

Facets and Continuous graphs Facets and Continuous graphs One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display

More information

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant.

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant. BIMM 143 Data analysis with R Lecture 4 Barry Grant http://thegrantlab.org/bimm143 Recap From Last Time: Substitution matrices: Where our alignment match and mis-match scores typically come from Comparing

More information

Introduction to R Forecasting Techniques

Introduction to R Forecasting Techniques Introduction to R zabbeta@fsu.gr katerina@fsu.gr Starting out in R Working with data Plotting & Forecasting 1. Starting Out In R R & RStudio Variables & Basics Data Types Functions R + RStudio Programming

More information

Practice for Learning R and Learning Latex

Practice for Learning R and Learning Latex Practice for Learning R and Learning Latex Jennifer Pan August, 2011 Latex Environments A) Try to create the following equations: 1. 5+6 α = β2 2. P r( 1.96 Z 1.96) = 0.95 ( ) ( ) sy 1 r 2 3. ˆβx = r xy

More information

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics Introduction to S-Plus 1 Input: Data files For rectangular data files (n rows,

More information

Reading and writing data

Reading and writing data An introduction to WS 2017/2018 Reading and writing data Dr. Noémie Becker Dr. Sonja Grath Special thanks to: Prof. Dr. Martin Hutzenthaler and Dr. Benedikt Holtmann for significant contributions to course

More information

Outline. CSE 1570 Interacting with MATLAB. Starting MATLAB. Outline. MATLAB Windows. MATLAB Desktop Window. Instructor: Aijun An.

Outline. CSE 1570 Interacting with MATLAB. Starting MATLAB. Outline. MATLAB Windows. MATLAB Desktop Window. Instructor: Aijun An. CSE 170 Interacting with MATLAB Instructor: Aijun An Department of Computer Science and Engineering York University aan@cse.yorku.ca Outline Starting MATLAB MATLAB Windows Using the Command Window Some

More information

WEEK 8: FUNCTIONS AND LOOPS. 1. Functions

WEEK 8: FUNCTIONS AND LOOPS. 1. Functions WEEK 8: FUNCTIONS AND LOOPS THOMAS ELLIOTT 1. Functions Functions allow you to define a set of instructions and then call the code in a single line. In R, functions are defined much like any other object,

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Getting Started in R

Getting Started in R Getting Started in R Giles Hooker May 28, 2007 1 Overview R is a free alternative to Splus: a nice environment for data analysis and graphical exploration. It uses the objectoriented paradigm to implement

More information

SISG/SISMID Module 3

SISG/SISMID Module 3 SISG/SISMID Module 3 Introduction to R Ken Rice Tim Thornton University of Washington Seattle, July 2018 Introduction: Course Aims This is a first course in R. We aim to cover; Reading in, summarizing

More information

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 6 November 2009 3.00 pm BRAGG Cluster This document contains the tasks need to be done and completed by

More information

Lab #3: Probability, Simulations, Distributions:

Lab #3: Probability, Simulations, Distributions: Lab #3: Probability, Simulations, Distributions: A. Objectives: 1. Reading from an external file 2. Create contingency table 3. Simulate a probability distribution 4. The Uniform Distribution Reading from

More information

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW Introduction to R for Beginners, Level II Jeon Lee Bio-Informatics Core Facility (BICF), UTSW Basics of R Powerful programming language and environment for statistical computing Useful for very basic analysis

More information

Homework : Data Mining SOLUTIONS

Homework : Data Mining SOLUTIONS Homework 1 36-350: Data Mining SOLUTIONS 1. (a) What is the bag-of-words representation of the sentence to be or not to be? Answer: A vector with one component for each word in our dictionary, all of them

More information

Module 1: Introduction RStudio

Module 1: Introduction RStudio Module 1: Introduction RStudio Contents Page(s) Installing R and RStudio Software for Social Network Analysis 1-2 Introduction to R Language/ Syntax 3 Welcome to RStudio 4-14 A. The 4 Panes 5 B. Calculator

More information

9. Writing Functions

9. Writing Functions 9. Writing Functions Ken Rice Thomas Lumley Universities of Washington and Auckland NYU Abu Dhabi, January 2017 In this session One of the most powerful features of R is the user s ability to expand existing

More information

Introduction to Scientific Computing with Matlab

Introduction to Scientific Computing with Matlab UNIVERSITY OF WATERLOO Introduction to Scientific Computing with Matlab SAW Training Course R. William Lewis Computing Consultant Client Services Information Systems & Technology 2007 Table of Contents

More information