Reading and wri+ng data

Similar documents
Reading and writing data

Data types and structures

Reading in data. Programming in R for Data Science Anders Stockmarr, Kasper Kristensen, Anders Nielsen

Rearranging and manipula.ng data

Input/Output Data Frames

Data Input/Output. Andrew Jaffe. January 4, 2016

Intermediate Programming in R Session 1: Data. Olivia Lau, PhD

Reading and writing data

IST 3108 Data Analysis and Graphics Using R. Summarizing Data Data Import-Export

"no.loss"), FALSE) na.strings=c("na","#div/0!"), 72 # Ενσωματωμένες συναρτήσεις (build-in functions) του R

Basics of R. > x=2 (or x<-2) > y=x+3 (or y<-x+3)

Package swat. June 5, 2018

Introduction to the R Language

Package swat. March 5, 2018

Package swat. April 27, 2017

Basic R QMMA. Emanuele Taufer. 2/19/2018 Basic R (1)

An introduction to R WS 2013/2014

Importing data sets in R

The statistical software R

STAT 540 Computing in Statistics

What R is. STAT:5400 (22S:166) Computing in Statistics

Mails : ; Document version: 14/09/12

IMPORTING DATA IN R. Introduction read.csv

Programming for Chemical and Life Science Informatics

Module 4. Data Input. Andrew Jaffe Instructor

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to Statistics using R/Rstudio

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics

Introduction to R. Dataset Basics. March 2018

Introduction to R Commander

Entering and Outputting Data 2 nd best TA ever: Steele H. Valenzuela February 2-6, 2015

Package csvread. August 29, 2016

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources

An introduction to WS 2015/2016

Brief cheat sheet of major functions covered here. shoe<-c(8,7,8.5,6,10.5,11,7,6,12,10)

R and parallel libraries. Introduction to R for data analytics Bologna, 26/06/2017

Introducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015

Reading data into R. 1. Data in human readable form, which can be inspected with a text editor.

Introduction to R: Data Types


Data Structures STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

R: BASICS. Andrea Passarella. (plus some additions by Salvatore Ruggieri)

IMPORTING DATA INTO R. Introduction Flat Files

Package rio. R topics documented: January 14, Type Package Title A Swiss-Army Knife for Data I/O Version 0.3.

Lecture 1: Getting Started and Data Basics

A Brief Introduction to R

Chapter 2 Getting Data into R

Stat 302 Statistical Software and Its Applications Data Import and Export

CS Introduction to Computational and Data Science. Instructor: Renzhi Cao Computer Science Department Pacific Lutheran University Spring 2017

R basics workshop Sohee Kang

POL 345: Quantitative Analysis and Politics

SISG/SISMID Module 3

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018

PARTE XI: Introduzione all ambiente R - Panoramica

Stat405. More about data. Hadley Wickham. Tuesday, September 11, 12

The SQLiteDF Package

In this tutorial we will see some of the basic operations on data frames in R. We begin by first importing the data into an R object called train.

Introduction to R 21/11/2016

Data Input/Output. Introduction to R for Public Health Researchers

Text Mining with R: Building a Text Classifier

BIO5312: R Session 1 An Introduction to R and Descriptive Statistics

Introduction to R. Le Yan HPC User LSU

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

ACHIEVEMENTS FROM TRAINING

R for Libraries. Session 2: Data Exploration. Clarke Iakovakis Scholarly Communications Librarian University of Houston-Clear Lake

GS Analysis of Microarray Data

An Introduction to Statistical Computing in R

Introduction to R statistical environment

R Basics / Course Business

No Name What it does? 1 attach Attach your data frame to your working environment. 2 boxplot Creates a boxplot.

SML 201 Week 2 John D. Storey Spring 2016

An introduction to R WS 2013/2014

The foreign Package. October 3, 2007

Introduction to R and R-Studio Getting Data Into R. 1. Enter Data Directly into R...

Computer lab 2 Course: Introduction to R for Biologists

ICSSR Data Service Indian Social Science Data Repository R : User Guide Indian Council of Social Science Research

Data Import and Export

Introduction to R. Le Yan HPC User LSU. Some materials are borrowed from the Data Science course by John Hopkins University on Coursera.

Data Classes. Introduction to R for Public Health Researchers

Package clipr. June 23, 2018

PSS718 - Data Mining

Introduction to R. Le Yan HPC User LSU. Some materials are borrowed from the Data Science course by John Hopkins University on Coursera.

Floa.ng Point : Introduc;on to Computer Systems 4 th Lecture, Sep. 10, Instructors: Randal E. Bryant and David R.

6.GettingDataInandOutofR

610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison

Statistics for Biologists: Practicals

Outline. What are SAS (1966), Stata (1985), and SPSS (1968)? Migrating to R for SAS/SPSS/Stata Users

1. Introduction to R. 1.1 Introducing R

This document is designed to get you started with using R

Package LSDinterface

University of Torino, Italy Earth Science Department. R Summer School

A brief introduction to R

PcAux. Kyle M. Lang, Jacob Curtis, Daniel E Bontempo Institute for Measurement, Methodology, Analysis & Policy at Texas Tech University May 5, 2017

Outline. Mixed models in R using the lme4 package Part 1: Introduction to R. Following the operations on the slides

Reading Data in zoo. Gabor Grothendieck GKX Associates Inc. Achim Zeileis Universität Innsbruck

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

What is Stata? A programming language to do sta;s;cs Strongly influenced by economists Open source, sort of. An acceptable way to manage data

Draft Proof - do not copy, post, or distribute DATA MUNGING LEARNING OBJECTIVES

Transcription:

An introduc+on to Reading and wri+ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17

Course outline Day 4 Course outline Review Data types and structures Reading data How should data look like Impor+ng data into R Checking and cleaning data Common problems Wri+ng data

Review Data types and structures Crea+ng objects General form: variable <- value Examples: x <- 3 # The variable x is assigned the value 3 y <- x^2 +3 MyFunc+on <- sqrt

Review Data types and structures Data types Data type Descrip+on Example logical True or False TRUE, FALSE numeric real numbers or decimal 2.3, pi, sqrt(2) integer whole numbers -5L, 0L, 7L character character or string male, female complex complex numbers 2.1+3i, 1+0i

Review Data types and structures Vectors To create vectors, you can use the func+ons: c(), seq(), and rep() Examples: c(2,7,8,12,3,25) c(2:5, 3:6) seq(1, 8) seq(from=4, to=10, by=2) seq(4, 10, 2) rep(1, 4) rep(4:5, 3) rep(1:4, each = 2)

Review Data types and structures Coercion of vectors You also can coerce vectors using the func+ons: as.logical(x) as.integer(x) as.numeric(x) as.complex(x) as.character(x) as.factor(x) logical integer numeric complex character

Review Data types and structures Vectors To create vectors, you can use the func+ons: c(), seq(), and rep() Examples: c(2,7,8,12,3,25) c(2:5, 3:6) seq(1, 8) seq(from=4, to=10, by=2) seq(4, 10, 2) rep(1, 4) rep(4:5, 3) rep(1:4, each = 2)

Review Data types and structures Vectors To create vectors, you can use the func+ons: c(), seq(), and rep() Examples: c(2,7,8,12,3,25) c(2:5, 3:6) seq(1, 8) seq(from=4, to=10, by=2) seq(4, 10, 2) rep(1, 4) rep(4:5, 3) rep(1:4, each = 2)

Review Data types and structures Matrices To create matrices, you can use the func+ons: matrix(), dim(), and cbind() or rbind() Examples: m<- matrix(data=c(10:22), nrow = 2, ncol = 4) m<- matrix(10:22, 2, 4) # is the same y <- 1:6 dim(y) <- c(3, 2) cbind(1:3, 5:7) rbind(1:3, 5:7, 10:12)

Review Data types and structures To create data drames, you can use the func+ons: data.frame() Examples: Data frames df <- data.frame(id = 1:3, Sex = c("f", "F", "M"), Mass = c(17, 18, 18)) df_2 <- data.frame(x = 1:3, y = c("a", "b", "c"))

Review Data types and structures Data frames A data frame is a very important data type in R Data frames are displayed in a tabular layout A data frame contains scien+fic observa+ons and measurements, which are used for sta+s+cs

Review Data types and structures To access single or mul+ple elements of vectors, matrices and data frames you can use x[ ] Examples: x[7] x[c(1,4,9,12)] x[-2:-7] # excludes elements 2 to 7 z[2, 3] # row by column z[2, ] # 2nd row z[, 3] # 3rd column z[1:2, 1:2] Accessing data structures

Review Data types and structures Accessing data structures You can also access elements by name, using x[ ] or $ Examples: df[c("sex", "Mass")] df$mass

Workflow for reading and wri+ng data frames 1) Import your data 2) Check, clean and prepare your data (can be up to 80% of your project) 3) Conduct your analyses 4) Export your results 5) Clean R environment and close session

How should data look like? Columns should contain variables Rows should contain observa+ons, measurements, cases, etc. Use first row for the names of the variables Enter NA (in capitals) into cells represen+ng missing values You should avoid names (or fields or values) that contain spaces Store data as.csv or.txt files as those can be easily read into R

How should data look like? Bird_ID Sex Mass Wing Bird_1 F 17.45 75.0 Bird_2 F 18.20 75.0 Bird_3 M 18.45 78.25 Bird_4 F 17.36 NA Bird_5 M 18.90 84.0 Bird_6 M 19.16 81.83

Import data Import data using read.table() and read.csv() func+ons Examples: mydata<- read.table(file = "datafile.txt") mydata<- read.csv(file = "datafile.csv") # Creates a data frame named mydata

Import data Import data using read.table() and read.csv() func+ons Examples: mydata<- read.csv(file = "datafile.csv") # Error in file(file, "rt") : cannot open the connec+on # In addi+on: Warning message: # In file(file, "rt") : # cannot open file 'datafile.csv': No such file or directory Important: Set your working directory (setwd()) first, so that R uses the right folder to look for your data file!

Check?read.table or?read.csv Import data read.csv(file, header = FALSE, sep = " ", quote = "\"'", dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"), row.names, col.names, as.is =!stringsasfactors, na.strings = "NA", colclasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill =!blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowescapes = FALSE, flush = FALSE, stringsasfactors = default.stringsasfactors(), fileencoding = "", encoding = "unknown", text, skipnul = FALSE)

Import data Reduce errors when loading a data file The header = TRUE argument tells R that the first row of your file contains the variable names The sep =," argument tells R that fields are separated by comma The strip.white = TRUE argument removes white space before or aver factors that has been mistakenly insert during data entry (e.g. small vs. small_ become both small ) The na.strings = " " argument replaces empty cells by NA (missing data in R)

Import data Reduce errors when loading a data file mydata<- read.csv(file = "datafile.csv, header = TRUE, sep =,", strip.white = TRUE, na.strings = " ")

Import data R allows to us a URL in place of a filename in R: mydata<- read.csv(file = "h]p://environmentalcompu+ng.net/wpcontent/uploads/2016/05/snail_feeding.csv", header = TRUE, sep = ", ", strip.white = TRUE, na.strings = " ")

Missing and special values NA = not available Import data Inf and -Inf = posi+ve and nega+ve infinity NaN = Not a Number NULL = argument in func+ons that means that no value was assigned to the argument

Missing and special values Important command: is.na() v <- c(1, 3, NA, 5) is.na(v) [1] FALSE FALSE TRUE FALSE Ignore missing data: na.rm=true mean(v, na.rm=true) [1] 3 Import data

Import data from other programs File format func+on library ERSI ArcGIS read.shapefile() shapefiles Matlap readmat() R.matlap minitab read.mtp() foreign SAS (permanent data) read.ssd() foreign SAS (XPORT format) read.xport() foreign SPSS read.spss() foreign Stata read.dta() foreign Systat read.systat() foreign

Import objects R objects can be imported with the load( ) func+on: Usually model outputs such as YourModel.Rdata Example: load("~/desktop/yourmodel.rdata")

Reading and wri+ng data Checking and cleaning data An example on marine snails provided by Environmental Compu+ng www.environmentalcompu+ng.net

Set directory: setwd("~/desktop") Import the sample data: Checking and cleaning data Snail_data<- read.csv(file = "Snail_feeding.csv", header = T, strip.white = T, na.strings = " ")

Checking and cleaning data Use the str() command to check the status and data type of each variable: str(snail_data) 'data.frame': 769 obs. of 12 variables: $ Snail.ID: int 1 1 1 1 1 1 1 1 1 1... $ Sex : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2... $ Size : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2... $ Feeding : logi FALSE FALSE FALSE FALSE FALSE TRUE... $ Distance: num 0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59... $ Depth : num 1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93... $ Temp : int 21 21 18 19 21 21 20 20 19 19... $ X : logi NA NA NA NA NA NA... $ X.1 : logi NA NA NA NA NA NA... $ X.2 : logi NA NA NA NA NA NA...

Checking and cleaning data Use the str() command to check the status and data type of each variable: str(snail_data) 'data.frame': 769 obs. of 12 variables: $ Snail.ID: int 1 1 1 1 1 1 1 1 1 1... $ Sex : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2... $ Size : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2... $ Feeding : logi FALSE FALSE FALSE FALSE FALSE TRUE... $ Distance: num 0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59... $ Depth : num 1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93... $ Temp : int 21 21 18 19 21 21 20 20 19 19... $ X : logi NA NA NA NA NA NA... $ X.1 : logi NA NA NA NA NA NA... $ X.2 : logi NA NA NA NA NA NA...

Checking and cleaning data To get rid of the extra columns we can just choose the columns we need by using Snail_data[m, n]

Checking and cleaning data To get rid of the extra columns we can just choose the columns we need by using Snail_data[m, n] Snail_data <- Snail_data[, 1:7] # takes columns 1 to 7 str(snail_data) 'data.frame': 769 obs. of 7 variables: $ Snail.ID: int 1 1 1 1 1 1 1 1 1 1... $ Sex : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2... $ Size : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2... $ Feeding : logi FALSE FALSE FALSE FALSE FALSE TRUE... $ Distance: num 0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59... $ Depth : num 1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93... $ Temp : int 21 21 18 19 21 21 20 20 19 19...

Checking and cleaning data 'data.frame': 769 obs. of 7 variables: $ Snail.ID: int 1 1 1 1 1 1 1 1 1 1... $ Sex : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2... $ Size : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2... $ Feeding : logi FALSE FALSE FALSE FALSE FALSE TRUE... $ Distance: num 0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59... $ Depth : num 1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93... $ Temp : int 21 21 18 19 21 21 20 20 19 19...

Checking and cleaning data 'data.frame': 769 obs. of 7 variables: $ Snail.ID: int 1 1 1 1 1 1 1 1 1 1... $ Sex : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2... $ Size : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2... $ Feeding : logi FALSE FALSE FALSE FALSE FALSE TRUE... $ Distance: num 0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59... $ Depth : num 1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93... $ Temp : int 21 21 18 19 21 21 20 20 19 19... The variable Sex has 4 levels, when it should have only two ( female and male )

Checking and cleaning data You can check the levels of a factor or character with the unique() or levels() unique(snail_data$sex) [1] male males Male female Levels: female male Male males

Checking and cleaning data To turn males or Male into the correct male, you can use the the [ ]-Operator together with the which() func+on: Snail_data$Sex[which(Snail_data$Sex == "males")] <- "male Snail_data$Sex[which(Snail_data$Sex == "Male")] <- "male

Checking and cleaning data To turn males or Male into the correct male, you can use the the [ ]-Operator together with the which() func+on: Snail_data$Sex[which(Snail_data$Sex == "males")] <- "male" Snail_data$Sex[which(Snail_data$Sex == "Male")] <- "male" Or both together: Snail_data$Sex[which(Snail_data$Sex == "males" Snail_data$Sex == "Male")] <- "male"

Check if it worked using unique() unique(snail_data$sex) [1] male female Levels: female male Male males Checking and cleaning data

Check if it worked using unique() unique(snail_data$sex) [1] male female Levels: female male Male males Checking and cleaning data You can remove the extra levels using factor() Snail_data$Sex <- factor(snail_data$sex) unique(snail_data$sex) # [1] male female # Levels: female male

Checking and cleaning data The summary() func+on provides summary sta+s+cs for each variable: summary(snail_data) Snail.ID Sex Size Feeding Distance Min. : 1.00 female:384 large:383 Mode :logical Min. :0.0000 1st Qu.: 4.00 male :385 small:385 FALSE:503 1st Qu.:0.2800 Median : 8.50 NA's : 1 TRUE :266 Median :0.5100 Mean : 8.49 NA's :0 Mean :0.5125 3rd Qu.:12.00 3rd Qu.:0.7500 Max. :16.00 Max. :1.0000......... Con+nues

Checking and cleaning data The summary() func+on provides summary sta+s+cs for each variable: summary(snail_data)...... Con+nued Depth Temp Min. : 1.000 Min. :18.00 1st Qu.: 1.260 1st Qu.:19.00 Median : 1.510 Median :19.00 Mean : 1.716 Mean :19.49 3rd Qu.: 1.760 3rd Qu.:20.00 Max. :162.000 Max. :21.00 NA's :6

Checking and cleaning data The summary() func+on provides summary sta+s+cs for each variable: summary(snail_data)...... Con+nued Depth Temp Min. : 1.000 Min. :18.00 1st Qu.: 1.260 1st Qu.:19.00 Median : 1.510 Median :19.00 Mean : 1.716 Mean :19.49 3rd Qu.: 1.760 3rd Qu.:20.00 Max. :162.000 Max. :21.00 NA's :6

Checking and cleaning data To find depths greater than 2m you can use the [ ]-Operator together with the which() func+on: Snail_data[which(Snail_data$Depth > 2), ] Snail.ID Sex Size Feeding Distance Depth Temp 8 1 male small TRUE 0.6 162 20

Checking and cleaning data To find depths greater than 2m you can use the [ ]-Operator together with the which() func+on: Snail_data[which(Snail_data$Depth > 2), ] Snail.ID Sex Size Feeding Distance Depth Temp 8 1 male small TRUE 0.6 162 20 Replace value Snail_data[8, 6] <- 1.62

Checking and cleaning data Finding and removing duplicates using duplicated() Example: duplicated(snail_data)

Checking and cleaning data Finding and removing duplicates using duplicated() duplicated(snail_data) [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [15] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE [29] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [43] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE... # Cut to reduce space duplicated() shows that the 17 th row is a duplicate of an earlier row

Checking and cleaning data To remove duplicate rows you can use the [ ]-Operator together with the duplicated() func+on: Snail_data<- Snail_data[!duplicated(Snail_data), ] Or use unique() Snail_data<- unique(snail_data) Check if it worked duplicated(snail_data)

Checking and cleaning data Finding and removing duplicates using duplicated() Faster when you incorporate which() Snail_data[which(duplicated(Snail_data)), ] Snail.ID Sex Size Feeding Distance Depth Temp X X.1 X.2 17 1 male small FALSE 0.87 1.95 18 NA NA NA

Checking and cleaning data Two other opera+ons that might be useful to get an overview of your data are sort() and order()

Checking and cleaning data Two other opera+ons that might be useful to get an overview of your data are sort() and order() Sor+ng single vectors sort(snail_data$depth) [1] 1.00 1.00 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.01 1.02 1.02... [18] 1.03 1.03 1.03 1.03 1.03 1.03 1.03 1.03 1.03 1.03 1.04 1.04... [35] 1.05 1.05 1.05 1.06 1.06 1.06 1.06 1.06 1.06 1.06 1.06 1.07... [52] 1.07 1.07 1.07 1.08 1.08 1.08 1.09 1.09 1.09 1.09 1.09 1.09...... # Cut to reduce space

Checking and cleaning data Two other opera+ons that might be useful to get an overview of your data are sort() and order() Sor+ng data frames Snail_data[order(Snail_data$Depth, Snail_data$Temp), ] Snail.ID Sex Size Feeding Distance Depth Temp 494 11 female small FALSE 0.76000 1.00 18 607 13 female large FALSE 0.45000 1.00 18 86 2 male small FALSE 0.09000 1.00 19 239 5 male large TRUE 0.03000 1.00 20 511 11 female small TRUE 0.62000 1.00 21... # Cut to reduce space

Checking and cleaning data Two other opera+ons that might be useful to prepare your data are sort() and order() Sor+ng data frames in decreasing order Snail_data[order(Snail_data$Depth, Snail_data$Temp, decreasing=true), ] Snail.ID Sex Size Feeding Distance Depth Temp 762 16 female large FALSE 0.92000 2.00 21 412 9 female small TRUE 0.48000 2.00 19 37 1 male small FALSE 0.67000 2.00 18 155 4 male small FALSE 0.38000 2.00 18 434 11 female small FALSE 0.49000 2.00 18... # Cut to reduce space

Large data frames Checking and cleaning data Snail.ID Sex Size Feeding Distance Depth Temp 762 16 female large FALSE 0.92000 2.00 21 412 9 female small TRUE 0.48000 2.00 19 37 1 male small FALSE 0.67000 2.00 18 155 4 male small FALSE 0.38000 2.00 18 434 11 female small FALSE 0.49000 2.00 18... # Cut to reduce space [ reached getop+on("max.print") -- omied 626 rows ]

Checking and cleaning data To get an overview of an object use or head() or tail() Examples: head(snail_data, n = 4) # returns first 4 rows of Snail_data Snail.ID Sex Size Feeding Distance Depth Temp 1 1 male small FALSE 0.17 1.66 21 2 1 male small FALSE 0.87 1.26 21 3 1 male small FALSE 0.22 1.43 18 4 1 male small FALSE 0.13 1.46 19

Checking and cleaning data To get an overview of an object use or head() or tail() Examples: tail(snail_data, n = 4) # returns last 4 rows of Snail_data Snail.ID Sex Size Feeding Distance Depth Temp 766 16 female large TRUE 0.65 1.71 20 767 16 female large TRUE 0.46 1.27 19 768 16 female large FALSE 0.36 1.28 21 769 16 female large FALSE 0.42 1.82 19

Checking and cleaning data To get an overview of an object use or head() or tail() Examples: head() and order() combined head(snail_data[order(snail_data$depth),], n=10) # returns first 10 rows of Snail_data with increasing depth

Export data To export data use the write.table() or write.csv() func+ons Check?read.table or?read.csv write.table(x, file = " ", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA", dec = ".", row.names = TRUE, col.names = TRUE, qmethod = c("escape", "double"), fileencoding = "")

Export data To export data use the write.table() or write.csv() func+ons Example: write.csv(snail_data, # object you want export file = "Snail_data_checked.csv", # file name saved row.names = FALSE) # exclude row names

Export objects To export R objects, such as model outputs, use the func+on save() Example: save(my_t-test, file = "T-test_master_thesis.Rdata")

Export data At the end use rm() to clean the R environment rm(list=ls()) # will remove all objects from the memory Distance Snail.ID Size large 21 762

Why do this in R? You can follow which changes are made Set up a script already when only part of the data is available It is quick to run the script again on the full data set

Which R func+ons did we learn? read.table() read.csv() is.na() load() str() summary() duplicated() unique() which() sort() order() head() tail() Import data saved as text file (.txt) or as comma separated format (.csv) into R indicates which elements are missing (NA) reloads datasets wrien with the func+on save provides an overview of an object returns basic sta+s+cal summary for variables indicates which rows are duplicates removes duplicate elements allows to select specific elements from a data frame sorts values in a vector or a factor sorts data by a set of variables at the same +me returns the first records of an object returns the last records of an object

Which R func+ons did we learn? write.table() write.csv() save() rm() rm(list=ls()) save a data frame as text file (.txt) or as comma separated format (.csv) into R writes an external representa+on of R objects removes specific or all objects from working environment