An Introduction to R- Programming

Hadeel Alkofide, MSc, PhD
NOT a biostatistician or R expert, just simply an R user
Some slides were adapted from lectures by Angie Mae Rodday, MSc, PhD, at Tufts University

What is R?
An open-source programming language and software environment for statistical computing and graphics.
R is an implementation of the S programming language.
S was created by John Chambers while at Bell Labs.
R was created by Ross Ihaka and Robert Gentleman.
R is named partly after the first names of the first two R authors and partly as a play on the name of S.

Why Use R?
FREE and open source.
Strong user community.
Highly extensible and flexible.
Implementation of high-end statistical methods.
Flexible graphics and intelligent defaults.
Runs on Windows, Mac OS, and Linux/UNIX platforms.

Then why isn't everyone using R?
Difficult, but NOT FOR YOU.
Command-driven (not menu-driven).
No commercial support, which means you need to look for solutions yourself, and that can be very frustrating.
Easy to make mistakes and not know it.
Data prep and cleaning can be messier and more mistake-prone in R than in SPSS or SAS.

Survey Asking Researchers About Their Primary Data Analysis Tool
https://r4stats.wordpress.com/articles/popularity/

Let us start learning R

Learning R Piece of cake

Downloading R https://cran.r-project.org/


The R Environment- R Command Window
R command window (console) or Graphical User Interface (GUI).
Used for entering commands, data manipulations, analyses, and graphing.
Output: results of analyses, queries, etc. are written here.
Toggle through previous commands by using the up and down arrow keys.

The R Environment- R Command Window
The prompt > indicates that R is ready for your commands.
R is case sensitive (y is not the same as Y).
In the console, use the up and down arrows to scroll back to previous code.

The R Environment- R Workspace
The workspace is your current working environment.
It is comprised primarily of variables, datasets, and functions.

R as Calculator
    2+2
    2*2
    2*100/2
    2^10
    2*5^2
    X<-2*5^2
    X
In R, <- indicates that you are making an assignment. This will come up often, particularly with data manipulation.
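The calculator commands above follow the usual order of operations: exponents are evaluated before multiplication. A minimal sketch of that point is below; the name Y is introduced only for this illustration.

```r
# Exponentiation binds tighter than multiplication
X <- 2 * 5^2      # 5^2 is evaluated first, then multiplied by 2
X

# Parentheses change the order of evaluation
Y <- (2 * 5)^2    # 2 * 5 is evaluated first, then squared
Y

# Multiplication and division are evaluated left to right
2 * 100 / 2
```

Typing the name of an object (here X or Y) on its own line prints its value.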

Operations in R
Comparison operators
    Operator   Purpose                                  Example
    ==         Equal                                    1==1 returns TRUE
    !=         Not equal                                1!=1 returns FALSE
    <  >       Less/greater than                        1<1 returns FALSE
    <= >=      Less/greater than or equal               1<=1 returns TRUE
Logical operators
    Operator   Purpose                                  Example
    &          And: must meet both conditions           1==1 & 1<1 returns FALSE
    |          Or: only needs to meet one condition     1==1 | 1<1 returns TRUE
The | symbol is printed above the backslash on its key.
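The comparison and logical operators can be tried directly in the console. The sketch below stores each result in a variable (the variable names are chosen for this example only) so the values can be inspected together.

```r
# Comparison operators return logical values (TRUE or FALSE)
eq  <- 1 == 1    # equal
neq <- 1 != 1    # not equal
lt  <- 1 < 1     # less than
lte <- 1 <= 1    # less than or equal

# Logical operators combine conditions
both   <- (1 == 1) & (1 < 1)   # AND: both conditions must be TRUE
either <- (1 == 1) | (1 < 1)   # OR: one TRUE condition is enough

print(c(eq = eq, neq = neq, lt = lt, lte = lte, both = both, either = either))
```

Note that == (two equals signs) tests for equality, while a single = is a form of assignment.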

The R Environment- R Script
A script is a text file containing commands that you would otherwise enter on the command line of R.
In the R console, you can create a new script (from the File menu, choose New Script) and write all of your R code there.
This allows you to save your code (but not your output) for later.
To place a comment in an R script, use a hash mark (#) at the beginning of the line.

The R Environment- R Script
When working from your script, you can highlight sections of code and press F5 to run that code in the R console.
When saving your script, type the extension .R after the file name so that your computer recognizes that it is an R file (e.g., Hadeel.R).
You can then open previously saved scripts to re-run or modify your code.

Saving Output (Results)
Saving output using R's menu options can be annoying.
One option is to copy and paste (into a Word document) the output that you need while working.
Problems with waiting until the end:
You end up copying a bunch of code and output you may not need (e.g., you'll be copying all your errors).
At some point, the R console window fills up and you can't access your earlier work.
When copying and pasting your R output, you can change the font to Courier New and the output will line up and look pretty.
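As an alternative to copy and paste (not covered in the original slides), base R can write printed output to a text file. The sketch below uses capture.output() and writeLines(); the file name results.txt and the example numbers are assumptions made for illustration, and the file is written to a temporary folder.

```r
# Hypothetical output file in R's temporary folder
out_file <- file.path(tempdir(), "results.txt")

# capture.output() collects what would have been printed to the console
results <- capture.output(summary(c(1, 2, 3, 4, 100)))

# Save the captured lines to the text file
writeLines(results, out_file)

# Read the file back to confirm what was saved
readLines(out_file)
```

The saved file can be opened in any text editor or pasted into Word later.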

Objects in R
Everything that you work with in R is an object.
Types of objects include vectors, matrices, and data frames.
To see which objects are available in the workspace (i.e., R memory), use the command ls() or objects().
You can remove objects with the rm() function.
The class() function tells you the type of object.
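A short sketch of these workspace functions follows. The object names x and greeting are made up for this example.

```r
# Create two objects in the workspace
x <- 1:5
greeting <- "hello"

# class() reports the type of each object
class(x)
class(greeting)

# ls() lists the objects currently in the workspace
ls()

# rm() removes an object; exists() confirms it is gone
rm(greeting)
exists("greeting")
```

After rm(greeting), calling ls() again would no longer list greeting.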

Objects in R- Vectors
An ordered collection of numerical, categorical, complex, or logical objects.
    vec1<-1:10
    vec1
    [1] 1 2 3 4 5 6 7 8 9 10
    class(vec1)
    [1] "integer"
This puts the numbers 1 through 10 into a vector called vec1.

Objects in R- Vectors
    vec2<-letters[1:10]
    vec2
    [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
    class(vec2)
    [1] "character"
This puts the letters a (1st letter) through j (10th letter) into a vector called vec2. The built-in letters vector is lowercase; use LETTERS for uppercase letters.
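Individual elements of a vector can be picked out with square brackets, and new vectors can be built with c(). The sketch below is an illustration only; the vector ages and its values are made up.

```r
vec1 <- 1:10

# Square brackets select elements by position
vec1[3]           # the third element
vec1[c(1, 10)]    # the first and tenth elements

# A logical condition inside the brackets keeps matching elements
big <- vec1[vec1 > 7]
big

# c() combines values into a new vector
ages <- c(23, 35, 41)
length(ages)      # number of elements
```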

Objects in R- Matrix
A matrix is a collection of data entries of the same type.
Matrices have two dimensions: rows and columns.
    mat1 <- matrix(vec1, ncol = 2, nrow = 5)
matrix() is the function that creates a matrix.
vec1 is what goes into the matrix (the vector we created above).
ncol is the number of columns in the matrix.
nrow is the number of rows in the matrix.

Objects in R- Matrix
    mat1 <- matrix(vec1, ncol = 2, nrow = 5)
    mat1
         [,1] [,2]
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10
    class(mat1)
    [1] "matrix"
In R 4.0.0 and later, class(mat1) returns "matrix" "array".

Objects in R- Matrix
To find the dimensions (i.e., number of rows and columns) of the matrix:
    dim(mat1)
    [1] 5 2
Print the data in the first row of the matrix:
    mat1[1,]
    [1] 1 6
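The same bracket notation reaches columns and single cells: the value before the comma is the row and the value after it is the column. The sketch below also shows the byrow argument of matrix(), which is not used in the slides; the name mat_by_row is made up for this example.

```r
vec1 <- 1:10
mat1 <- matrix(vec1, ncol = 2, nrow = 5)

# Leave the row position empty to get a whole column
mat1[, 2]

# Give both positions to get a single cell (row 3, column 2)
mat1[3, 2]

# By default matrix() fills column by column; byrow = TRUE fills row by row
mat_by_row <- matrix(vec1, ncol = 2, nrow = 5, byrow = TRUE)
mat_by_row[1, ]
```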

Objects in R- Data Frame
A data.frame may be regarded as a matrix with columns that can be of different modes (e.g., numeric, character).
It is displayed in matrix form, by rows and columns.
It's like an Excel spreadsheet.
This is primarily what we will be using when analyzing our data in R.

Objects in R- Data Frame
Create a data frame from the matrix we made previously:
    df1<-data.frame(mat1)
    df1

Objects in R- Data Frame
Find the class of the data frame.
Find the dimensions of the data frame.
Print the first row of the data frame.
Print the first column of the data frame.
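One possible way to carry out these four tasks is sketched below. It rebuilds vec1, mat1, and df1 from the earlier slides so that it runs on its own.

```r
# Rebuild the objects from the earlier slides
vec1 <- 1:10
mat1 <- matrix(vec1, ncol = 2, nrow = 5)
df1 <- data.frame(mat1)

class(df1)   # class of the data frame
dim(df1)     # dimensions: rows, then columns
df1[1, ]     # first row
df1[, 1]     # first column
names(df1)   # R named the columns X1 and X2 automatically
```

Rows and columns are selected with the same [row, column] notation used for matrices.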

Objects in R- Let's Play a Little with the Data Frame
When referring to a variable within a data frame, you must use the $ sign.
Print the data for variable X1.
Take the mean of variable X2 in the data frame.
Assign the name var1 to the first variable X1 and var2 to the second variable X2.
Review the objects that we have created.
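A sketch of these steps is below. It reads "assign the name var1" as creating new objects called var1 and var2, which is an assumption about what the slide intended; renaming the columns with names(df1) <- c("var1", "var2") would be another reading. The name x2_mean is made up for this example.

```r
# Rebuild the data frame from the earlier slides
mat1 <- matrix(1:10, ncol = 2, nrow = 5)
df1 <- data.frame(mat1)

# The $ sign picks one variable out of the data frame
df1$X1

# Mean of variable X2
x2_mean <- mean(df1$X2)
x2_mean

# Store each variable as its own object
var1 <- df1$X1
var2 <- df1$X2

# Review the objects created so far
ls()
```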

R Help
R help is web-based, and each function has its own page.
At the bottom of each page, R gives examples of how to use the function.
    ?ls
    help(ls)
    ?colnames
    ?c

R Packages
R is built around packages.
R consists of a core (that already includes a number of packages) and contributed packages programmed by users around the world.
Contributed packages add new functions that are not available in the core (e.g., genomic analyses).
In computer terms, packages are ZIP files that contain all that is needed for using the new functions.

Downloading R Packages
Two main functions when installing and loading packages:
1. install.packages()
With nothing in the parentheses: lists all packages available to install.
With the name of a package (in quotes) in the parentheses: installs that package.
The package will live permanently on your hard drive, so you don't need to install it again (unless you download a newer R version).
After entering this command, you will select a CRAN mirror (a server from which the package will be downloaded).

Downloading R Packages
Two main functions when installing and loading packages:
2. library()
With nothing in the parentheses, this lists all packages that are installed and available to load. To see which packages are currently loaded, use search().
Each time you open R, you must load the installed packages you would like to use by running the library command with the package name listed in the parentheses.

Example Downloading Package
The Introductory Statistics with R book comes with a package that contains many example datasets that we will be using.
Install and load the package called ISwR (package names are case sensitive):
    install.packages("ISwR")
    library(ISwR)
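Because installation needs an internet connection and only has to happen once, it can help to check whether a package is already installed before loading it. The sketch below is one way to do that with base R functions; the variable names pkg, loaded, and installed are made up for this example.

```r
pkg <- "ISwR"

# requireNamespace() returns TRUE if the package is installed
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)   # load it for this session
  message(pkg, " is installed and now loaded")
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") first")
}

# search() shows what is currently loaded (attached)
loaded <- search()
loaded

# installed.packages() describes everything installed on this computer
installed <- rownames(installed.packages())
head(installed)
```

The character.only = TRUE argument is needed here only because the package name is stored in a variable; library(ISwR) works when the name is typed directly.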

Built-in Datasets
By default, R is also pre-loaded with a small set of datasets, and each package often comes with example datasets.
Several commands are helpful when exploring these built-in datasets:
    data(): Lists available datasets
    help(NAME_OF_DATASET): Brief description of the dataset
    NAME_OF_DATASET: Prints the entire dataset (be careful!)

Built-in Dataset Example
Look at the dataset energy from the ISwR package:
    help(energy)
    energy
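The same commands work on the datasets that ship with R itself. The sketch below uses women, a small dataset from R's built-in datasets package, as a stand-in so that it runs even if ISwR has not been installed yet.

```r
# Load a dataset that comes with base R
data(women)

# Look at the first rows instead of printing everything
head(women)

# Size of the dataset and names of its variables
dim(women)
names(women)

# help(women) would open the description page for this dataset
```

For large datasets, head() is a safer first look than typing the dataset name.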

Now that we understand some basic R functions.. How can I work with my dataset???

Reading Data into R
Read data into R using the function read.csv, which reads data that are in comma-delimited format.
Example Excel spreadsheet containing data: fev1.csv
The dataset contains variables on subject number (variable name=subject), forced expiratory volume (variable name=fev1), and gender (variable name=gender).

How do I know if my dataset is in CSV format?
If your data are in Excel, you can save them in a comma-delimited format.
Use Save As and select CSV (Comma Delimited).

Now how will R know where the dataset is?
Tell R the location of the data you will be reading in.
First find the current working directory by using the getwd() command.
Then change the working directory if needed.
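The working directory can also be changed from code with setwd(), the companion of getwd(). The sketch below switches to R's temporary folder purely as an example destination; in practice you would give the folder that holds your data. The names old_dir and tmp are made up for this example.

```r
# Remember where we started
old_dir <- getwd()
old_dir

# Change to another folder (here: R's temporary folder, as an example)
tmp <- tempdir()
setwd(tmp)
getwd()

# Go back to the original folder
setwd(old_dir)
getwd()
```

On Windows, write paths with forward slashes (/) or doubled backslashes (\\) inside the quotes.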

Now you can read your dataset
From the flash drive you were handed, please save the file fev1.csv in your working directory (e.g., Documents).
Read the file:
    fev<-read.csv("fev1.csv", header=TRUE)
fev is the name of the object that contains our data.
fev1.csv is the name of the csv file we created.
header=TRUE indicates that the first row of data contains the variable names. TRUE must be written in capital letters.
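If the course file is not at hand, the read.csv step can still be practiced. The sketch below first writes a tiny made-up CSV file to a temporary folder and then reads it back; the file name fev1_example.csv and all values in it are hypothetical, with only the three variable names taken from the slides.

```r
# Made-up data with the same variable names as fev1.csv
example <- data.frame(
  subject = 1:4,
  fev1    = c(2.3, 3.1, 2.8, 3.6),
  gender  = c(1, 2, 1, 2)
)

# Write it out as a comma-delimited file in the temporary folder
csv_path <- file.path(tempdir(), "fev1_example.csv")
write.csv(example, csv_path, row.names = FALSE)

# Read it back in; header = TRUE treats the first row as variable names
fev_example <- read.csv(csv_path, header = TRUE)
str(fev_example)
```

Giving read.csv a full path, as here, works regardless of the current working directory.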

Attributes of Datasets and Variables
    Function        Purpose
    For datasets
    dim             Gives dimensions of data (#rows, #columns)
    names           Lists the variable names of data
    head            Lists first 6 rows of data
    tail            Lists last 6 rows of data
    class           Lists class of the object (e.g., data frame)
    View            Displays data like a spreadsheet
    For variables
    is.numeric      Returns TRUE or FALSE if variable is numeric
    is.character    Returns TRUE or FALSE if variable is character
    is.factor       Returns TRUE or FALSE if variable is factor

Exercise: Exploring the fev Dataset
What are the dimensions of the dataset?
Print the dataset.
What class is the dataset?
What are the variable names of the dataset?
Look at the first few and last rows of the data.
Print the variable fev1.
What type of variable is fev1? What about gender?
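One possible approach to this exercise is sketched below. So that it runs without the course file, it builds a small hypothetical fev data frame; the values are made up, and coding gender as 1 and 2 is an assumption. With the real data, skip the first step and use the object created by read.csv.

```r
# Hypothetical stand-in for the fev dataset (values are made up)
fev <- data.frame(
  subject = 1:8,
  fev1    = c(2.1, 2.9, 3.4, 3.0, 2.6, 3.8, 3.2, 2.4),
  gender  = c(1, 2, 2, 1, 1, 2, 2, 1)
)

dim(fev)        # dimensions
fev             # print the dataset
class(fev)      # class of the dataset
names(fev)      # variable names
head(fev)       # first rows
tail(fev, 3)    # last 3 rows
fev$fev1        # print one variable

# Variable types
is.numeric(fev$fev1)
is.numeric(fev$gender)   # numeric codes are read as numbers
is.factor(fev$gender)

# factor() turns the codes into a categorical variable
gender_factor <- factor(fev$gender)
is.factor(gender_factor)
levels(gender_factor)
```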

Now let us actually run some statistics in R

Descriptive Statistics in R
R has functions for all common statistics.
summary() gives the lowest value, mean, median, first and third quartiles, and highest value for numeric variables.
table() gives a tabulation of categorical variables.
There are many other functions to summarize data.

Summarizing Data in R
We will be summarizing the FEV dataset using means, SDs, ranges, quartiles, histograms, boxplots, proportions, and frequencies.

Functions to Describe Variables
    Function                          Purpose
    Continuous variables
    mean(data$var)                    Mean
    sd(data$var)                      Standard deviation
    summary(data$var)                 Min, max, q1, q3, median, mean
    min(data$var)                     Minimum
    max(data$var)                     Maximum
    range(data$var)                   Min & max
    median(data$var)                  Median
    hist(data$var)                    Histogram
    boxplot(data$var)                 Boxplot
    Categorical/binary variables
    table(data$var)                   Gives frequency of each level of the variable
    prop.table(table(data$var))       Gives proportion of each level of the variable

Exercise: Summarizing Data in the fev Dataset
What is the mean, SD, median, min, and max of fev1?
Use two different plots to look at the distribution of fev1. Is it normally distributed?
What is the frequency of each gender?
What is the proportion of each gender?
How does the proportion of gender 1 compare to the proportion of gender 2?
What does the histogram of gender look like?
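A sketch of how this exercise might be approached is below. It again uses a small hypothetical fev data frame with made-up values, so the numbers it produces are not the answers for the real dataset. The plots are wrapped in if (interactive()) so that they are drawn only when the code is run in an R session with a graphics window.

```r
# Hypothetical stand-in for the fev dataset (values are made up)
fev <- data.frame(
  subject = 1:8,
  fev1    = c(2.1, 2.9, 3.4, 3.0, 2.6, 3.8, 3.2, 2.4),
  gender  = c(1, 2, 2, 1, 1, 2, 2, 1)
)

# Continuous variable: fev1
fev1_mean   <- mean(fev$fev1)
fev1_sd     <- sd(fev$fev1)
fev1_median <- median(fev$fev1)
fev1_range  <- range(fev$fev1)   # minimum and maximum together
print(summary(fev$fev1))

# Categorical variable: gender
gender_freq <- table(fev$gender)         # frequency of each level
gender_prop <- prop.table(gender_freq)   # proportion of each level
print(gender_freq)
print(gender_prop)

# Two plots for the distribution of fev1, and a bar plot for gender
if (interactive()) {
  hist(fev$fev1, main = "Histogram of fev1", xlab = "fev1")
  boxplot(fev$fev1, main = "Boxplot of fev1")
  barplot(gender_freq, main = "Frequency of gender")
}
```

A histogram of gender only shows bars at the codes 1 and 2, which is why a bar plot of the frequency table is used for the categorical variable here.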

Statistical Modeling in R
So many modeling functions: e.g., lm, glm, aov, ts.
Numerous packages and functions: e.g., the survival package (which provides coxph), nls, and more.
Distinction between factors and regressors: factors are categorical, regressors are continuous.
You must specify factors unless they are obvious to R.

Statistical Modeling in R
Specify your model like this (VERY simplified):
    y ~ xi+ci
y = outcome variable
xi = main explanatory variables
ci = covariates
+ = add terms
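A model formula is an ordinary R object that can be created and inspected before any model is fitted. In the sketch below, y, x1, and c1 are placeholder variable names, not variables from the course datasets.

```r
# A formula: outcome on the left of ~, predictors on the right
f <- y ~ x1 + c1

class(f)      # formulas have their own class
all.vars(f)   # the variable names used in the formula

# Wrapping a predictor in factor() tells R to treat it as categorical
f_cat <- y ~ x1 + factor(c1)
all.vars(f_cat)
```

The formula only names the variables; the data are supplied later, for example through the data argument of lm().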

Example for Linear Regression
Read the new dataset:
    lbwt<-read.csv("lbwt.csv", header=TRUE)

Example: Linear Regression with Continuous Predictor
First we usually test correlations.
Scatterplot:
    plot(lbwt$headcirc~lbwt$gestage) #simple plot
    plot(lbwt$headcirc~lbwt$gestage, col="red", xlab="Gest. Age (weeks)", ylab="Head Circumference (cm)", main="Scatterplot")
Pearson correlation:
    cor.test(lbwt$headcirc,lbwt$gestage)

Example: Linear Regression with Continuous Predictor
Then we run the linear regression model:
    lm1<-lm(headcirc~gestage, data=lbwt)
    summary(lm1)
    plot(lbwt$headcirc~lbwt$gestage, col="red", xlab="Gest. Age (weeks)", ylab="Head Circumference (cm)", main="Scatterplot", xlim=c(0,35), ylim=c(0,35))
    abline(lm1) #adds the fitted regression line
    abline(a=3.19, b=0.78) #adds a line from a manually entered intercept (a) and slope (b)
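The same workflow can be tried without the lbwt.csv file. The sketch below simulates a small dataset with the same two variable names; the simulated values, the sample size, and the true intercept and slope used to generate them are all assumptions for illustration and say nothing about the real data.

```r
# Simulate made-up data (set.seed makes the random numbers repeatable)
set.seed(1)
gestage  <- seq(23, 35, length.out = 50)
headcirc <- 4 + 0.8 * gestage + rnorm(50, sd = 1.5)
sim <- data.frame(gestage, headcirc)

# Pearson correlation test
ct <- cor.test(sim$headcirc, sim$gestage)
ct$estimate

# Fit the linear regression and look at the results
fit <- lm(headcirc ~ gestage, data = sim)
print(summary(fit))

# coef() extracts the estimated intercept and slope
est <- coef(fit)
est

# Scatterplot with the fitted line (drawn only in an interactive session)
if (interactive()) {
  plot(headcirc ~ gestage, data = sim, col = "red",
       xlab = "Gest. Age (weeks)", ylab = "Head Circumference (cm)")
  abline(fit)
}
```

Passing the fitted model to abline() avoids retyping the intercept and slope by hand, which removes one source of copying errors.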

How to Learn More About R?

R Resources
R home page: http://www.r-project.org
R discussion group: http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Search Google for R and Statistics

Tutorials
http://www.statmethods.net/stats/
http://scc.stat.ucla.edu/mini-courses
http://www.ats.ucla.edu/stat/r/

Top 10 Great Books About R
http://www.datasciencecentral.com/profiles/blogs/10-great-books-about-r-1

Thank you