R
Outline
  Description/History
  Objects/Language Description
  Commonly Used
    Basic Functions
    Basic Stats and distributions
    I/O
    Plotting
    Programming
  More Specific Functionality
  Further Resources
www.r-project.org
What is R?
  Free (as in speech and as in beer), open-source implementation of S
    S is a statistical programming language developed at Bell Labs in the late 70s
  Cross-platform: runs on Windows, macOS, and Unix-like systems
  Extensible: lots of libraries
    CRAN & Bioconductor
Bioconductor.org
Packages
Everything in R is an Object
  All objects have a mode and a length
  Modes:
    NULL        NULL
    logical     TRUE, FALSE
    numeric     37.22, -1.5, 6.022e23
    complex     -3.5+4i
    character   'stage', "grade", "can't", 'She said, "What?".'
    list        list(3, FALSE, c(2,5,6), 'hi there')
    function    c, ls, mean, q
  Length is fairly straightforward
    length(NULL) is 0
Data classes
  Atomic
    vector        1-dimensional
    matrix        2-dimensional
    array         multiple dimensions
    time-series   vector w/associated time data
    factor        categorical data
  Non-atomic / Recursive
    list          1-dimensional
    data.frame    2-dimensional
Generating vectors
  c() function
  : operator
    : has precedence over the arithmetic operators + - * /, so 1:n-1 means (1:n)-1
  seq() function
    seq(1,10,by=0.1)
  rep() function
  Using random number generators
    E.g., rnorm()
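The generators listed above can be tried out with a short sketch like the following. The variable names (v, n, a, b, s, r, z) are illustrative only and do not come from the slides.

```r
# Combine individual values into a vector
v <- c(30, 15, 22)

# The : operator binds more tightly than + - * /
n <- 5
a <- 1:n - 1      # parsed as (1:n) - 1, giving 0 1 2 3 4
b <- 1:(n - 1)    # parentheses needed to get 1 2 3 4

# seq() with an explicit step, rep() for repetition
s <- seq(1, 2, by = 0.25)          # 1.00 1.25 1.50 1.75 2.00
r <- rep(c("x", "y"), times = 2)   # "x" "y" "x" "y"

# Random draws; set.seed() makes the draws reproducible
set.seed(1)
z <- rnorm(3)
```

Forgetting the parentheses in 1:(n - 1) is a common source of off-by-one errors in loops.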
Vectors
  Generate by using c()
    test <- c(30,15,22)
  Generate empty vector using vector(mode,length)
    temp <- vector("logical",30)
  Vector elements can have names
    mydata <- c(2,1,6,6,0)
    names(mydata) <- c("a","b","c","d","e")
    > mydata
    a b c d e
    2 1 6 6 0
    > names(mydata)
    [1] "a" "b" "c" "d" "e"
Vectors (continued)
  Access vectors
  Numerically (start counting from 1)
    > mydata[1]
    a
    2
    > mydata[1:3]
    a b c
    2 1 6
  Negative indices invert selection
    > mydata[-c(1,5,3)]
    b d
    1 6
  By name (if available)
    > mydata['b']
    b
    1
  Logically
    > mydata[mydata<6]
    a b e
    2 1 0
  More examples
    > mydata
    a b c d e
    2 1 6 6 0
    > mydata[c(1,5,3,1)]
    a e c a
    2 0 6 2
    > mydata[6]
    <NA>
      NA
    > mydata[0]
    named numeric(0)
    > mydata[]
    a b c d e
    2 1 6 6 0
    > mydata['f']
    <NA>
      NA
Adding a new element to a vector
  arr[length(arr)+1] <- newelement
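As a small sketch of the append idiom above, together with two base-R alternatives (c() and append()); the values are made up for illustration.

```r
arr <- c(2, 1, 6)

# Assign to the position one past the current end
arr[length(arr) + 1] <- 7          # 2 1 6 7

# Equivalent: combine the old vector with the new element
arr <- c(arr, 9)                   # 2 1 6 7 9

# append() can also insert in the middle
arr <- append(arr, 0, after = 1)   # 2 0 1 6 7 9
```

Growing a vector one element at a time copies it on each step, so for long loops it is usually better to preallocate with vector(mode, length) and fill by index.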
Matrices and Arrays
  Like vectors, but more dimensions
    > a <- matrix(1:6,nrow=2,ncol=3)
    > a
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
  Similar accession, just more commas
    > a[1,2]
    [1] 3
    > a[1,]
    [1] 1 3 5
    > a[,2]
    [1] 3 4
    > a[,2:3]
         [,1] [,2]
    [1,]    3    5
    [2,]    4    6
  Functions
    Dimensions       dim(), nrow(), ncol()
    Names            colnames(), rownames()
    Matrix algebra   t(), diag()
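The matrix functions named above can be exercised on the same 2 x 3 matrix. This is only a sketch; the row and column labels ("r1", "x", and so on) are hypothetical.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column

d  <- dim(a)     # 2 3
nr <- nrow(a)    # 2
nc <- ncol(a)    # 3

# Dimension names allow indexing by label
rownames(a) <- c("r1", "r2")
colnames(a) <- c("x", "y", "z")
el <- a["r1", "y"]    # same element as a[1, 2], i.e. 3

# Transpose and matrix product
ta <- t(a)            # 3 x 2
p  <- a %*% ta        # 2 x 2 product
dg <- diag(p)         # diagonal of the product: 35 56
```

Note that a * a multiplies element by element; the matrix product needs %*%.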
Lists
  Like vectors, but can have many data types
  Recursive: lists can contain lists can contain lists etc.
    a <- list(3,FALSE,c(2,5,6),greeting='hi there',list(3,4,5))
  Accession is different:
    Numerically
      Single brackets give a sublist
      Double brackets give the actual contents of the list element
    > a[1]
    [[1]]
    [1] 3

    > a[[1]]
    [1] 3
    > a[1:3]
    [[1]]
    [1] 3

    [[2]]
    [1] FALSE

    [[3]]
    [1] 2 5 6

    > a[[1:3]]
    Error
Lists (continued)
    a <- list(3,FALSE,c(2,5,6),greeting='hi there',list(3,4,5))
  Accession is different:
    By name, if available
      Again, distinction between single and double brackets
      Object$subname construct
    > a['greeting']
    $greeting
    [1] "hi there"

    > a[['greeting']]
    [1] "hi there"
    > a$greeting
    [1] "hi there"
  unlist() converts to regular vectors
    > unlist(a)
                                                    greeting
           "3"    "FALSE"        "2"        "5"        "6" "hi there"        "3"        "4"        "5"
    > unlist(list(3,4,5))
    [1] 3 4 5
Lists
  Adding new elements
    lista <- c(lista, newelement)
    (wrap newelement in list() if it is a vector or list that should stay a single element)
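A minimal sketch of appending to a list, including the case where the new element is itself a vector. The contents of lista are invented for illustration.

```r
lista <- list(3, FALSE)

# A single value is appended as one new element
lista <- c(lista, "hi")                 # length 3

# A vector must be wrapped in list() to stay one element
lista <- c(lista, list(c(2, 5)))        # length 4

# Double-bracket assignment one past the end also works
lista[[length(lista) + 1]] <- 10        # length 5

# $ adds a named element
lista$greeting <- "hello"               # length 6

# Without list(), c() splits the vector into separate elements
flat <- c(list(1), c(2, 5))             # length 3, not 2
```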
data.frames
  Look like a matrix, act like a list
    Each row represents a particular item
    Columns represent different characteristics
    Different columns can have different modes
    Columns are treated as elements in a list
  Tabular data implicitly loaded as data.frame
  matrix and data.frame differences
    > a <- matrix(1:12,nrow=3)
    > a
         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12
    > mean(a)
    [1] 6.5
    > b <- data.frame(1:3,4:6,7:9,10:12)
    > b
      X1.3 X4.6 X7.9 X10.12
    1    1    4    7     10
    2    2    5    8     11
    3    3    6    9     12
    > mean(b)     (R before 3.0.0; newer versions return NA with a warning)
      X1.3   X4.6   X7.9 X10.12
         2      5      8     11
  mean() does not average a whole data.frame
    Use colMeans() or apply() for per-column means, mean(unlist(b)) for the overall mean
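The following sketch shows ways to average the data.frame from the slide that behave the same across R versions. colMeans() and sapply() are base-R functions not named on the slide.

```r
b <- data.frame(1:3, 4:6, 7:9, 10:12)

# Per-column means: the data.frame is treated as a list of columns
col_means  <- colMeans(b)       # 2 5 8 11
col_means2 <- sapply(b, mean)   # same values, computed column by column

# Per-row means with apply() (MARGIN = 1 means rows)
row_means <- apply(b, 1, mean)  # 5.5 6.5 7.5

# Overall mean: flatten to a plain vector first
overall <- mean(unlist(b))      # 6.5
```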
Categorical Data
  Factors: distinct character string values representing a particular category
    e.g. species name or treatment type
    > iris[1,5]
    [1] setosa
    Levels: setosa versicolor virginica
  Can be ordered
    Small, medium, large
    > drink.size[1]
    [1] medium
    Levels: small < medium < large
  For more help: ?factor
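The slide does not show how drink.size was built; one plausible construction is sketched below. The sizes vector is hypothetical data.

```r
# Hypothetical raw data
sizes <- c("medium", "small", "large", "medium")

# levels= fixes the order; ordered = TRUE makes comparisons meaningful
drink.size <- factor(sizes,
                     levels = c("small", "medium", "large"),
                     ordered = TRUE)

lv     <- levels(drink.size)        # "small" "medium" "large"
counts <- table(drink.size)         # counts per level
below  <- drink.size < "large"      # TRUE TRUE FALSE TRUE
codes  <- as.integer(drink.size)    # underlying level codes: 2 1 3 2
```

Without levels=, factor() sorts the levels alphabetically (large, medium, small), which is rarely the intended order.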
Naming Your Objects
  Allowed characters: A-Z, a-z, 0-9, ., _
  Names cannot begin with a number or _, or with . followed by a number
    Valid names include: test36, tumor.stage, TEMP, .temp (hidden from ls())
    Invalid names: .2temp, 2z, _temp
  R is case-sensitive
    Num.samples ≠ num.samples
  Reserved names
    FALSE, Inf, NA, NaN, NULL, TRUE
    break, else, for, function, if, in, next, repeat, while
    (return is a function rather than a reserved word, but avoid reusing it as a name)
Type casting
  as.character()
  as.integer()
  etc.
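A few conversions, sketched with made-up values, show what the as.*() family does with awkward input. as.numeric() and as.logical() are assumed to be among the "etc." on the slide.

```r
x <- c("3", "4.7", "abc")

# Strings that cannot be parsed become NA (R also emits a warning)
n <- suppressWarnings(as.numeric(x))     # 3.0 4.7 NA

i <- as.integer(4.7)                     # 4: truncates toward zero
s <- as.character(37.22)                 # "37.22"
l <- as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
one <- as.numeric(TRUE)                  # 1

# Factors convert to their level codes, not their labels
f     <- factor(c("10", "20", "10"))
codes <- as.integer(f)                   # 1 2 1
vals  <- as.numeric(as.character(f))     # 10 20 10
```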
Basic Functions
  Information
    ls(), ls.str(), dir(), getwd(), setwd(), args(), ? or help(), apropos()
  Arithmetic
    + - * / ^   %/% (integer divide)   %% (modulo)   %*% (matrix multiply)
  Logical
    ! & && | || < > <= >= == !=
    Use & and | for element-wise comparison of logical vectors; && and || compare single values
    is.na(), is.nan(), is.null()
Vector arithmetic
  Different than mathematical vector arithmetic
  You can add vectors of different sizes
    The smaller vector is recycled, i.e., repeated
    R warns if the longer length is not a multiple of the shorter one
  For example, you can add a scalar to a vector
    The scalar is added to each element of the vector
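Recycling as described above can be seen in a short sketch; the vector v and the result names are illustrative.

```r
v <- c(1, 2, 3, 4)

# A scalar is a length-1 vector, recycled across every element
plus10 <- v + 10              # 11 12 13 14

# A length-2 vector is repeated twice to match length 4
flip <- v * c(1, -1)          # 1 -2 3 -4

# Lengths 4 and 3: still recycled, but R warns about the mismatch
odd <- suppressWarnings(v + c(1, 2, 3))   # 2 4 6 5
```

The warning in the last case is often the only sign of a length bug, so it is worth not suppressing it in real code.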
Basic Functions, continued
  Sums and products
    sum(), prod()
    Use na.rm=TRUE if data contain NAs
  Trig, logarithms, and other math functions
    abs(), sign(), sqrt()
    sin(), cos(), tan(), asin(), acos(), atan(), sinh(), cosh(), tanh()
    exp(), log() (base e), log10(), log2(), logb(x,base=b)
  Rounding
    round(), signif(), ceiling(), floor(), trunc()
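The effect of na.rm and the differences between the rounding functions are easy to miss, so here is a brief sketch with invented values.

```r
x <- c(1, NA, 3)

s_na <- sum(x)                  # NA: a missing value propagates
s_ok <- sum(x, na.rm = TRUE)    # 4
p_ok <- prod(x, na.rm = TRUE)   # 3

# log() is the natural log unless a base is given
l10 <- log(100, base = 10)      # 2, same as log10(100)

# Rounding functions differ for negative numbers
r2 <- round(3.14159, 2)     # 3.14 (two decimal places)
sg <- signif(123456, 2)     # 120000 (two significant digits)
ce <- ceiling(-1.5)         # -1 (toward +Inf)
fl <- floor(-1.5)           # -2 (toward -Inf)
tr <- trunc(-1.5)           # -1 (toward zero)
```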
Still more basic functions
  Sorting and ordering
    sort(), rank(), order(); rev()
  Sequence generation
    seq(), rep()
  Location
    which(), match(), unique(), duplicated()
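sort(), order(), and rank() are easily confused; the sketch below applies each to the same made-up vector, followed by the location functions.

```r
x <- c(30, 15, 22, 15)

srt <- sort(x)     # the values in increasing order: 15 15 22 30
ord <- order(x)    # the indices that would sort x: 2 4 3 1
rnk <- rank(x)     # each value's rank, ties averaged: 4.0 1.5 3.0 1.5
rv  <- rev(x)      # reversed: 15 22 15 30

w   <- which(x == 15)   # positions where the condition holds: 2 4
m   <- match(22, x)     # first position of 22: 3
u   <- unique(x)        # 30 15 22
dup <- duplicated(x)    # FALSE FALSE FALSE TRUE
```

x[order(x)] gives the same result as sort(x); order() is the one to use when sorting one vector or data.frame by another.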
Stats
  Descriptive statistics
    mean(), median(), var(), sd(), cor(), cov(), quantile(), summary()
    Most accept na.rm = TRUE; cor() and cov() use the use= argument instead
  Extrema
    min(), max(), range() (na.rm = TRUE)
    cummin(), cummax()
  Generating distributions
    Functions are of type xnorm()
      If x is r, generates random numbers
      If x is p, gives cumulative probability
      If x is d, gives probability density (probability mass for discrete distributions)
      If x is q, gives the quantile for a given cumulative probability
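The four prefixes can be illustrated with the normal distribution. This is a sketch; the seed and the mean and sd values are arbitrary choices.

```r
# d: density of the standard normal at 0, equal to 1/sqrt(2*pi)
d0 <- dnorm(0)

# p: cumulative probability P(X <= 1.96), roughly 0.975
p196 <- pnorm(1.96)

# q: the inverse of p, roughly 1.96
q975 <- qnorm(0.975)

# p and q undo each other
roundtrip <- pnorm(qnorm(0.3))   # 0.3

# r: random draws, here with a non-default mean and sd
set.seed(42)
x <- rnorm(1000, mean = 5, sd = 2)
m <- mean(x)   # close to 5
s <- sd(x)     # close to 2
```

The same d/p/q/r pattern applies to the other distributions, e.g. dbinom(), ppois(), qt(), runif().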
Distributions, Continued
  Discrete
    Poisson             rpois()
    Binomial            rbinom()
    Negative binomial   rnbinom()
  Continuous
    Uniform             runif()
    Normal              rnorm()
    Log-Normal          rlnorm()
    Beta                rbeta()
    Gamma               rgamma()
    Weibull             rweibull()
    Cauchy              rcauchy()
    Student's t         rt() (not rstudent())
    F                   rf()
    Chi-squared (χ²)    rchisq()
    Exponential         rexp()
I/O
  read.table()
  readLines()
  save()
  load()
  source()
  sink()
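A round trip through a temporary file gives a feel for these functions. This is a sketch: write.table() is not on the slide but is the base-R counterpart of read.table(), and the data.frame contents are invented.

```r
df <- data.frame(id = 1:3, grade = c("a", "b", "c"))

# Write a tab-separated text file to a temporary location
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.table() parses it back into a data.frame
back <- read.table(path, header = TRUE, sep = "\t")

# readLines() returns the raw text, one string per line
txt <- readLines(path)           # header line plus 3 data lines

# save()/load() store R objects in binary form under their own names
rdata <- tempfile(fileext = ".RData")
save(df, file = rdata)
rm(df)
load(rdata)                      # df exists again
```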
Control Flow
  LOGICAL
    if(condition) expression
    if(condition) expression else alternate expression
    switch(expr, alt1=result1, alt2=result2, ..., default)
  LOOPING
    for( x in sequence ) expression
    while( condition ) expression
    repeat expression
    break   exits from loop
    next    skips rest of code for current iteration, moves on to next iteration
Grouping expressions
  Expressions are grouped using { }
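The control-flow constructs and { } grouping can be combined as in the sketch below. The function names classify and stage_label, and the labels they return, are hypothetical.

```r
# if / else with grouped expressions
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# switch() on a character value; the unnamed last argument is the default
stage_label <- function(code) {
  switch(code, a = "early", b = "late", "unknown")
}

# for loop using next and break
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once i exceeds 7
  total <- total + i      # adds 1, 3, 5, 7
}

# while loop
count <- 0
while (count < 3) {
  count <- count + 1
}
```

Here classify(-2) returns "negative", stage_label("z") falls through to "unknown", and total ends at 16.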
Plotting
  plot         2-d plotting
  boxplot      Graphical summarization
  stripchart   More detailed
  pairs        Correlation analysis
  hist         Histogram
  image        2-d rep. of 3-d data
  contour      Contour plot
Plot
boxplot(count~spray,data=InsectSprays)
Stripchart
Pairs
Image
Contour
Further Resources
  help.start(), help menu
  r-project.org documentation section
    Lists useful books