Computing with large data sets
- Tobias McCarthy
Transcription
1 Computing with large data sets. Richard Bonneau, spring 2009 (week ): introduction to R
2 Other notes, courses, and lectures about R and S
- Ingo Ruczinski and Rafael Irizarry (Johns Hopkins Biostat)
- Roger D. Peng (JHU)
- Read the manual!
v.0480: computing with data, Richard Bonneau
3 S history
S is a language and system for organizing, visualizing, and analyzing data. Development of S started at Bell Labs in 1976. The language has evolved through several major versions to become the most widely used environment for research in data analysis and statistics. In 1998, S became the first statistical system to receive the Software System Award, the top software award from the ACM. (For a great account of the early history of S, see the paper on the course website.)
4 R history and facts
R is an environment for data analysis and visualization. R is an open source implementation of the S language (S-Plus is a commercial implementation of the S language). The current version of R (September 2004) is 1.9.1. The R Core group consists of Doug Bates, John Chambers, Peter Dalgaard, Robert Gentleman, Kurt Hornik, Stefano Iacus, Ross Ihaka, Friedrich Leisch, Thomas Lumley, Martin Maechler, Guido Masarotto, Paul Murrell, Brian Ripley, Duncan Temple Lang, and Luke Tierney. Join the R Foundation for Statistical Computing.
- 1991: Ross Ihaka and Robert Gentleman begin work on a project that will ultimately become R.
- 1992: Design and implementation of pre-R.
- 1993: The first announcement of R.
- 1995: R available by ftp under the GPL.
- 1996: A mailing list is started and maintained by Martin Maechler at ETH.
- 1997: The R core group is formed.
- 1999: DSC meeting in Vienna, the first time many R core members meet.
- 2000: R 1.0.0 is released.
- 2009: R is still very actively developed and available for all platforms; it is open source and pervasive in bioinformatics and several other fields.
5 playing around
> y <- 4
> x <- 2.0
> y <- c(2, 4, 6)
> x * y
[1]  4  8 12
> x * x
[1] 4
> y * y
[1]  4 16 36
> sqrt( -1 )
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> sqrt(-1+0i)
[1] 0+1i
6 playing around
> y <- 1:10
> y ^^ 2
Error: syntax error
> y ^ 2
 [1]   1   4   9  16  25  36  49  64  81 100
> y
 [1]  1  2  3  4  5  6  7  8  9 10
> y <- jitter( y )
> y
[output: ten jittered values, different on every run]
> class( y )
[1] "numeric"
> class( x )
[1] "numeric"
> length( x )
[1] 1
> length( y )
[1] 10
> dim( y )
NULL
> dim( x )
NULL
7 playing around
> z <- matrix( sample(y), nrow = 5, ncol = 5)
> z
[output: a 5 x 5 matrix of the shuffled values of y]
> dim(z)
[1] 5 5
> length( z )
[1] 25
> summary( y )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
[output: the six summary values for y]
8 playing around
> hist( y )
> hist( y, nclass = 10 )
> hist( y )
> pdf("hist.l.pdf")
> hist( y )
> dev.off()
quartz
     2
> x <- 1:10
> y <- runif( length( x ) )
> plot( x, y )
> abline(h=0.5, lty=2, col="green", lwd=2)
> pdf("sample-sesion.pdf")
> plot( x, y )
> abline(h=0.5, lty=2, col="green", lwd=2)
> dev.off()
quartz
     2
[Figures: "Histogram of y" (Frequency against y) and a scatter plot of y against x]
9 using built in examples
> ? heatmap ### then cut and paste in examples
> require(graphics); require(grDevices)
> x <- as.matrix(mtcars)
> rc <- rainbow(nrow(x), start=0, end=.3)
> cc <- rainbow(ncol(x), start=0, end=.3)
> hv <- heatmap(x, col = cm.colors(256), scale="column",
+       RowSideColors = rc, ColSideColors = cc, margins=c(5,10),
+       xlab = "specification variables", ylab= "Car Models",
+       main = "heatmap(<mtcars data>, ..., scale = \"column\")")
## mtcars is a data structure provided as an example
## of how to use heatmap()
> str( mtcars )
'data.frame': 32 obs. of 11 variables:
 $ mpg : num ...
 $ cyl : num ...
 $ disp: num ...
 $ hp  : num ...
 $ drat: num ...
 $ wt  : num ...
 $ qsec: num ...
 $ vs  : num ...
 $ am  : num ...
 $ gear: num ...
 $ carb: num ...
> ?mtcars ## for a description of what it actually is
[Figure: heatmap(<mtcars data>, ..., scale = "column"); x axis "specification variables" (the mtcars columns), y axis "Car Models" (the mtcars row names)]
10 dumping functions: example code galore
> hist ### type the function name with no ( ) or arguments
function (x, ...)
UseMethod("hist")
<environment: namespace:graphics>
For some functions (those that are part of the base or core R code) you get info, but not the code.
> heatmap ### for higher level functions or user-defined functions you get the code
function (x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL,
    distfun = dist, hclustfun = hclust, reorderfun = function(d, w) reorder(d, w),
    add.expr, symm = FALSE, revC = identical(Colv, "Rowv"),
    scale = c("row", "column", "none"), na.rm = TRUE, margins = c(5, 5),
    ColSideColors, RowSideColors, cexRow = 0.2 + 1/log10(nr),
    cexCol = 0.2 + 1/log10(nc), labRow = NULL, labCol = NULL,
    main = NULL, xlab = NULL, ylab = NULL, keep.dendro = FALSE,
    verbose = getOption("verbose"), ...)
{
    scale <- if (symm && missing(scale)) "none" else match.arg(scale)
    if (length(di <- dim(x)) != 2 || !is.numeric(x))
        stop("'x' must be a numeric matrix")
    ... truncated
[Figure: the mtcars heatmap from the previous slide, repeated]
11 R basic types / atomic classes of objects
> x <- character() ### char, strings, vectors of both
> x <- "test"
> x
[1] "test"
> x[1] <- "test"
> x[2] <- "test"
> x[3] <- "test"
> x
[1] "test" "test" "test"
> class(x)
[1] "character"
> x <- numeric() ### double floats, vectors of floats
> x <- complex() ### complex numbers
> x
complex(0)
> x <- logical() ## logicals, can be used to index other objects
> x <- 1
> class(x)
[1] "numeric"
> x <- 1L ### force integer
> class(x)
[1] "integer"
> x <- Inf ## infinity
> x
[1] Inf
> x <- NA ### missing values are NA or NaN
> x
[1] NA
> is.na( x ) ### built in functions help in dealing with NAs
[1] TRUE
> x <- logical()
> x
logical(0)
> x <- NaN
> is.na( x )
[1] TRUE
12 NA, NaN, empty/missing values
Values can be missing for lots of good reasons.
- Technical: the measurement failed (it was cloudy that night, the probe for that DNA was synthesized incorrectly)
- Budgetary/Social: we could only afford to measure so many points / attributes; people will only answer 5 minutes of questions...
- Bugs (incorrect explicit type coercion)
- Values not filled in YET
> x <- Inf ## infinity
> x
[1] Inf ### this IS a number
> x <- NA ### missing values are NA or NaN
> x
[1] NA
> is.na( x ) ### built in functions help in dealing with NAs
[1] TRUE
> ### messed up explicit coercion
> x <- c( "f", "fg" )
> as.numeric( x )
[1] NA NA
Warning message:
NAs introduced by coercion
See also: is.nan(), is.null(), as.null()
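The helper functions named on this slide are easy to confuse, so here is a minimal sketch of how they differ. The vector `v` is made up for illustration and is not from the lecture.

```r
# Hypothetical measurements with one missing value and one undefined result
v <- c(1.5, NA, 0/0, 4)

is.na(v)     # NA and NaN both count as missing: FALSE TRUE TRUE FALSE
is.nan(v)    # only the NaN is flagged:          FALSE FALSE TRUE FALSE

m.all  <- mean(v)                 # missing, because missing values propagate
m.seen <- mean(v, na.rm = TRUE)   # 2.75, missing values are dropped first

is.null(NULL)   # TRUE: NULL is an empty object, not a missing value
length(NULL)    # 0
length(NA)      # 1
```

Most summary functions (`mean()`, `sum()`, `max()`, ...) take the same `na.rm` argument.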
13 R basic types: vectors
Integers
> x <- 1:2
> class( x )
[1] "integer"
> x <- c(1L, 2L, 3L)
> x
[1] 1 2 3
Numeric
> x <- c(1, 2, 3.1)
> x
[1] 1.0 2.0 3.1
Logical
> x <- c( TRUE, TRUE, FALSE)
> x
[1]  TRUE  TRUE FALSE
Logical from conditional statement
> x <- c("azure", "red", "green", "red")
> x
[1] "azure" "red"   "green" "red"
> x == "azure"
[1]  TRUE FALSE FALSE FALSE
> x <- c(1, 2, 3.1)
> x < 2.1
[1]  TRUE  TRUE FALSE
Integer indexes from conditionals
> which( x < 2.1 )
[1] 1 2
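The conditionals above are usually used to pull elements out of a vector. A short sketch, reusing the colour strings from the slide; the variable names are illustrative.

```r
colours <- c("azure", "red", "green", "red")

is.red <- colours == "red"          # logical vector: FALSE TRUE FALSE TRUE
red.at <- which(is.red)             # integer positions: 2 4

by.logical <- colours[is.red]       # logical indexing: "red" "red"
by.integer <- colours[red.at]       # integer indexing, same result
n.red      <- sum(is.red)           # TRUE counts as 1, so this counts matches: 2
```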
14 R basic types: vectors
> x <- numeric( 10 ) ## a length 10 numeric vector
> x ## short for print(x)
 [1] 0 0 0 0 0 0 0 0 0 0
> x <- character( 10 )
> x
 [1] "" "" "" "" "" "" "" "" "" ""
> x[ length(x) + 1 ] <- "a"
> x
 [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "a"
> x <- c(x, "b")
> x
 [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "a" "b"
> ### attributes
> length( x )
[1] 12
> names( x )
NULL
> str( x )
 chr [1:12] "" "" "" "" "" "" "" "" "" "" "a" "b"
> x <- 1:5 ## loading attributes
> names( x ) <- c("one", "two", "three", "four", "five")
> x
  one   two three  four  five
    1     2     3     4     5
> names( x )
[1] "one"   "two"   "three" "four"  "five"
> class( x )
[1] "integer"
15 creative ways of making nasty bugs
> ## you can, but shouldn't, do nutty stuff like this
> x <- c(1, "two" )
> x
[1] "1"   "two"
> class( x )
[1] "character"
> y <- c(1, 0, TRUE, FALSE)
> y
[1] 1 0 1 0
> class( y )
[1] "numeric"
> y <- c( "true", TRUE, FALSE) ## nuts!
> y
[1] "true"  "TRUE"  "FALSE"
> class( y )
[1] "character"
> ## creative ways of writing nasty nasty bugs
R variables, vectors and matrices assume the type first specified OR loaded. Assigning a different type later in the code will often override this initial type. For example:
> x <- 1:10
> x <- example.function( x ) ## function returns a character vec
> x <- length( x ) * pi
> x <- FALSE
x has been 4 types in 4 lines of code.
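The silent conversions on this slide follow a fixed order: `c()` promotes every element to the most general type present. A minimal sketch of that order, with a cheap guard against surprises; the values are made up.

```r
# Promotion order: logical < integer < numeric (double) < complex < character
cls <- c(class(c(TRUE, 1L)),      # "integer"
         class(c(1L, 2.5)),       # "numeric"
         class(c(2.5, 1+0i)),     # "complex"
         class(c(2.5, "two")))    # "character"

# A cheap guard: fail early if a value is not the type you expect
x <- c(1, "two")
stopifnot(is.character(x))        # passes; is.numeric(x) would be FALSE
```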
16 factors...
Making a factor vector
> youare <- as.factor( c("M", "F", "F", "U" ) )
> youare
[1] M F F U
Levels: F M U
> youare <- rep(1, 10)
> ?runif
> y <- runif( 10 )
> youare[ y > 0.5 ] <- "big"
> youare[ y <= 0.5 ] <- "small"
> youare
 [1] "big"   "big"   "big"   "big"   "big"   "small" "big"   "small" "big"   "big"
> as.factor(youare)
 [1] big   big   big   big   big   small big   small big   big
Levels: big small
Factors are integers with a label, but the label is stored much more efficiently (once for the whole vector of factors). Using factors is better in that they have meaningful attributes... why say 1, 2, 3 as integers when you can say male, female, undetermined? Many functions (for example, functions that aim to classify instances based on vectors of mixed attributes) use factors.
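To make "integers with a label" concrete, the sketch below pulls a factor apart into its labels and its integer codes. The second half shows a common pitfall that follows from this storage scheme; the data are made up.

```r
sex <- factor(c("M", "F", "F", "U"))

lev   <- levels(sex)       # the labels, stored once, sorted: "F" "M" "U"
codes <- as.integer(sex)   # the integer code of each element: 2 1 1 3
tab   <- table(sex)        # counts per level

# Pitfall: converting a factor of digits goes through the codes
f      <- factor(c("10", "20", "10"))
wrong  <- as.integer(f)                 # 1 2 1  (the codes, not the values)
right  <- as.numeric(as.character(f))   # 10 20 10
```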
17 forcing type conversions, explicit coercion
> ### explicit coercion --- forcing the type
> x <- character( "1", "2", "3", "4", "0", "0" )
Error in character("1", "2", "3", "4", "0", "0") :
  unused argument(s) ("2", "3", "4", "0", "0")
> x <- c( "1", "2", "3", "4", "0", "0" )
> class( x )
[1] "character"
> x
[1] "1" "2" "3" "4" "0" "0"
> x <- as.numeric( x )
> x
[1] 1 2 3 4 0 0
> str( x )
 num [1:6] 1 2 3 4 0 0
> as.logical( x )
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
> as.complex( x )
[1] 1+0i 2+0i 3+0i 4+0i 0+0i 0+0i
> as.integer( x )
[1] 1 2 3 4 0 0
Remember: coercing to the type you expect is often a good way of checking that you've read in OR computed what you think you have. For example, coercing a character vector to numeric can produce NAs that lead you to bugs. So declaring and coercing types is a good idea even if R doesn't strictly require it.
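The checking idea in the last paragraph can be sketched as follows: coerce, then look for NAs that were not missing before the coercion. The input vector `raw` is hypothetical.

```r
# Hypothetical column read in as text, with one bad entry
raw <- c("1", "2", "three", "4")

num <- suppressWarnings(as.numeric(raw))   # "three" becomes NA
bad <- which(is.na(num) & !is.na(raw))     # entries that failed to convert

bad         # 3
raw[bad]    # "three"
```

Comparing against `!is.na(raw)` keeps values that were already missing in the input from being reported as conversion failures.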
18 coercion of matrix objects
> x <- c(1,2,3,4,0,0)
> x
[1] 1 2 3 4 0 0
> matrix( x, ncol = 2, nrow = 2 )
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> matrix( x, ncol = 2, nrow = 3 )
     [,1] [,2]
[1,]    1    4
[2,]    2    0
[3,]    3    0
> matrix( x, ncol = 2, nrow = 4 )
     [,1] [,2]
[1,]    1    0
[2,]    2    0
[3,]    3    1
[4,]    4    2
Warning message:
In matrix(x, ncol = 2, nrow = 4) :
  data length [6] is not a sub-multiple or multiple of the number of rows [4]
> ### but it still did it!!!!! is this a feature or a bug waiting to happen?
> x <- c(1,2,3,4,0,0)
> dim(x) <- c(3,2)
> x
     [,1] [,2]
[1,]    1    4
[2,]    2    0
[3,]    3    0
> ### but with dim() the dimensions have to match the length
19 matrix names
> x <- c(NA, NA, 1)
> x
[1] NA NA  1
> is.na(x)
[1]  TRUE  TRUE FALSE
> y <- matrix( x, ncol = 2, nrow = 3 )
> dim( y )
[1] 3 2
> y
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
[3,]    1    1
> y[ is.na(y) ] <- 2
> y
     [,1] [,2]
[1,]    2    2
[2,]    2    2
[3,]    1    1
> y[1,2] <- 1
> rownames( y ) <- c( "eq", "er", "es")
> colnames( y ) <- c("qr", "rq" )
> dimnames( y )
[[1]]
[1] "eq" "er" "es"

[[2]]
[1] "qr" "rq"

> y
   qr rq
eq  2  1
er  2  2
es  1  1
20 matrices
> ## matrices are filled starting in the upper left corner and then running down
> ## the column. The first index is the row, and the second is the column.
> y <- 1:10
> dim(y) <- c(2,5)
> y
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> dim(y) <- c(5,2)
> y
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> dim(y) <- c(5,5) ### oops?
Error in dim(y) <- c(5, 5) :
  dims [product 25] do not match the length of object [10]
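When the data arrive row by row, the column-first fill described above is the wrong order. A small sketch of the `byrow` argument of `matrix()`, which switches the fill direction; the numbers are illustrative.

```r
m.col <- matrix(1:6, nrow = 2)                 # default: fill down the columns
m.row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across the rows instead

m.col[1, ]    # first row: 1 3 5
m.row[1, ]    # first row: 1 2 3
m.col[2, 3]   # row 2, column 3: 6
dim(t(m.row)) # transposing gives a 3 x 2 matrix
```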
21 rbind, cbind
> x <- 1:10
> y <- 10:1
> z <- c(1:5, 5:1)
> xyz <- rbind( x, y, z )
> xyz
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    5    6    7    8    9    10
y   10    9    8    7    6    5    4    3    2     1
z    1    2    3    4    5    5    4    3    2     1
> xyz <- cbind( x, y, z )
> xyz
       x  y z
 [1,]  1 10 1
 [2,]  2  9 2
 [3,]  3  8 3
 [4,]  4  7 4
 [5,]  5  6 5
 [6,]  6  5 5
 [7,]  7  4 4
 [8,]  8  3 3
 [9,]  9  2 2
[10,] 10  1 1
> ## adding to a matrix one row at a time
> xyz <- rbind( xyz, c( 3,4,5) )
> xyz
       x  y z
 [1,]  1 10 1
 [2,]  2  9 2
 [3,]  3  8 3
 [4,]  4  7 4
 [5,]  5  6 5
 [6,]  6  5 5
 [7,]  7  4 4
 [8,]  8  3 3
 [9,]  9  2 2
[10,] 10  1 1
[11,]  3  4 5
> ## could do a similar thing with cbind()
22 lists
Making a list
> p1 <- list() ### declare a list
> p1$x <- 1
> p2$x <- 2.0
Error in p2$x <- 2 : object "p2" not found
> p2 <- list()
> p2$x <- 3.0
> p1$y <- 2.0
> p2$y <- 1.0
> p1
$x
[1] 1

$y
[1] 2

> p2
$x
[1] 3

$y
[1] 1

Making a list of lists
> all.p <- list()
> all.p[[1]] <- p1
> all.p[[2]] <- p2
> all.p
[[1]]
[[1]]$x
[1] 1

[[1]]$y
[1] 2


[[2]]
[[2]]$x
[1] 3

[[2]]$y
[1] 1


Naming and accessing lists:
> names( all.p ) <- c("p1", "p2")
> all.p$p1
$x
[1] 1

$y
[1] 2

> all.p$p1$x
[1] 1
> all.p[[1]]$x
[1] 1
> all.p[[1]][[1]]
[1] 1
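The access forms used on this slide differ in what they return. A minimal sketch with an illustrative list; the values are made up.

```r
p1 <- list(x = 1, y = 2)

by.dollar <- p1$x          # element by name: 1
by.name   <- p1[["x"]]     # same element, name given as a string
by.pos    <- p1[[1]]       # same element, by position

class(p1["x"])     # single brackets return a sub-list: "list"
class(p1[["x"]])   # double brackets return the element itself: "numeric"

# [[ ]] also accepts a name held in a variable, which $ cannot do
key    <- "y"
by.key <- p1[[key]]        # 2
```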
23 lists are a great way to return and pass data
> x <- rnorm( 1000, 1.2, 0.5 ) ## 1000 draws from a normal N(1.2, 0.5)
> hist.x <- hist( x )
> hist.x
$breaks
[1] ...

$counts
[1] ...

$intensities
[1] ...

$density
[1] ...

$mids
[1] ...

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
> class( hist.x )
[1] "histogram"
## so it is not 'just' a list
## more on that later
[Figure: histogram of the 1000 draws, Frequency on the y axis]
24 a strange thing lists do... name autocompletion
> x <- rnorm( 1000, 1.2, 0.5 ) ## 1000 draws from a normal N(1.2, 0.5)
> hist.x <- hist( x )
> hist.x
$breaks
[1] ...

$counts
[1] ...

$intensities
[1] ...

$density
[1] ...

$mids
[1] ...

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
> class( hist.x )
[1] "histogram"
> ## so it is not 'just' a list
> ## more on that later
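The "name autocompletion" in this slide's title most likely refers to partial matching: `$` accepts any unambiguous prefix of a component name. A hedged sketch, using a small hand-made list `h` as a stand-in for a `hist()` result:

```r
h <- list(breaks = c(0, 1, 2), counts = c(5, 3))   # stand-in for a hist() result

via.dollar  <- h$br                       # partial match on "breaks": 0 1 2
via.exact   <- h[["br"]]                  # NULL: [[ ]] matches exactly by default
via.partial <- h[["br", exact = FALSE]]   # partial matching must be requested

# An ambiguous prefix matches nothing
amb     <- list(count = 1, counts = 2)
via.amb <- amb$cou                        # NULL
```

Partial matching is convenient at the prompt but fragile in scripts: adding a component with a similar name later can silently change what `$` returns.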
25 dataframes
Data frames are tables of data. Most of the time you get them by reading in tab-delimited tables or flat files with read.table(). Let's look at an example data.frame that most installs of R should have loaded.
> class( USJudgeRatings )
[1] "data.frame"
> str( USJudgeRatings )
'data.frame': 43 obs. of 12 variables:
 $ CONT: num ...
 $ INTG: num ...
 $ DMNR: num ...
 $ DILG: num ...
 ...
 $ RTEN: num ...
> pairs( USJudgeRatings[,1:5] ) ## this function knows what to do with a dataframe
> USJudgeRatings$CONT
[1] ...
> USJudgeRatings[,1]
[1] ...
[Figure: pairs() scatter plot matrix of the columns CONT, INTG, DMNR, DILG, CFMG]
26 dataframes
Coerce a data.frame to a matrix
> x <- as.matrix( USJudgeRatings )
> str( x )
 num [1:43, 1:12] ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:43] "AARONSON,L.H." "ALEXANDER,J.M." "ARMENTANO,A.J." "BERDON,R.I." ...
  ..$ : chr [1:12] "CONT" "INTG" "DMNR" "DILG" ...
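USJudgeRatings converts cleanly because every column is numeric. A matrix holds a single type, so a data.frame with mixed columns behaves differently. A minimal sketch with a made-up two-column table:

```r
df <- data.frame(name = c("a", "b"), score = c(1.5, 2.5))   # made-up table

m.all <- as.matrix(df)            # one type for the whole matrix
class(m.all[1, 1])                # "character"
class(m.all[1, 2])                # "character": the numbers became strings

m.num <- as.matrix(df["score"])   # keep only the numeric column
class(m.num[1, 1])                # "numeric"
dim(m.num)                        # 2 1
```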
27 reading in code
# ~bonneau/v-class/ > cat mean.vec.r
## function to report the mean of a vector
mean.vec <- function ( x, na.remove = T ) {
  if ( class( x ) == "numeric" || class( x ) == "integer" ) {
    return( mean(x, na.rm = na.remove) )
  } else {
    return( NULL ) ## we could also return a NA
  }
}
# ~bonneau/v-class/ > R
...R startup...
> ? source
> source( file = "mean.vec.r" ) ## you might need a path...
> mean.vec( c(2,3) )
[1] 2.5
> mean.vec( c(2, 3, NA) )
[1] 2.5
> mean.vec( c( "ps", "qs" ) )
NULL
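One possible variant of the function above tests the type with `is.numeric()` instead of comparing `class()` strings, which also behaves for objects whose class has more than one element. The name `mean.vec2` is hypothetical and not part of the course code.

```r
## variant of mean.vec that tests the type with is.numeric()
mean.vec2 <- function(x, na.remove = TRUE) {
  if (is.numeric(x)) {               # TRUE for both integer and double vectors
    mean(x, na.rm = na.remove)
  } else {
    NULL                             # we could also return NA
  }
}

r1 <- mean.vec2(c(2, 3))             # 2.5
r2 <- mean.vec2(c(2L, 3L, NA))       # 2.5, the NA is dropped
r3 <- mean.vec2(c("ps", "qs"))       # NULL
```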
28 homework and reading for next time
Read the R manual.
Non-graded homework: make a function that:
- given a matrix, returns a vector containing the means of each row
- given a list of numeric vectors, returns the mean of each vector in the list
For test data, either use the link to the small test expression matrix or use a built-in R data object (like volcano):
> dim( volcano )
[1] 87 61
> str( volcano )
 num [1:87, 1:61] ...
> ? image
Use loops and don't worry about NAs for now. This is not a graded assignment, but give it a try to get your feet wet. If you want a hint, stay after class. Next lecture we'll play with plotting and graphics. If you're confused, there will be time to catch up next week.
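One possible shape for the two homework functions is sketched below, assuming plain `for` loops and no NA handling as the assignment allows. The names `row.means` and `list.means` are made up, and this is only one of many reasonable solutions.

```r
## row means of a matrix, using an explicit loop
row.means <- function(m) {
  out <- numeric(nrow(m))              # preallocate the result vector
  for (i in seq_len(nrow(m))) {
    out[i] <- mean(m[i, ])
  }
  out
}

## mean of each numeric vector in a list
list.means <- function(l) {
  out <- numeric(length(l))
  for (i in seq_along(l)) {
    out[i] <- mean(l[[i]])
  }
  out
}

vm <- row.means(volcano)               # one mean per row of the 87 x 61 matrix
lm.out <- list.means(list(a = 1:3, b = c(2, 4)))   # 2 3
```

The built-in `rowMeans()` gives a quick way to check the loop version against a known answer.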
Basic R QMMA. Emanuele Taufer. 2/19/2018 Basic R (1)
Basic R QMMA Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20classes/1-3_basic_r.html#(1) 1/21 Preliminary R is case sensitive: a is not the same as A.
More informationIntroduction for heatmap3 package
Introduction for heatmap3 package Shilin Zhao April 6, 2015 Contents 1 Example 1 2 Highlights 4 3 Usage 5 1 Example Simulate a gene expression data set with 40 probes and 25 samples. These samples are
More informationQuick Guide for pairheatmap Package
Quick Guide for pairheatmap Package Xiaoyong Sun February 7, 01 Contents McDermott Center for Human Growth & Development The University of Texas Southwestern Medical Center Dallas, TX 75390, USA 1 Introduction
More informationIntroduction to R: Day 2 September 20, 2017
Introduction to R: Day 2 September 20, 2017 Outline RStudio projects Base R graphics plotting one or two continuous variables customizable elements of plots saving plots to a file Create a new project
More informationPart 1: Getting Started
Part 1: Getting Started 140.776 Statistical Computing Ingo Ruczinski Thanks to Thomas Lumley and Robert Gentleman of the R-core group (http://www.r-project.org/) for providing some tex files that appear
More informationObjects, Class and Attributes
Objects, Class and Attributes Introduction to objects classes and attributes Practically speaking everything you encounter in R is an object. R has a few different classes of objects. I will talk mainly
More informationGraphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley
Graphics in R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Base Graphics 2 Graphics in R Traditional
More informationMBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R
MBV4410/9410 Fall 2018 Bioinformatics for Molecular Biology Introduction to R Outline Introduce R Basic operations RStudio Bioconductor? Goal of the lecture Introduce you to R Show how to run R, basic
More informationChapter 7. The Data Frame
Chapter 7. The Data Frame The R equivalent of the spreadsheet. I. Introduction Most analytical work involves importing data from outside of R and carrying out various manipulations, tests, and visualizations.
More informationgetting started in R
Garrick Aden-Buie // Friday, March 25, 2016 getting started in R 1 / 70 getting started in R Garrick Aden-Buie // Friday, March 25, 2016 INFORMS Code & Data Boot Camp Today we ll talk about Garrick Aden-Buie
More informationIntroduction to the R Language
Introduction to the R Language Data Types and Basic Operations Starting Up Windows: Double-click on R Mac OS X: Click on R Unix: Type R Objects R has five basic or atomic classes of objects: character
More informationIntroduction to Huxtable David Hugh-Jones
Introduction to Huxtable David Hugh-Jones 2018-01-01 Contents Introduction 2 About this document............................................ 2 Huxtable..................................................
More informationenote 1 1 enote 1 Introduction to R Updated: 01/02/16 kl. 16:10
enote 1 1 enote 1 Introduction to R Updated: 01/02/16 kl. 16:10 enote 1 INDHOLD 2 Indhold 1 Introduction to R 1 1.1 Getting started with R and Rstudio....................... 3 1.1.1 Console and scripts............................
More informationPackage TROM. August 29, 2016
Type Package Title Transcriptome Overlap Measure Version 1.2 Date 2016-08-29 Package TROM August 29, 2016 Author Jingyi Jessica Li, Wei Vivian Li Maintainer Jingyi Jessica
More informationIntroduction to R: Data Types
Introduction to R: Data Types https://ivanek.github.io/introductiontor/ Florian Geier (florian.geier@unibas.ch) September 26, 2018 Recapitulation Possible workspaces Install R & RStudio on your laptop
More informationWill Landau. January 24, 2013
Iowa State University January 24, 2013 Iowa State University January 24, 2013 1 / 30 Outline Iowa State University January 24, 2013 2 / 30 statistics: the use of plots and numerical summaries to describe
More informationThe R statistical computing environment
The R statistical computing environment Luke Tierney Department of Statistics & Actuarial Science University of Iowa June 17, 2011 Luke Tierney (U. of Iowa) R June 17, 2011 1 / 27 Introduction R is a language
More informationThe xtablelist Gallery. Contents. David J. Scott. January 4, Introduction 2. 2 Single Column Names 7. 3 Multiple Column Names 9.
The xtablelist Gallery David J. Scott January 4, 2018 Contents 1 Introduction 2 2 Single Column Names 7 3 Multiple Column Names 9 4 lsmeans 12 1 1 Introduction This document represents a test of the functions
More informationWEEK 13: FSQCA IN R THOMAS ELLIOTT
WEEK 13: FSQCA IN R THOMAS ELLIOTT This week we ll see how to run qualitative comparative analysis (QCA) in R. While Charles Ragin provides a program on his website for running QCA, it is not able to do
More informationData Structures STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley
Data Structures STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Data Types and Structures To make the
More informationHandling Missing Values
Handling Missing Values STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Missing Values 2 Introduction
More informationPackage d3heatmap. February 1, 2018
Type Package Package d3heatmap February 1, 2018 Title Interactive Heat Maps Using 'htmlwidgets' and 'D3.js' Version 0.6.1.2 Date 2016-02-23 Maintainer ORPHANED Description Create interactive heat maps
More informationData types and structures
An introduc+on to Data types and structures Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 3 Review GeFng started with R Crea+ng Objects Data types in R Data structures in R
More informationStat 241 Review Problems
1 Even when things are running smoothly, 5% of the parts produced by a certain manufacturing process are defective. a) If you select parts at random, what is the probability that none of them are defective?
More informationCreate Awesome LaTeX Table with knitr::kable and kableextra Hao Zhu
Create Awesome LaTeX Table with knitr::kable and kableextra Hao Zhu 2017-10-31 Contents Overview 2 Installation 2 Getting Started 2 LaTeX packages used in this package...................................
More informationDescription/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources
R Outline Description/History Objects/Language Description Commonly Used Basic Functions Basic Stats and distributions I/O Plotting Programming More Specific Functionality Further Resources www.r-project.org
More informationLecture 09: Feb 13, Data Oddities. Lists Coercion Special Values Missingness and NULL. James Balamuta STAT UIUC
Lecture 09: Feb 13, 2019 Data Oddities Lists Coercion Special Values Missingness and NULL James Balamuta STAT 385 @ UIUC Announcements hw03 slated to be released on Thursday, Feb 14th, 2019 Due on Wednesday,
More informationAdvances in integrating statistical inference
Nicos Angelopoulos 1 Samer Abdallah 2 and Georgios Giamas 1 1 Department of Surgery and Cancer, Division of Cancer, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 ONN, UK.
More informationResources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.
Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department
More informationMatrix algebra. Basics
Matrix.1 Matrix algebra Matrix algebra is very prevalently used in Statistics because it provides representations of models and computations in a much simpler manner than without its use. The purpose of
More informationCreate Awesome LaTeX Table with knitr::kable and kableextra
Create Awesome LaTeX Table with knitr::kable and kableextra Hao Zhu 2018-01-15 Contents Overview 3 Installation 3 Getting Started 3 LaTeX packages used in this package...................................
More informationExtremely short introduction to R Jean-Yves Sgro Feb 20, 2018
Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Contents 1 Suggested ahead activities 1 2 Introduction to R 2 2.1 Learning Objectives......................................... 2 3 Starting
More informationRegression Models Course Project Vincent MARIN 28 juillet 2016
Regression Models Course Project Vincent MARIN 28 juillet 2016 Executive Summary "Is an automatic or manual transmission better for MPG" "Quantify the MPG difference between automatic and manual transmissions"
More informationReading and wri+ng data
An introduc+on to Reading and wri+ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 4 Course outline Review Data types and structures Reading data How should data look
More informationAccessing Databases from R
user Vignette: Accessing Databases from R Greater Boston user Group May, 20 by Jeffrey Breen jbreen@cambridge.aero Photo from http://en.wikipedia.org/wiki/file:oracle_headquarters_redwood_shores.jpg Outline
More informationIntroduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010
UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview
More informationSTAT 540 Computing in Statistics
STAT 540 Computing in Statistics Introduces programming skills in two important statistical computer languages/packages. 30-40% R and 60-70% SAS Examples of Programming Skills: 1. Importing Data from External
More informationA tutorial for the sendplot R package
A tutorial for the sendplot R package Lori A. Shepherd, John A. Kirchgraber Jr., and Daniel P. Gaile January 8, 2010 Statistical Genetics and Genomics Research Group Department of Biostatistics, University
More informationPackage slam. February 15, 2013
Package slam February 15, 2013 Version 0.1-28 Title Sparse Lightweight Arrays and Matrices Data structures and algorithms for sparse arrays and matrices, based on inde arrays and simple triplet representations,
More informationThe Tidyverse BIOF 339 9/25/2018
The Tidyverse BIOF 339 9/25/2018 What is the Tidyverse? The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar,
More informationSpring 2017 CS130 - Intro to R 1 R VISUALIZING DATA. Spring 2017 CS130 - Intro to R 2
Spring 2017 CS130 - Intro to R 1 R VISUALIZING DATA Spring 2017 Spring 2017 CS130 - Intro to R 2 Goals for this lecture: Review constructing Data Frame, Categorizing variables Construct basic graph, learn
More informationIntroducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015
Introducion to R and parallel libraries Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Overview What is R R Console Input and Evaluation Data
More informationSML 201 Week 2 John D. Storey Spring 2016
SML 201 Week 2 John D. Storey Spring 2016 Contents Getting Started in R 3 Summary from Week 1.......................... 3 Missing Values.............................. 3 NULL....................................
More informationIntroduction to R. Le Yan HPC User LSU. Some materials are borrowed from the Data Science course by John Hopkins University on Coursera.
Introduction to R Le Yan HPC User Services @ LSU Some materials are borrowed from the Data Science course by John Hopkins University on Coursera. 3/2/2016 HPC training series Spring 2016 Outline R basics
More informationIntroduction to R 21/11/2016
Introduction to R 21/11/2016 C3BI Vincent Guillemot & Anne Biton R: presentation and installation Where? https://cran.r-project.org/ How to install and use it? Follow the steps: you don t need advanced
More informationITS Introduction to R course
ITS Introduction to R course Nov. 29, 2018 Using this document Code blocks and R code have a grey background (note, code nested in the text is not highlighted in the pdf version of this document but is
More informationIntroduction to R. Nishant Gopalakrishnan, Martin Morgan January, Fred Hutchinson Cancer Research Center
Introduction to R Nishant Gopalakrishnan, Martin Morgan Fred Hutchinson Cancer Research Center 19-21 January, 2011 Getting Started Atomic Data structures Creating vectors Subsetting vectors Factors Matrices
More informationIntroduction to R Software
1. Introduction R is a free software environment for statistical computing and graphics. It is almost perfectly compatible with S-plus. The only thing you need to do is download the software from the internet
More informationMails : ; Document version: 14/09/12
Mails : leslie.regad@univ-paris-diderot.fr ; gaelle.lelandais@univ-paris-diderot.fr Document version: 14/09/12 A freely available language and environment Statistical computing Graphics Supplementary
More informationPackage slam. December 1, 2016
Version 0.1-40 Title Sparse Lightweight Arrays and Matrices Package slam December 1, 2016 Data structures and algorithms for sparse arrays and matrices, based on inde arrays and simple triplet representations,