Session 3 Nick Hathaway;


Session 3 Nick Hathaway; nicholas.hathaway@umassmed.edu

Contents

Manipulating Data frames and matrices
    Converting to long vs wide formats
    Manipulating data in table
    Piping
    Summarizing data
    Filtering data
Part 1. Exercises
Plotting
    ggplot2 Basics
    Modifying colors
    Changing point shape and line types
    Changing plot aspect not dependent on input data
    Switching to another layer type
    Controlling plotting order using factors
    Saving plots
Part 2. Exercises

Manipulating Data frames and matrices

The readr package reads data in as a tibble, which is different from the default R data.frame. The tibble class was designed to be more efficient and more user friendly than data.frame, but one major difference that trips up people used to data.frame is that a tibble doesn't allow rownames. This makes little difference for most uses of a data frame, but there are cases, such as the matrix class, where you need rownames. Below is how you would read in data that has rownames, convert it to a matrix, and add the rownames.

    library(tidyverse)
    ts = read_tsv(".series.data.txt")
    # every column except the first holds the numeric data
    ts_mat = as.matrix(ts[, 2:ncol(ts)])
    # the unnamed first column is read in as X1 (newer readr versions name it ...1 instead)
    rownames(ts_mat) = ts$X1
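As a small self-contained illustration of the conversion above, the sketch below builds a toy tibble in place of the file and moves its first column into the rownames of a matrix. The gene names and values are invented for illustration and are not taken from the course data.

```r
library(tibble)

# Toy stand-in for the table returned by read_tsv(); values are invented
toy = tibble(
  X1 = c("geneA", "geneB", "geneC"),
  Ctrl = c(1.5, 2.0, 0.3),
  Lps = c(3.1, 2.2, 0.1)
)

# Keep only the numeric columns, then attach the gene names as rownames
toy_mat = as.matrix(toy[, 2:ncol(toy)])
rownames(toy_mat) = toy$X1

# Rows can now be looked up by name
toy_mat["geneB", "Lps"]
```

Once the rownames are set, functions that expect a named matrix can be handed toy_mat directly.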

Converting to long vs wide formats

tidyr

The tidyr package is about making your data frames tidy. What is meant by tidy? There are two common ways to organize a data table. One is wide format, where each cell is a different observation and the row and column names explain what those observations are. The other is long format, where every column is a different variable and every row is a different observation. Long format is the layout R handles best. tidyr is all about switching between the two formats.

gather

gather() takes a table in wide format and changes it into long format. It takes four important arguments: 1) the data.frame to work on, 2) the name of a new column that will contain the old column names, 3) the name of a new column that will contain the observations that were spread across those columns, 4) the indexes of the columns to gather together.

    ts = read_tsv(".series.data.txt")
    # rename first column
    colnames(ts)[1] = "gene"
    # or the rename function can also be used
    ts = read_tsv(".series.data.txt")
    ts = rename(ts, gene = X1)
    ts

The printed tibble has one row per gene (A1BG, A1BG-AS1, A1CF, A2M, ...) and one <dbl> column per condition: Ctrl_0h, Lps_1h, Lps_2h, Lps_4h, Lps_6h, Lps_12h, Lps_24h, R848_1h, R848_2h, R848_4h, R848_6h, R848_12h, R848_24h, Ifnb_1h, Ifnb_2h, Ifnb_4h, Ifnb_6h, Ifnb_12h, Ifnb_24h.

    ts_gat = gather(ts, Condition, expression, 2:ncol(ts))
    ts_gat

Figure 1:

The result has three columns, gene <chr>, Condition <chr>, and expression <dbl>, with one row per gene and condition.

Figure 2:

spread

The opposite of gather() is spread(), which can be used to undo the gather().

    ts_gat_sp = spread(ts_gat, Condition, expression)
    ts_gat_sp

Figure 3:

The output is wide again, with the condition columns now in alphabetical order (Ctrl_0h, Ifnb_12h, Ifnb_1h, Ifnb_24h, Ifnb_2h, Ifnb_4h, ...).

separate

tidyr also has functions for splitting one column into multiple columns, such as separate().

    ts_gat = separate(ts_gat, Condition, c("exposure", "time"))
    ts_gat

ts_gat now has four columns: gene <chr>, exposure <chr>, time <chr>, and expression <dbl>.

unite

The opposite of separate() is unite().

    # stored under a new name so that ts_gat keeps its separate exposure and time columns
    ts_united = unite(ts_gat, Condition, exposure, time)
    ts_united

The result is back to three columns: gene, Condition, and expression.
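To make the four tidyr functions concrete, here is a minimal round trip on a toy table. The data frame and its values are invented for illustration, and the column names expression, exposure, and time are assumptions chosen to mirror the walkthrough above.

```r
library(tidyr)

# Toy wide table; gene names and values are invented for illustration
wide = data.frame(
  gene = c("geneA", "geneB"),
  Ctrl_0h = c(1, 2),
  Lps_1h = c(3, 4),
  stringsAsFactors = FALSE
)

# wide -> long: one row per gene and condition
long = gather(wide, Condition, expression, 2:ncol(wide))

# split "Lps_1h" into "Lps" and "1h", then glue the pieces back together
long_sep = separate(long, Condition, c("exposure", "time"))
long_uni = unite(long_sep, Condition, exposure, time)

# long -> wide again
wide_again = spread(long_uni, Condition, expression)
```

Because spread() orders the new columns alphabetically, the round trip only returns the original column order when the condition names already sort that way, as they do in this toy table.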

Manipulating data in table

The dplyr library can be used to manipulate the data within the table itself, while tidyr is more for reorganization.

Mutating column types

The mutate function can be used to either create new columns or change existing ones. Here I also take advantage of the gsub function, which takes three arguments: 1) a pattern to replace, 2) what to replace the pattern with, 3) what to do the replacement on.

    library(dplyr)
    # strip the trailing "h" and convert the time column to numeric
    ts_gat = mutate(ts_gat, time = as.numeric(gsub("h", "", time)))

Piping

It is common practice with tidyverse functions to use piping, which passes the result of one function call directly as input to the next function without saving it in an intermediate variable. This keeps the code compact and, because intermediate results are not kept around in named variables, can reduce memory use. Piping is done with the %>% operator; the keyboard shortcut is command+shift+m (control+shift+m on Windows or Ubuntu). The pipe operator takes what is given to it and places it as the first argument of the next function, e.g. mean(x) is the same as x %>% mean(). For example:

    x = 1:10
    mean(x)
    [1] 5.5
    x %>% mean()
    [1] 5.5

    library(tidyverse) # this will load readr, dplyr, and tidyr
    ts_longformat = read_tsv(".series.data.txt")
    ts_longformat = rename(ts_longformat, gene = X1)
    ts_longformat = gather(ts_longformat, Condition, expression, 2:ncol(ts_longformat))
    ts_longformat = separate(ts_longformat, Condition, c("exposure", "time"))
    ts_longformat = mutate(ts_longformat, time = as.numeric(gsub("h", "", time)))
    ts_longformat

The result has four columns: gene <chr>, exposure <chr>, time <dbl>, and expression <dbl>.

This is equivalent to:

    library(tidyverse) # this will load readr, dplyr, and tidyr
    ts_longformat = read_tsv(".series.data.txt") %>%
      rename(gene = X1) %>%
      gather(condition, expression, 2:ncol(.)) %>%
      separate(condition, c("exposure", "time")) %>%
      mutate(time = as.numeric(gsub("h", "", time)))
    ts_longformat

Figure 4: The relationship between the above pipe commands and the commands executed one by one

Summarizing data

dplyr also offers ways to quickly group and then summarize your data using the group_by and summarise functions.
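Before applying this to the full table, a toy example may help show both the pipe and the grouping step. This is only a sketch: the data frame below and its values are invented, and the column names simply mirror the ones used in this session.

```r
library(dplyr)

# Toy long-format table; values are invented for illustration
toy_long = data.frame(
  exposure = c("Ctrl", "Ctrl", "Lps", "Lps"),
  expression = c(1, 3, 10, 20),
  stringsAsFactors = FALSE
)

# Nested call: read from the inside out
nested = summarise(group_by(toy_long, exposure),
                   mean_expression = mean(expression))

# Piped call: read from top to bottom, same result
piped = toy_long %>%
  group_by(exposure) %>%
  summarise(mean_expression = mean(expression))

piped
```

Each group_by() level ends up as one row of the summary, so piped has one row for Ctrl and one for Lps.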

    ts_longformat_exposure_summary = ts_longformat %>%
      group_by(exposure) %>%
      summarise(mean_expression = mean(expression),
                median_expression = median(expression),
                max_expression = max(expression),
                min_expression = min(expression),
                sd_expression = sd(expression))
    ts_longformat_exposure_summary

This gives one row per exposure (Ctrl, Ifnb, Lps, R848) with the five summary columns.

    ts_longformat_exposure_time_summary = ts_longformat %>%
      group_by(exposure, time) %>%
      summarise(mean_expression = mean(expression),
                median_expression = median(expression),
                max_expression = max(expression),
                min_expression = min(expression),
                sd_expression = sd(expression))
    ts_longformat_exposure_time_summary

This gives one row per exposure and time combination (19 rows: one for Ctrl and six each for Ifnb, Lps, and R848), still grouped by exposure.

Filtering data

    ts_longformat_crt = ts_longformat %>%
      filter(exposure == "Ctrl")
    ts_longformat_crt

Only the rows whose exposure is Ctrl remain.

NAs in values

Taking means, mins, maxes, etc. can be affected if NA values are present.

    vals = c(10, 3, 5, 9, 19, 23)
    mean(vals)
    [1] 11.5
    min(vals)
    [1] 3
    max(vals)
    [1] 23

    vals = c(10, 3, 5, 9, 19, 23, NA)
    mean(vals)
    [1] NA
    min(vals)
    [1] NA
    max(vals)
    [1] NA

You can handle this by setting na.rm = TRUE.

    vals = c(10, 3, 5, 9, 19, 23, NA)
    mean(vals, na.rm = TRUE)
    [1] 11.5
    min(vals, na.rm = TRUE)
    [1] 3
    max(vals, na.rm = TRUE)
    [1] 23

You can also remove the NA values altogether; within a data.frame you can do this with filter().

    # use na.rm = TRUE
    ts_longformat_exposure_time_summary = ts_longformat %>%
      group_by(exposure, time) %>%
      summarise(mean_expression = mean(expression, na.rm = TRUE),
                median_expression = median(expression, na.rm = TRUE),
                max_expression = max(expression, na.rm = TRUE),
                min_expression = min(expression, na.rm = TRUE),
                sd_expression = sd(expression, na.rm = TRUE))

    # use filter to keep only values of expression that aren't (!) NA (is.na)
    ts_longformat_exposure_time_summary = ts_longformat %>%
      filter(!is.na(expression)) %>%
      group_by(exposure, time) %>%
      summarise(mean_expression = mean(expression),
                median_expression = median(expression),
                max_expression = max(expression),
                min_expression = min(expression),
                sd_expression = sd(expression))

dplyr offers a large array of functions for manipulating data frames. Here is a list of resources for more options:

1. cheatsheet
2. a few basics
3. tutorial
4. a webinar

Part 1. Exercises

Download Average Temperatures USA

1. Convert to long format by using gather() on the temperature columns
2. Separate columns so you have a column for year and month
3. Create a table of mean temperatures for month, city, and months for each city (various group_by calls)
4. Filter the table to just one city or just one month

Plotting

Base R offers several basic plotting functions, but in this course we will focus on using ggplot2 for plotting.

ggplot2 Basics

A basic ggplot2 call:

    # filter data to gene of interest
    ts_longformat_sod2 = ts_longformat %>%
      filter("SOD2" == gene)
    ggplot(ts_longformat_sod2) +
      geom_point(aes(x = time, y = expression))

ggplot2 is based on the Grammar of Graphics (hence "gg" plot), a book by Leland Wilkinson. The philosophy of the book is that a plotting system should let you simply describe what the plot should be based on, and the computer will take care of the rest. Of note, ggplot2 is the name of the library, but the function itself is ggplot() and not ggplot2(). ggplot2 works best on a long-format data frame. You then describe the layers of the plot, adding each layer with another geom_[type] function. There are many layers available; see http://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms for a list and examples of each.

    ggplot(ts_longformat_sod2) +
      geom_point(aes(x = time, y = expression)) +
      geom_line(aes(x = time, y = expression))

Below is a diagram of how a generic ggplot2 call is structured. The aspects of the plot that you want to map to specific columns in the data frame are given in the aes() call within the layer calls (aes is short for aesthetic). If the aesthetic mappings are shared between layers, you can give them in the top ggplot() call and they will be applied to every layer.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression)) +
      geom_point() +
      geom_line()
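As a quick check that the two styles really describe the same plot, the sketch below builds both on a toy data frame (values invented for illustration) and compares what each layer would draw, using ggplot2's layer_data() helper.

```r
library(ggplot2)

# Toy data; values are invented for illustration
toy = data.frame(time = c(0, 1, 2, 4), expression = c(5, 9, 7, 3))

# Mapping repeated inside every layer
p_layer = ggplot(toy) +
  geom_point(aes(x = time, y = expression)) +
  geom_line(aes(x = time, y = expression))

# Mapping given once in ggplot() and inherited by every layer
p_shared = ggplot(toy, aes(x = time, y = expression)) +
  geom_point() +
  geom_line()

# layer_data() returns the computed data for one layer of a plot
points_layer = layer_data(p_layer, 1)
points_shared = layer_data(p_shared, 1)
lines_layer = layer_data(p_layer, 2)
lines_shared = layer_data(p_shared, 2)
```

Putting shared mappings in ggplot() mainly saves typing; a layer can still override them with its own aes() when it needs something different.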

Figure 5:

Figure 6:

Now clearly this plot is not what we actually want to display: we are ignoring the exposure variable, which makes the plot look odd. Let's tell ggplot that we have a grouping variable, exposure.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure)) +
      geom_point() +
      geom_line()

Now let's add some coloring to make this plot a little more exciting.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, color = exposure)) +
      geom_point() +
      geom_line()

[Plot: points and lines colored by exposure; legend: Ctrl, Ifnb, Lps, R848]

Also, when you add plotting aspects like coloring, ggplot2 treats that variable as a grouping variable too, so you no longer have to supply group if you are giving a color variable.

    plotsod2 = ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_point() +
      geom_line()

Modifying colors

The default colors leave a lot to be desired. We can set them to something else by using ggplot2's scale_color_[func_name] functions. Here we use the colors supplied by RColorBrewer, which you access with scale_color_brewer().
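To see which colors a Brewer palette actually holds before plotting with it, you can query the RColorBrewer package directly. This is a small sketch, assuming RColorBrewer is installed; for four groups, scale_color_brewer(palette = "Dark2") should draw from these same four colors.

```r
library(RColorBrewer)

# Hex codes of a four-color "Dark2" palette, one per exposure level
dark2 = brewer.pal(4, "Dark2")
dark2

# Palette metadata: maximum number of colors, category,
# and whether it is flagged as color-blind friendly
brewer.pal.info["Dark2", ]
```

Running display.brewer.all() in an interactive session shows every available palette side by side.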

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_point() +
      geom_line() +
      scale_color_brewer(palette = "Dark2")

[Plot: same plot with the Dark2 palette; legend: exposure (Ctrl, Ifnb, Lps, R848)]

The ColorBrewer palettes were developed by Cynthia Brewer, and certain palettes were designed to be color-blind safe; the ColorBrewer website (colorbrewer2.org) helps with choosing colors. Martin Krzywinski has also written about color choices for figures and published color-blind-safe palettes. For more information on ways to change colors other than scale_color_brewer(), see http://ggplot2.tidyverse.org/reference/index.html#section-scales.

Setting colors manually

As most PIs are extremely picky about colors, there are also easy ways of setting specific colors for specific groups using scale_color_manual().

    exposurecolors = c("#005ac8", "#AA0A3C", "#0AB45A", "#14D2DC")
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_point() +
      geom_line() +
      scale_color_manual(values = exposurecolors)

[Plot: same plot with the manual colors; legend: exposure (Ctrl, Ifnb, Lps, R848)]

The above assigns the colors in the order of the levels of the variable (alphabetical for a character column). To make sure the order doesn't matter, even if certain levels are missing (which could otherwise shift the assignment), you can name the color vector so that coloring is consistent.

    exposurecolors = c("#005ac8", "#AA0A3C", "#0AB45A", "#14D2DC", "#8214A0")
    names(exposurecolors) = c("Ctrl", "R848", "Ifnb", "Lps", "other")
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_point() +
      geom_line() +
      scale_color_manual(values = exposurecolors)

[Plot: same plot with the named colors; legend: exposure (Ctrl, Ifnb, Lps, R848)]

If a level is not named in the color vector it will not get a color, so be careful of case etc.

    exposurecolors = c("#005ac8", "#AA0A3C", "#0AB45A", "#14D2DC", "#8214A0")
    # "LPS" does not match the level "Lps" in the data
    names(exposurecolors) = c("Ctrl", "R848", "Ifnb", "LPS", "other")
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_point() +
      geom_line() +
      scale_color_manual(values = exposurecolors)

[Plot: the Lps group is left without its color; legend: exposure (Ctrl, Ifnb, Lps, R848)]
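A mismatch like this is easy to catch before plotting by comparing the names of the color vector against the levels in the data. The sketch below uses base R only; the exposure_levels vector is typed in by hand here as an assumption, and in practice you might take it from unique(ts_longformat_sod2$exposure).

```r
# Levels present in the data (typed in by hand for this sketch)
exposure_levels = c("Ctrl", "Ifnb", "Lps", "R848")

# A named color vector with a case mismatch ("LPS" instead of "Lps")
exposurecolors = c("#005ac8", "#AA0A3C", "#0AB45A", "#14D2DC", "#8214A0")
names(exposurecolors) = c("Ctrl", "R848", "Ifnb", "LPS", "other")

# Levels with no matching name would be left without a color
missing_levels = setdiff(exposure_levels, names(exposurecolors))
missing_levels

# Names that match no level are simply unused
unused_names = setdiff(names(exposurecolors), exposure_levels)
unused_names
```

An empty missing_levels means every level in the data has a color assigned.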

Changing point shape and line types

    # taking advantage of the or operator (|) to check for either gene
    ts_longformat_sod2_cd74 = ts_longformat %>%
      filter("SOD2" == gene | "CD74" == gene)
    # you can also use the %in% operator that R offers
    ts_longformat_sod2_cd74 = ts_longformat %>%
      filter(gene %in% c("SOD2", "CD74"))
    # create a grouping variable to make plotting easier
    ts_longformat_sod2_cd74 = ts_longformat_sod2_cd74 %>%
      mutate(grouping = paste(gene, "-", exposure))
    # using group = grouping to separate out the different genes and the exposure but still color by exposure
    ggplot(ts_longformat_sod2_cd74, aes(x = time, y = expression, color = exposure, group = grouping)) +
      geom_point() +
      geom_line() +
      scale_color_brewer(palette = "Dark2")

[Plot: both genes drawn with the same point and line style; legend: exposure (Ctrl, Ifnb, Lps, R848)]

But by just coloring by exposure we can't tell which lines and points are from which gene, so let's change the shape and line types to distinguish them.

    ggplot(ts_longformat_sod2_cd74, aes(x = time, y = expression, color = exposure, group = grouping)) +
      geom_point(aes(shape = gene)) +
      geom_line(aes(linetype = gene)) +
      scale_color_brewer(palette = "Dark2")

[Plot: legends: gene (CD74, SOD2) and exposure (Ctrl, Ifnb, Lps, R848)]

Changing plot aspect not dependent on input data

If you want to change an aspect of the plot that doesn't depend on mapping data from the input data frame, you put the setting outside of the aes() call.

Figure 7:

    # make the points larger, the value given to size is a relative number
    ggplot(ts_longformat_sod2_cd74, aes(x = time, y = expression, color = exposure, group = grouping)) +
      geom_point(aes(shape = gene), size = 3) +
      geom_line(aes(linetype = gene)) +
      scale_color_brewer(palette = "Dark2")

[Plot: same plot with larger points; legends: gene (CD74, SOD2) and exposure (Ctrl, Ifnb, Lps, R848)]

You can also then change the linetypes and shapes with scale_ functions.

    genelinetypes = c("dotted", "solid")
    names(genelinetypes) = c("CD74", "SOD2")
    # make the points larger, the value given to size is a relative number
    ggplot(ts_longformat_sod2_cd74, aes(x = time, y = expression, color = exposure, group = grouping)) +
      geom_point(aes(shape = gene), size = 3) +
      geom_line(aes(linetype = gene)) +
      scale_color_brewer(palette = "Dark2") +
      scale_shape_manual(values = c(1, 3)) +
      scale_linetype_manual(values = genelinetypes)

[Plot: same plot with manual shapes and linetypes; legends: gene (CD74, SOD2) and exposure (Ctrl, Ifnb, Lps, R848)]

Switching to another layer type

Say you decide to take your plot and switch to a different layer type; you can reuse a lot of what you have already done. For example, let's switch from a dot/line plot to a bar plot by using geom_bar(). By default geom_bar() counts up all the values that fall into a group, but if you want to plot specific values instead you have to give geom_bar() stat = "identity".
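The difference between the default counting behavior and stat = "identity" can be seen on a toy example. This is a sketch with invented data; layer_data() is used only to inspect the bar heights ggplot2 computes.

```r
library(ggplot2)

# Toy data; values are invented for illustration
observations = data.frame(group = c("a", "a", "b"))
values = data.frame(group = c("a", "b"), value = c(7, 5))

# Default: geom_bar() counts the rows that fall into each x group
p_count = ggplot(observations, aes(x = group)) +
  geom_bar()

# stat = "identity": bar heights are taken from the y column as given
p_identity = ggplot(values, aes(x = group, y = value)) +
  geom_bar(stat = "identity")

counts = layer_data(p_count, 1)$count
heights = layer_data(p_identity, 1)$y
```

ggplot2 also provides geom_col(), which behaves like geom_bar(stat = "identity") and may read more clearly when the values are already computed.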

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, color = exposure)) +
      geom_bar(stat = "identity") +
      scale_color_brewer(palette = "Dark2")

[Plot: bars with colored borders; legend: exposure (Ctrl, Ifnb, Lps, R848)]

Notice that for geom_bar() color controls the border of the bars; if we want the bars themselves to be the given color we have to use fill instead.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, fill = exposure)) +
      geom_bar(stat = "identity") +
      scale_color_brewer(palette = "Dark2")

[Plot: filled bars in the default colors; legend: exposure (Ctrl, Ifnb, Lps, R848)]

But we lost the colors we were trying to set. That's because we are still calling scale_color_brewer() while the bars are now colored through fill, so we need scale_fill_brewer() instead.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity") +
      scale_fill_brewer(palette = "Dark2")

[Plot: stacked bars filled with the Dark2 palette; legend: exposure (Ctrl, Ifnb, Lps, R848)]

By default geom_bar() stacks all the bars belonging to the same x-axis group on top of each other. If we want them next to each other instead, we give geom_bar() position = "dodge".

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge") +
      scale_fill_brewer(palette = "Dark2")

[Plot: bars side by side; legend: exposure (Ctrl, Ifnb, Lps, R848)]

We can make the bars stand out more by changing the border color of all bars to black.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2")

[Plot: bars with black borders; legend: exposure (Ctrl, Ifnb, Lps, R848)]

Now let's dress up the plot a little bit. We can do this with ggplot2's theme() function, which allows tweaking many different aspects of how the plot looks in general. Let's change the legend position so it's on the bottom instead.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      theme(legend.position = "bottom")

[Plot: legend moved to the bottom; legend: exposure (Ctrl, Ifnb, Lps, R848)]

To see all the things that theme() can do, use the help function:

    help(theme)

We can also take advantage of the preset themes supplied by ggplot2.

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      theme_bw() +
      theme(legend.position = "bottom")

[Plot: same plot with theme_bw(); legend at the bottom: exposure (Ctrl, Ifnb, Lps, R848)]

We can also change the title and axis labels with the labs() function, and let's get rid of the panel border around the plot (panel.border = element_blank()). Also center the title (plot.title = element_text(hjust = 0.5)).

    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      labs(title = "Expression of SOD2 gene", y = "Gene Expression", x = "Time (hrs)") +
      theme_bw() +
      theme(legend.position = "bottom",
            panel.border = element_blank(),
            plot.title = element_text(hjust = 0.5))

[Plot: "Expression of SOD2 gene"; x axis "Time (hrs)", y axis "Gene Expression"; legend: exposure (Ctrl, Ifnb, Lps, R848)]

Controlling plotting order using factors

Let's say we didn't change the time column into a numeric column. Notice how the order isn't what we would want; this is because R determines the order automatically by sorting the input values.

    ts_longformat_sod2 = read_tsv(".series.data.txt") %>%
      rename(gene = X1) %>%
      gather(condition, expression, 2:ncol(.)) %>%
      separate(condition, c("exposure", "time")) %>%
      filter("SOD2" == gene)
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      labs(title = "Expression of SOD2 gene", y = "Gene Expression", x = "Time (hrs)") +
      theme_bw() +
      theme(legend.position = "bottom",
            panel.border = element_blank(),
            plot.title = element_text(hjust = 0.5))

[Plot: "Expression of SOD2 gene"; x axis "Time (hrs)" in the order 0h, 12h, 1h, 24h, 2h, 4h, 6h; y axis "Gene Expression"; legend: exposure (Ctrl, Ifnb, Lps, R848)]

This can be fixed by changing the time column from a character into a factor and setting the order of the levels.

    ts_longformat_sod2 = ts_longformat_sod2 %>%
      mutate(time = factor(time, levels = c("0h", "1h", "2h", "4h", "6h", "12h", "24h")))
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      labs(title = "Expression of SOD2 gene", y = "Gene Expression", x = "Time (hrs)") +
      theme_bw() +
      theme(legend.position = "bottom",
            panel.border = element_blank(),
            plot.title = element_text(hjust = 0.5))

[Plot: "Expression of SOD2 gene"; x axis "Time (hrs)" in the order 0h, 1h, 2h, 4h, 6h, 12h, 24h; y axis "Gene Expression"; legend: exposure (Ctrl, Ifnb, Lps, R848)]

Saving plots

Plots can be saved to a variety of image types, but the most useful is likely pdf. This lets you manipulate the plot in programs like Illustrator or Inkscape, or save it as any other image type afterwards. To save as pdf we will use the function pdf(). This function opens a pdf graphics device, which catches all plot calls (rather than sending them to the plot window in RStudio) until the function dev.off() is called.

    pdf("example_plot.pdf", width = 11, height = 8.5, useDingbats = FALSE)
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      labs(title = "Expression of SOD2 gene", y = "Gene Expression", x = "Time (hrs)") +
      theme_bw() +
      theme(legend.position = "bottom",
            panel.border = element_blank(),
            plot.title = element_text(hjust = 0.5))
    dev.off()

example_plot.pdf - The first argument is the name of the file you want to save the plots to. This will overwrite any existing file with this name, so be careful.
width - The width of the plot in inches.
height - The height of the plot in inches.
useDingbats = FALSE - This turns off use of the Dingbats font for drawing symbols. You always want to set this to FALSE; it causes R to create a larger file, but if you don't turn off Dingbats it causes problems when editing the pdf later in something like Illustrator.

Multiple pages

R will keep adding pages to the opened pdf until dev.off() is called.

    pdf("example_plot_2_pages.pdf", width = 11, height = 8.5, useDingbats = FALSE)
    ggplot(ts_longformat_sod2, aes(x = time, y = expression, group = exposure, fill = exposure)) +
      geom_bar(stat = "identity", position = "dodge", color = "black") +
      scale_fill_brewer(palette = "Dark2") +
      labs(title = "Expression of SOD2 gene", y = "Gene Expression", x = "Time (hrs)") +
      theme_bw() +
      theme(legend.position = "bottom",
            panel.border = element_blank(),
            plot.title = element_text(hjust = 0.5))
    ggplot(ts_longformat_sod2_cd74, aes(x = time, y = expression, color = exposure, group = grouping)) +
      geom_point(aes(shape = gene), size = 3) +
      geom_line(aes(linetype = gene)) +
      scale_color_brewer(palette = "Dark2") +
      scale_shape_manual(values = c(1, 3)) +
      scale_linetype_manual(values = genelinetypes)
    dev.off()
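The open-device, plot, close-device pattern can be tried safely with a temporary file. The sketch below uses base R plot() calls instead of ggplot2 so that it runs without the course data; out_file is a hypothetical temporary path, not one of the file names used above.

```r
# Write to a temporary file so nothing in the working directory is overwritten
out_file = tempfile(fileext = ".pdf")

pdf(out_file, width = 11, height = 8.5, useDingbats = FALSE)
plot(1:10, main = "page 1")  # goes to the pdf, not the plot window
plot(10:1, main = "page 2")  # a second plot call starts a second page
closed = dev.off()           # closes the device and finishes the file

file.exists(out_file)
```

One thing to keep in mind: a ggplot object is only drawn when it is printed. At the top level of a script this happens automatically, but inside a for loop or a function you need to wrap the plot in print() for it to reach the pdf.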

Part 2. Exercises

Using the Temperature data frame read in earlier (Average Temperatures USA):

1. Filter the long format data frame created in Part 1 to just one Station_name
2. Modify the month column into a factor so that the months are organized in chronological order (hint: use this vector c("january","february","march","april","may","june","july","august","september","october",
3. Create a line and dot plot of temperature for the Station_name you picked in 1, with months on the x-axis and temperatures on the y-axis, and color the lines by year (see what happens to the colors when you change year into a factor rather than a numeric data type)
4. Now create a barplot
5. Now filter the long format data frame again to be from 3 different stations and to just one year
6. Take the new data frame from 5 and create a barplot with x = months and y = temperature, and color the bars by station name (try setting the station names to new custom colors of your choosing)


More information

Tidy Evaluation. Lionel Henry and Hadley Wickham RStudio

Tidy Evaluation. Lionel Henry and Hadley Wickham RStudio Tidy Evaluation Lionel Henry and Hadley Wickham RStudio Tidy evaluation Our vision for dealing with a special class of R functions Usually called NSE but we prefer quoting functions Most interesting language

More information

Data Handling: Import, Cleaning and Visualisation

Data Handling: Import, Cleaning and Visualisation Data Handling: Import, Cleaning and Visualisation 1 Data Display Lecture 11: Visualisation and Dynamic Documents Prof. Dr. Ulrich Matter (University of St. Gallen) 13/12/18 In the last part of a data pipeline

More information

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang Rstudio GGPLOT2 Preparations There are several different systems for creating data visualizations in R. We will introduce ggplot2, which is based on Leland Wilkinson s Grammar of Graphics. The learning

More information

Data Import and Export

Data Import and Export Data Import and Export Eugen Buehler October 17, 2018 Importing Data to R from a file CSV (comma separated value) tab delimited files Excel formats (xls, xlsx) SPSS/SAS/Stata RStudio will tell you if you

More information

Package ggextra. April 4, 2018

Package ggextra. April 4, 2018 Package ggextra April 4, 2018 Title Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements Version 0.8 Collection of functions and layers to enhance 'ggplot2'. The flagship function is 'ggmarginal()',

More information

Making Tables and Graphs with Excel. The Basics

Making Tables and Graphs with Excel. The Basics Making Tables and Graphs with Excel The Basics Where do my IV and DV go? Just like you would create a data table on paper, your IV goes in the leftmost column and your DV goes to the right of the IV Enter

More information

Making sense of census microdata

Making sense of census microdata Making sense of census microdata Tutorial 3: Creating aggregated variables and visualisations First, open a new script in R studio and save it in your working directory, so you will be able to access this

More information

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller Table of Contents Introduction!... 1 Part 1: Entering Data!... 2 1.a: Typing!... 2 1.b: Editing

More information
