R. Muralikrishnan Max Planck Institute for Empirical Aesthetics Frankfurt. 08 June 2017


1 R R. Muralikrishnan Max Planck Institute for Empirical Aesthetics Frankfurt 08 June 2017

2 Introduction

3 What is R?! R is a programming language for statistical computing and graphics. R is free and open-source software available on Linux, Windows and OS X. R implements a wide variety of statistical and graphical techniques. User-created packages vastly extend and enhance the capabilities of R.

4 Is it useful for me? If you are going to run an empirical study and would like to run the statistical analysis independently and make some beautiful & meaningful graphical depictions of your data, or if you would like to do some text corpus analysis, well then: Yes, very much!

5 Is it difficult??? Well, let's say the learning curve in the beginning is a bit steep ;-) But rest assured that it is every bit worth it. R-related questions / problems??? Most probably someone else has already looked for answers online, which means the answers are in most cases just an online search away!

6 Installing R Download and install R for free. Core packages, functions and the R console are installed by default. R commands can then be issued via the text-based R console. Additional packages can be installed on the fly as and when necessary. If you like, you could learn the very basics of R even before installing it: Try-R, a browser-based basic R tutorial.

7 Installing RStudio (optional, but recommended) RStudio is one of the development environments available for R. Download and install RStudio. Using RStudio, you can use the full capability of R, plus design web apps or even create presentations of this sort.

8 Alright, I've installed things, what next? You're all set to explore, visualise and analyse your data and more. Just a few things to know before starting the R console:
1. If you would like to install a package that is not already installed:
   In RStudio: Tools -> Install Packages
   In R console: Enter install.packages("packagename")
2. Set the working directory to the one in which you have your data files:
   In RStudio: Session -> Set Working Directory -> Choose Directory
   In R console: Use the command setwd("path/to/directory")
3. And at any time you have questions about a certain R function:
   Type help(functionname) to read the documentation
   Type example(functionname) to see a usage example

9 R Basics

10 First steps
The > R prompt indicates R is ready to receive & interpret commands.
We can now type commands into the R console.
>
## [1] 3
> bla <- 100 # Assign the value 100 to the variable / object bla
> # <- is the assignment operator in R
>
> bla # Now, just typing bla would print its value
## [1] 100
> Bla # Bla is not the same as bla! Everything is case sensitive!
> # So this simply returns an error message.

11 More examples
> var_a <- 1
> var_a
## [1] 1
> var_b <- 2
> var_b
## [1] 2
> var_c <- var_a + var_b
> var_c
## [1] 3

12 Objects and data types
Objects / variables are simply handles or names for different kinds of data.
Object names may be alphanumeric, but must begin with a letter.
No spaces allowed in object names!
Every object is of a certain data type.
> var_a <- 1
> typeof(var_a) # typeof(xyz) : returns the type of the object xyz
## [1] "double"
> var_vector <- c(1,2,3) # c() : combines multiple objects
> # of the same type into a vector
> var_name <- "Mr. Bean" # Notice the " "???
> # ==> character string object
> typeof(var_name)
## [1] "character"

13 Let's type
> var_name <- "Mr. Bean"
> var_age <- 45
And now type
> var_x <- var_name + var_age
What is the output you get?

14 Let's try
> var_y <- c(var_name, var_age)
> var_y
## [1] "Mr. Bean" "45"
> typeof(var_y)
## [1] "character"
Why?
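One way to answer the "Why?" above: a vector can hold only one type, so c() converts mixed inputs to the most general type involved. The sketch below reuses the slide's variable names; var_z and age_next_year are names introduced here purely for illustration.

```r
# A vector can hold only one type, so c() coerces mixed inputs
# to the most general type involved (here: character).
var_name <- "Mr. Bean"
var_age <- 45

var_y <- c(var_name, var_age)
typeof(var_y)              # "character": 45 was converted to "45"

# To keep each value with its own type, a list can be used instead
# (a data frame is the tabular equivalent).
var_z <- list(name = var_name, age = var_age)
typeof(var_z$name)         # "character"
typeof(var_z$age)          # "double"

# Converting the string back to a number makes arithmetic possible again.
age_next_year <- as.numeric(var_y[2]) + 1
age_next_year              # 46
```

This is also why var_name + var_age fails: + is not defined for character strings.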

15 Why is the type of an object important? Identifying objects with a certain data type ensures data integrity, because only functions appropriate for that data type can be applied to them. Refer: Quick-R Data Types

16 Scalars
Scalars are nothing but singleton values
> # Singleton values # typeof(.) returns the following
> var_numeric_int <- 1 # 'double'!
>
> var_numeric_double <- 1.0 # 'double'
>
> var_char_string <- "A1" # 'character'
>
> var_logical_tf <- TRUE # 'logical'
> # Not a character string! No " ", see?
> var_logical_notavailable <- NA # 'logical'!

17 Vectors
Vectors are 1D arrays.
The elements of a vector must all be of the identical type!
> var_vector_numeric <- c(1,2,3)
> var_vector_char <- c("a","b","1")
> var_vector_logical <- c(TRUE, FALSE) # Notice the absence of ""?

18 Matrices and Data Frames
Matrices are 2D arrays. All columns of a matrix must be of the identical type and length!
Data frames are more generic than matrices and comparable to Excel tables. Each column can be of any type. Each column is accessible as a vector.
Data frames are the most common type of data in R.
Refer: Quick-R Data Types
Refer: Quick-R Matrices
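As a minimal sketch of the difference, the example below builds a small data frame and a matrix by hand. The object names (subjects, m) and all values are made up for illustration.

```r
# A small, made-up data frame: each column is a vector of one type,
# but different columns may have different types.
subjects <- data.frame(
  Subj = c("S01", "S02", "S03"),
  Age  = c(23, 31, 27),
  Pass = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(subjects)         # compact overview of the columns and their types
subjects$Age          # a single column, returned as a numeric vector
mean(subjects$Age)    # 27

# A matrix, in contrast, holds a single type throughout.
m <- matrix(1:6, nrow = 2)
dim(m)                # 2 3
typeof(m)             # "integer"
```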

19 Arithmetic operators
Arithmetic operators work on scalars, vectors and matrices. Also called binary operators in R.
Operator    Description
+           addition
-           subtraction
*           multiplication
/           division
**          exponentiation (the circumflex ^ also works)
x %% y      modulus (x mod y): 5 %% 2 is 1
x %/% y     integer division: 5 %/% 2 is 2
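The operators in the table apply element by element when given vectors. A short sketch with made-up values (x and y are illustrative names):

```r
x <- c(5, 10, 15)
y <- c(2, 4, 6)

x + y       # 7 14 21   (element by element)
x * 2       # 10 20 30  (the single value 2 is recycled)
x %% y      # 1 2 3     (remainders)
x %/% y     # 2 2 2     (integer division)
2 ** 3      # 8
2 ^ 3       # 8         (same operation, more common spelling)
```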

20 Logical Operators
Logical operators are for comparing things; they return TRUE / FALSE.
Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          not x
x && y      short-circuit AND; for single values; used in if checks
x || y      short-circuit OR; for single values; used in if checks
x & y       vectorised AND (applies to all elements in a vector)
x | y       vectorised OR (applies to all elements in a vector)
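The difference between the vectorised and the short-circuit forms is easiest to see side by side. A minimal sketch, with made-up reaction times (rt, in_range and msg are illustrative names):

```r
rt <- c(0.4, 0.9, 1.6, 0.7)

# Vectorised AND: one TRUE / FALSE per element.
in_range <- rt > 0.5 & rt < 1.5
in_range          # FALSE TRUE FALSE TRUE
rt[in_range]      # 0.9 0.7
sum(in_range)     # 2 (TRUE counts as 1)

# Short-circuit AND: a single TRUE / FALSE, as needed by if().
# The right-hand side is only evaluated if the left-hand side is TRUE.
if (length(rt) > 0 && mean(rt) < 1) {
  msg <- "mean RT is below 1"
} else {
  msg <- "mean RT is 1 or above"
}
msg               # "mean RT is below 1"
```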

21 Loops, condition checks, user-defined functions etc. Base R Cheatsheet base-r.pdf

22 Data Analysis Workflow

23 Workflow
General workflow. These steps often happen in a repeating cycle.
1. Read data into R. Input files can be, among other things:
   an Excel sheet,
   a comma / space / tab separated text file,
   an XML file, or something directly from the web,
   running text such as a corpus
2. Understand the data structure and what you want to do with it
3. Transform data to do what you want
4. Do what you want:
   calculate descriptive statistics
   generate plots
   run various statistical tests
   you name it!
5. Save your R code for later use, say as SomethingMeaningful.R

24 R Scripts
The code we write on the console can be saved as an R Script.
In RStudio: File -> New -> R Script opens an editor panel to write & save code.
Elsewhere: Simply use any text editor to write & save code.
Save the file as, say, SomethingMeaningful.R
To save the output generated by the R script (and not see it on the console), include sink("meaningfuloutput.txt") at the beginning of the R script and sink() at the end of the R script.
Now you can source the R Script, meaning execute all the commands in it all at once.
In RStudio, just click the Source button above the editor panel.
In R console, type source("SomethingMeaningful.R")

25 Data formats: Wide-format Data
Each row contains multiple variables of interest for each observation
> WideData
## # A tibble: 4 x 7
## Participant ExpV RT1 RT2 RT3 RT4 YorN
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 S Y
## 2 S N
## 3 S Y
## 4 S Y

26 Data formats: Long-format Data
Each row contains a single variable of interest for a single observation
> LongData
## # A tibble: 16 x 5
## Participant ExpV YorN Trial Measurement
## <chr> <int> <chr> <chr> <dbl>
## 1 S001 1 Y RT
## 2 S001 1 Y RT
## 3 S001 1 Y RT
## 4 S001 1 Y RT
## 5 S002 1 N RT
## 6 S002 1 N RT
## 7 S002 1 N RT
## 8 S002 1 N RT
## 9 S003 2 Y RT
## 10 S003 2 Y RT

27 Which format is best? In most cases, long-format data is the easiest to work with, because each observation is in its own row and each variable is in its own column. Transforming, visualising and analysing long-format data is straightforward. There are R packages to convert between the two formats. Refer: R Cookbook, wide to long format and vice versa. We'll learn one of the methods soon.

28 Read data from text files: a basic example
Import tabular text data as a data frame
> BehavData <- read.table("allres.txt")
> # BehavData : typing the data frame name displays the whole df
> head(BehavData) # head() : displays the first few rows of the df
## V1 V2 V3 V4 V5 V6 V7 V8
## 1 NF01 16 FOS F OS C 2
## 2 NF01 25 MOS M OS C 2
## 3 NF01 13 FSO F SO C 2
## 4 NF01 12 MSO M SO C 2
## 5 NF01 4 FOS F OS C 2
## 6 NF01 8 FSO F SO C 2
> # tail(BehavData) # tail() : displays the last few rows of the df
# We won't use this method to import data for very long! We will learn a better method in a bit.

29 Data structure and its dimensions
> str(BehavData) # str() : displays the structure of the object
## 'data.frame': 3478 obs. of 8 variables:
## $ V1: Factor w/ 29 levels "NF01","NF02",..:
## $ V2: int
## $ V3: Factor w/ 4 levels "FOS","FSO","MOS",..:
## $ V4: Factor w/ 2 levels "F","M":
## $ V5: Factor w/ 2 levels "OS","SO":
## $ V6: num
## $ V7: Factor w/ 2 levels "C","X":
## $ V8: int
> dim(BehavData) # dim() : displays the dimensions of the object
## [1] 3478 8

30 Name columns in a data frame
> names(BehavData) <-
+ c("Subj", "Item", "Condition", "WF1", "WF2", "RT",
+ "Accuracy", "Response")
> head(BehavData)
## Subj Item Condition WF1 WF2 RT Accuracy Response
## 1 NF01 16 FOS F OS C 2
## 2 NF01 25 MOS M OS C 2
## 3 NF01 13 FSO F SO C 2
## 4 NF01 12 MSO M SO C 2
## 5 NF01 4 FOS F OS C 2
## 6 NF01 8 FSO F SO C 2
# Again, we won't need this when we learn the better method to import data soon.

31 Access different fields of a data frame
> head(BehavData)
## Subj Item Condition WF1 WF2 RT Accuracy Response
## 1 NF01 16 FOS F OS C 2
## 2 NF01 25 MOS M OS C 2
## 3 NF01 13 FSO F SO C 2
## 4 NF01 12 MSO M SO C 2
## 5 NF01 4 FOS F OS C 2
## 6 NF01 8 FSO F SO C 2
> head(BehavData$RT)
## [1]

32 Plots and Statistics

33 Histogram
> library(ggplot2)
> ggplot(BehavData, aes(x = RT)) + geom_histogram(binwidth = 0.2)
[Figure: histogram of RT; x-axis: RT, y-axis: count]

34 Density Plot
> ggplot(BehavData, aes(x = RT)) + geom_density()
[Figure: density plot of RT; x-axis: RT, y-axis: density]

35 Checking for Normality: Q-Q Norm Plot
> ggplot(BehavData) + geom_qq(aes(sample = RT))
[Figure: normal Q-Q plot of RT; x-axis: theoretical, y-axis: sample]

36 Statistical Normality Tests
> # Anderson-Darling normality test
> library(nortest)
> ad.test(BehavData$RT)
##
## Anderson-Darling normality test
##
## data: BehavData$RT
## A = , p-value < 2.2e-16
> # Shapiro-Wilk normality test
> shapiro.test(BehavData$RT)
##
## Shapiro-Wilk normality test
##
## data: BehavData$RT
## W = , p-value < 2.2e-16

37 Statistical Normality Tests
> # Kolmogorov-Smirnov normality test
> ks.test(BehavData$RT, "pnorm")
## Warning in ks.test(BehavData$RT, "pnorm"): ties should not be present
## the Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: BehavData$RT
## D = , p-value < 2.2e-16
## alternative hypothesis: two-sided
Refer: Blog entry on the topic
Refer: Stackexchange page on the topic

38 Mean, Median, Standard Deviation
> # Arithmetic Mean
> mean(BehavData$RT)
## [1]
> # Median
> median(BehavData$RT)
## [1]
> # Standard Deviation
> sd(BehavData$RT)
## [1]
> # Variance = SD^2
> var(BehavData$RT)
## [1]
Refer: Quick-R Descriptive Statistics
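Since the numeric output did not survive in this transcription, here is a small self-contained sketch of the same four functions on made-up values (rt and rt_na are illustrative names, not the slide's data). It also shows how a missing value affects the result.

```r
rt <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(rt)      # 5
median(rt)    # 4.5
sd(rt)        # sample standard deviation (divides by n - 1)
var(rt)       # sample variance
all.equal(var(rt), sd(rt)^2)    # TRUE: the variance is the squared SD

# A single missing value makes the mean NA unless it is removed explicitly.
rt_na <- c(rt, NA)
mean(rt_na)                     # NA
mean(rt_na, na.rm = TRUE)       # 5
```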

39 Aggregating over factors
Calculate mean, sd etc. over specified factor(s): aggregate function
> # Aggregate Variable by a single Factor
> aggregate(Variable ~ Factor, data = XyzData, FUN = mean)
> # FUN = mean => calculate mean;
> # Other possible options: sd, var, length...
>
> # Aggregate Variable by multiple Factors
> aggregate(Variable ~ Factor1 * Factor2, data = XyzData, FUN = mean)
> # The Variable ~ Factors part is referred to as the 'formula'

40 Aggregating over factors
> RT_m_Subj <-
+ aggregate(RT ~ Subj, data = BehavData, FUN = mean, na.rm = TRUE)
> # RT ~ Subj => aggregate RT by the factor Subj
> # na.rm = TRUE => exclude missing values (NA = not available)
> head(RT_m_Subj)
## Subj RT
## 1 NF
## 2 NF
## 3 NF
## 4 NF
## 5 NF
## 6 NF
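To see what the formula interface of aggregate does, it may help to run it on data small enough to check by hand. The data frame toy below is made up for illustration; only the column names echo the slides.

```r
# Made-up data: 2 subjects, 4 trials each, alternating levels of WF1.
toy <- data.frame(
  Subj = rep(c("S01", "S02"), each = 4),
  WF1  = rep(c("F", "M"), times = 4),
  RT   = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# One mean per subject.
by_subj <- aggregate(RT ~ Subj, data = toy, FUN = mean)
by_subj
##   Subj  RT
## 1  S01 2.5
## 2  S02 5.0

# One mean per subject and level of WF1 (4 rows).
by_subj_wf1 <- aggregate(RT ~ Subj * WF1, data = toy, FUN = mean)
by_subj_wf1
```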

41 ANOVA
> # Repeated Measures ANOVA : Reaction Time -- Analysis by SUBJECTS
> # To test if the SUBJECTS differ significantly between each other
>
> # First calculate a mean per subject per condition.
> RT_m_Subj_WF1_WF2 <- aggregate(RT ~ Subj * WF1 * WF2,
+ data = BehavData, FUN = mean, na.rm = TRUE)
> # Run the ANOVA
> RT_aov_Subj <- aov(RT ~ WF1 * WF2 + Error(Subj/(WF1*WF2)),
+ data = RT_m_Subj_WF1_WF2)
>
> print(summary(RT_aov_Subj))

42 ANOVA
> # Repeated Measures ANOVA : Reaction Time -- Analysis by ITEMS
> # To test if the ITEMS differ significantly between each other
> BehavData$Item <- as.factor(BehavData$Item)
> # First calculate a mean per item per condition.
> RT_m_Item_WF1_WF2 <- aggregate(RT ~ Item * WF1 * WF2,
+ data = BehavData, FUN = mean,
+ na.rm = TRUE)
> # Run the ANOVA
> RT_aov_Item <- aov(RT ~ WF1 * WF2 + Error(Item/(WF1*WF2)),
+ data = RT_m_Item_WF1_WF2)
>
> print(summary(RT_aov_Item))
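The two ANOVA slides depend on the file allres.txt, which is not available here. As a stand-in, the sketch below runs the same aov call on simulated data, so the structure of the call and of its output can be explored. The data frame sim, the number of subjects and the random RT values are all assumptions made for this illustration; the results carry no meaning.

```r
set.seed(1)

# Simulated stand-in for the aggregated data: one mean RT per
# subject and per combination of the two within-subject factors.
sim <- expand.grid(
  Subj = factor(paste0("S", 1:8)),
  WF1  = factor(c("F", "M")),
  WF2  = factor(c("OS", "SO"))
)
sim$RT <- rnorm(nrow(sim), mean = 2, sd = 0.3)

# Same model formula as on the slides: both factors and their
# interaction, with Subj as the error stratum.
fit <- aov(RT ~ WF1 * WF2 + Error(Subj / (WF1 * WF2)), data = sim)
print(summary(fit))
```

The summary prints one table per error stratum (Subj, Subj:WF1, Subj:WF2, Subj:WF1:WF2); the tests of WF1, WF2 and their interaction appear in the stratum that matches each effect.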

43 Correlations, t-tests, An exhaustive list of statistical tests

46 Good to know Many ways to do the same thing: many common tasks can be accomplished in more than one way in R. This is both appealing and frustrating, depending on the context. Hmmm. This raises the question: wouldn't it be lovely if there were a way to do most of the common tasks in a consistent manner??? Enter The Tidyverse

47 The Tidyverse

48 The Tidyverse
A collection of R packages that share common philosophies and are designed to work together (tidyverse.org)
Goal: Solve complex problems by combining simple, uniform pieces!
Package Design (see Data Science in tidyverse: Hadley Wickham):
One function = one task
Input and output of every function is a tidy data frame (= tibble)
Consequence: tidyverse functions are pipeable!
> install.packages("tidyverse") # Installs the tidyverse collection
> # Curious what pipeable means??? Wait a bit more to know :-)

49 The Tidyverse
> library(tidyverse) # Loads the core tidyverse packages
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages
## filter(): dplyr, stats
## lag(): dplyr, stats
> library(readxl) # Other tidyverse packages loaded when needed

50 Tidy Data
Each variable is a column, each observation / case is a row!
See Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1-23.
> LongData
## # A tibble: 16 x 5
## Participant ExpV YorN Trial Measurement
## <chr> <int> <chr> <chr> <dbl>
## 1 S001 1 Y RT
## 2 S001 1 Y RT
## 3 S001 1 Y RT
## 4 S001 1 Y RT
## 5 S002 1 N RT
## 6 S002 1 N RT
## 7 S002 1 N RT
## 8 S002 1 N RT

51 Read data from text files: readr::read_delim
> library(tidyverse) # This also loads readr, among other packages!
> # For comma separated file with header row present in the input file
> ExpData <- read_delim("filename.csv", delim = ",", col_names = TRUE)
> # delim => delimiter, i.e., the column separator in the input
>
> # For tab separated file with no header row present in the input
> ExpData <- read_delim("filename.txt", delim = "\t",
+ col_names = c("Subject", "Task", "RT"))
> # We provide meaningful column names in the command
>
> # For space separated file:
> ExpData <- read_delim("filename.xyz", delim = " ", col_names = TRUE)
> # For semicolon separated file:
> ExpData <- read_delim("filename.log", delim = ";",
+ col_names = c("Name", "Age"))

52 Read data from Excel files: readxl::read_excel
> library(readxl)
> # Read a single worksheet (the first by default, if there are multiple worksheets)
> ExpData <- read_excel("filename.xlsx", col_names = TRUE)
> # Read specific worksheet from the file, by index
> ExpData <- read_excel("filename.xlsx", 3,
+ col_names = c("Name", "Age", "RT"))
Attention please!!! Spaces are bad bad bad in filenames, column names and basically any names!
Bad apples: Exp Data.xlsx, Subj ID, bla bla bla etc.
Instead, use: Exp_Data.xlsx, Subj-ID, bla_blabla etc.

53 Read Data : Tidy Example
> BehavDataTidy <-
+ readr::read_delim("allres.txt", delim = " ",
+ col_names = c("Subj", "Item", "Condition", "WF1",
+ "WF2", "RT", "Accuracy", "Resp"))
## Parsed with column specification:
## cols(
## Subj = col_character(),
## Item = col_integer(),
## Condition = col_character(),
## WF1 = col_character(),
## WF2 = col_character(),
## RT = col_double(),
## Accuracy = col_character(),
## Resp = col_integer()
## )

54 Read Data : Tidy Example
> BehavDataTidy
## # A tibble: 3,478 x 8
## Subj Item Condition WF1 WF2 RT Accuracy Resp
## <chr> <int> <chr> <chr> <chr> <dbl> <chr> <int>
## 1 NF01 16 FOS F OS C 2
## 2 NF01 25 MOS M OS C 2
## 3 NF01 13 FSO F SO C 2
## 4 NF01 12 MSO M SO C 2
## 5 NF01 4 FOS F OS C 2
## 6 NF01 8 FSO F SO C 2
## 7 NF01 6 MOS M OS X 1
## 8 NF01 2 FOS F OS C 2
## 9 NF01 9 MSO M SO C 2
## 10 NF01 28 MOS M OS C 2
## #... with 3,468 more rows
So what is so tidy about it??? Compare with the data frame created earlier!

55 Tidy tibble: enhanced data frame
Most non-tidyverse functions that take a data frame work with tibbles.
For legacy functions that won't work with a tibble: use as.data.frame()
> mean(BehavData$RT)
## [1]
> mean(BehavDataTidy$RT)
## [1]
> aggregate(RT ~ WF2, data = BehavData, FUN = mean)
## WF2 RT
## 1 OS
## 2 SO
> aggregate(RT ~ WF2, data = BehavDataTidy, FUN = mean)
## WF2 RT
## 1 OS
## 2 SO

56 Should all data be tidy data? Of course not! Other types of non-tidy data have their uses, too. Not every dataset needs to be wrangled into a tidy dataset! Nevertheless, the tidy format works well for most kinds of rectangular data.

57 Data Wrangling and Transformations

58 Why focus on Data Wrangling? Some form of data transformation is almost always inevitable prior to analysis. This is usually the most time-consuming and error-prone part. The actual statistical analysis is usually only one or two lines of R code. Most analytical functions work best if the data is in a certain format. Efficient data wrangling techniques are thus very important.

59 Wide-format to long-format conversion
Use the gather function from the tidyr package of the tidyverse
> library(tidyr)
> gather(WideData, Trial, Measurement, RT1:RT4)
## # A tibble: 16 x 5
## Participant ExpV YorN Trial Measurement
## <chr> <int> <chr> <chr> <dbl>
## 1 S001 1 Y RT
## 2 S002 1 N RT
## 3 S003 2 Y RT
## 4 S004 2 Y RT
## 5 S001 1 Y RT
## 6 S002 1 N RT
## 7 S003 2 Y RT
## 8 S004 2 Y RT
## 9 S001 1 Y RT

60 Long-format to wide-format conversion
Use the spread function from the tidyr package of the tidyverse
> spread(LongData, Trial, Measurement)
## # A tibble: 4 x 7
## Participant ExpV YorN RT1 RT2 RT3 RT4
## * <chr> <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 S001 1 Y
## 2 S002 1 N
## 3 S003 2 Y
## 4 S004 2 Y
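The round trip between the two formats can be tried without any input file. The sketch below assumes the tidyr package is installed; the data frame wide and its values are made up. In tidyr 1.0.0 and later, pivot_longer() and pivot_wider() are the recommended successors of gather() and spread(), which still work.

```r
library(tidyr)

# Made-up wide data: one row per participant, one column per trial.
wide <- data.frame(
  Participant = c("S001", "S002"),
  RT1 = c(1.1, 2.1),
  RT2 = c(1.2, 2.2),
  stringsAsFactors = FALSE
)

# Wide to long: the column names RT1 and RT2 become values of Trial,
# and the cell contents become values of Measurement.
long <- gather(wide, Trial, Measurement, RT1:RT2)
long        # 4 rows: one per participant and trial

# Long to wide: the inverse operation.
back <- spread(long, Trial, Measurement)
back        # same content as wide
```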

61 Summarising data: dplyr::summarise, dplyr::count
> WideData <- readxl::read_excel("widedata.xlsx", 3, col_names = TRUE)
> TidyData <- gather(WideData, Trial, Measurement, RT1:RT4)
> OverallMeanRT <- summarise(TidyData, MeanRT = mean(Measurement))
> OverallMeanRT
## # A tibble: 1 x 1
## MeanRT
## <dbl>
##
> N_of_Measurements <- count(TidyData, Participant)
> N_of_Measurements
## # A tibble: 4 x 2
## Participant n
## <chr> <int>
## 1 S001 4
## 2 S002 4
## 3 S003 4

62 The real power and elegance of tidyverse: pipeable functions
All functions in the tidyverse share a consistent syntax.
Therefore the output of one function can be piped to the next function with the pipe operator %>% (from magrittr).
Piping avoids having to save temporary intermediate variables.
Piping results in code that is:
simple and more efficient
linear, reflecting each simple step that contributed to the complex analysis
concise and more legible
less error-prone overall

63 Pipe versus no pipe
> # The more common non-pipe method =================================
> SomeData_1 <- f1(SomeData_0, param1, param2)
> SomeData_2 <- f2(SomeData_1, bla1, bla2, bla3)
> SomeData_3 <- f3(SomeData_2, whatever1)
> Result_1 <- f4(SomeData_3, younameit)
> # Another method ==================================================
> Result_2 <- f4( f3( f2( f1(SomeData_0, param1, param2),
+ bla1, bla2, bla3), whatever1), younameit)
> # And now with the pipe! ==========================================
> Result_3 <-
+ SomeData_0 %>%
+ f1(param1, param2) %>%
+ f2(bla1, bla2, bla3) %>%
+ f3(whatever1) %>%
+ f4(younameit)
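The three styles above use placeholder functions f1 to f4. The same comparison with real functions could look as follows; this is a sketch that assumes the magrittr package is installed, and the vector rt is made up.

```r
library(magrittr)

rt <- c(4, 9, 16, 25)

# Step by step, with intermediate variables.
step_1 <- sqrt(rt)
step_2 <- mean(step_1)
stepwise <- round(step_2, 1)

# Nested: the innermost call runs first, so it reads inside-out.
nested <- round(mean(sqrt(rt)), 1)

# Piped: reads top to bottom in the order the steps are executed.
piped <- rt %>%
  sqrt() %>%
  mean() %>%
  round(1)

c(stepwise, nested, piped)    # 3.5 3.5 3.5
```

All three give the same result; the pipe only changes how the code reads.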

64 Pipe : Example
Non-pipe version
> WideData <- readxl::read_excel("widedata.xlsx", 3, col_names = TRUE)
> TidyData <- gather(WideData, Trial, Measurement, RT1:RT4)
> N_of_Measurements <- count(TidyData, Participant)
Pipe version
> readxl::read_excel("widedata.xlsx", 3, col_names = TRUE) %>%
+ gather(Trial, Measurement, RT1:RT4) %>%
+ count(Participant) -> N_of_Measurements

65 Grouping data by factor(s): dplyr::group_by
> readxl::read_excel("widedata.xlsx", 3, col_names = TRUE) %>%
+ gather(Trial, Measurement, RT1:RT4) %>%
+ group_by(Participant) %>%
+ count(MeanRT = mean(Measurement))
## Source: local data frame [4 x 3]
## Groups: Participant [?]
##
## Participant MeanRT n
## <chr> <dbl> <int>
## 1 S
## 2 S
## 3 S
## 4 S

66 Renaming a column: dplyr::rename
> readxl::read_excel("widedata.xlsx", 3, col_names = TRUE) %>%
+ gather(Trial, Measurement, RT1:RT4) -> LongData
>
> library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
> LongData %<>% rename(RT = Measurement)
What's that %<>% thing??? And where did <- go??? Do you see the point?

67 Let's take stock a bit
The tidyverse packages share a consistent syntax such that piping is possible.
Piping with %>% feeds the LHS to the RHS. The RHS generates an output to feed further, or assign, or print, or plot.
Double-piping with %<>% also feeds the LHS to the RHS, but the RHS generates an output and feeds (= assigns) it back to the LHS!
There's more: %T>% and %$%
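A compact way to see what each of these operators does is to apply them to tiny objects. The sketch below assumes the magrittr package is installed; values, total and df are made-up names.

```r
library(magrittr)

# %<>% : pipe and assign the result back to the left-hand side.
values <- c(3, 1, 2)
values %<>% sort()      # same as: values <- values %>% sort()
values                  # 1 2 3

# %T>% : the "tee" pipe. The right-hand side is run for its side
# effect (here: printing), and the original input is passed on.
total <- values %T>% print() %>% sum()
total                   # 6

# %$% : the "exposition" pipe. Makes the columns of the left-hand
# side available by name to the right-hand side.
df <- data.frame(a = 1:3, b = 4:6)
r <- df %$% cor(a, b)
r                       # 1 (b is a perfectly linear function of a)
```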

68 ... and import a new dataset to work further
> IntData <- readxl::read_excel("intensity-data.xlsx", col_names = T)
> IntData
## # A tibble: 420 x 6
## Participant Note NoteType Time Intensity OnsetInterval
## <chr> <dbl> <chr> <dbl> <dbl> <chr>
## 1 S01 1 NA NA
## 2 S01 2 Note_M
## 3 S01 3 Note_S
## 4 S01 4 Note_S
## 5 S01 5 Note_S
## 6 S01 6 Note_S
## 7 S01 7 Note_M
## 8 S01 8 Note_M
## 9 S01 9 Note_M
## 10 S01 10 Note_M
## #... with 410 more rows

69 Extract columns by name: dplyr::select
> IntData %>% select(Participant, NoteType, Time, Intensity)
## # A tibble: 420 x 4
## Participant NoteType Time Intensity
## <chr> <chr> <dbl> <dbl>
## 1 S01 NA
## 2 S01 Note_M
## 3 S01 Note_S
## 4 S01 Note_S
## 5 S01 Note_S
## 6 S01 Note_S
## 7 S01 Note_M
## 8 S01 Note_M
## 9 S01 Note_M
## 10 S01 Note_M
## #... with 410 more rows

70 Extract rows that meet certain criteria: dplyr::filter
> IntData %>% filter(OnsetInterval > 0.75 & OnsetInterval < 0.85)
## # A tibble: 5 x 6
## Participant Note NoteType Time Intensity OnsetInterval
## <chr> <dbl> <chr> <dbl> <dbl> <chr>
## 1 S01 16 Note_L
## 2 S07 16 Note_L
## 3 S08 16 Note_L
## 4 S12 16 Note_L
## 5 S14 16 Note_L
Notice the use of single & : this is the vectorised AND operator.
Unlike the scalar AND &&, this applies to all the elements of a column!
There's of course the vectorised OR |, as opposed to the scalar OR ||

71 Compute a new column: dplyr::mutate
> IntData %>%
+ select(Participant, Intensity) %>%
+ mutate(SNo = row_number(),
+ GoodBad = if_else(Intensity >= 120, "Good", "Bad"))
## # A tibble: 420 x 4
## Participant Intensity SNo GoodBad
## <chr> <dbl> <int> <chr>
## 1 S Good
## 2 S Bad
## 3 S Bad
## 4 S Good
## 5 S Good
## 6 S Good
## 7 S Good
## 8 S Good
## 9 S Good

72 Compute a new column, drop others: dplyr::transmute
> IntData %>%
+ select(Participant) %>%
+ distinct() %>% # Get rid of duplicate rows
+ transmute(Subject = Participant,
+ NewSubjID = paste("Drummer", row_number() + 100, sep=""))
## # A tibble: 14 x 2
## Subject NewSubjID
## <chr> <chr>
## 1 S01 Drummer101
## 2 S02 Drummer102
## 3 S03 Drummer103
## 4 S04 Drummer104
## 5 S05 Drummer105
## 6 S06 Drummer106
## 7 S07 Drummer107
## 8 S08 Drummer108

73 Exercise 1 Add a new column with the mean of the OnsetInterval. This mean should be on a per Participant and per NoteType basis! Before attempting to do this, see if mean(IntData$OnsetInterval) works. Have a very careful look at the output of typing IntData. Do you see a / the problem?

74 Solution : Know your data well OnsetInterval contains the string NA in some cases, so read_excel assumed that this column is made up of strings! Type readxl::read_excel("intensity-data.xlsx", col_names = T) and study what you see on the console. Now type help(read_excel) to see what could be done.
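The effect described above can be reproduced in base R without the Excel file. In the sketch below, onset is a made-up character vector standing in for the OnsetInterval column as it arrives when the text NA is not declared as a missing-value marker.

```r
# The column as read: numbers stored as text, plus the text "NA".
onset <- c("0.81", "NA", "0.79")
typeof(onset)                       # "character"

# mean() cannot work on text: it returns NA and issues a warning.
m_text <- suppressWarnings(mean(onset))
m_text                              # NA

# After conversion, the text "NA" becomes a real missing value.
onset_num <- as.numeric(onset)
is.na(onset_num)                    # FALSE TRUE FALSE

# Missing values still have to be excluded explicitly.
mean(onset_num)                     # NA
mean(onset_num, na.rm = TRUE)       # 0.8
```

Declaring the missing-value marker at import time, via the na argument of read_excel, avoids the manual conversion.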

75 Exercise 1 : Solution
> IntData <- readxl::read_excel("intensity-data.xlsx", col_names = T,
+ na = "NA") # <NA> is "NA" in the input vector!
> IntData %>% select(-Time, -Intensity) %>% # - => drop these vectors
+ group_by(Participant, NoteType) %>%
+ mutate(OIMean = mean(OnsetInterval))
## Source: local data frame [420 x 5]
## Groups: Participant, NoteType [56]
##
## Participant Note NoteType OnsetInterval OIMean
## <chr> <dbl> <chr> <dbl> <dbl>
## 1 S01 1 <NA> NA NA
## 2 S01 2 Note_M
## 3 S01 3 Note_S
## 4 S01 4 Note_S
## 5 S01 5 Note_S
## 6 S01 6 Note_S

76 Exercise 2 Add a new column with the name AdjustedTime. This should be the Time for the current Note minus the Time for Note 1. This value should be on a per Participant basis! Do you need something specific to solve this???

77 Solution : Extract first value by position: dplyr::first
> IntData %>%
+ group_by(Participant) %>%
+ mutate(TimeBegin = first(Time)) %>%
+ select(Participant, Time, TimeBegin)
## Source: local data frame [420 x 3]
## Groups: Participant [14]
##
## Participant Time TimeBegin
## <chr> <dbl> <dbl>
## 1 S
## 2 S
## 3 S
## 4 S
## 5 S
## 6 S
## 7 S

78 Exercise 2 : Solution
> IntData %>%
+ group_by(Participant) %>%
+ mutate(TimeBegin = first(Time)) %>%
+ select(Participant, Time, TimeBegin) %>%
+ mutate(AdjustedTime = Time - TimeBegin)
## Source: local data frame [420 x 4]
## Groups: Participant [14]
##
## Participant Time TimeBegin AdjustedTime
## <chr> <dbl> <dbl> <dbl>
## 1 S
## 2 S
## 3 S
## 4 S
## 5 S
## 6 S
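Because the numeric output is missing from this transcription, the same grouped computation is sketched below on a tiny hand-made tibble, so the result can be checked by eye. It assumes the dplyr package is installed; notes and its values are made up, and the two mutate steps of the slide are folded into one.

```r
library(dplyr)

# Made-up data: two participants, each with their own start time.
notes <- tibble(
  Participant = c("S01", "S01", "S01", "S02", "S02"),
  Note        = c(1, 2, 3, 1, 2),
  Time        = c(10.0, 10.5, 11.2, 20.0, 20.4)
)

adjusted <- notes %>%
  group_by(Participant) %>%                      # work within each participant
  mutate(AdjustedTime = Time - first(Time)) %>%  # first(Time) = Time of Note 1
  ungroup()

adjusted
# AdjustedTime is 0, 0.5, 1.2 for S01 and 0, 0.4 for S02
```

Note that first() relies on the rows of each participant being in Note order; if they might not be, sorting with arrange(Participant, Note) beforehand is a safe precaution.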

79 ..

80 Some resources: R Cheatsheets; R for Data Science; Cookbook for R; Graphs with ggplot2; Tidy Text Mining; Quick R; Advanced R

81 Thanks!
> Thanks <- "Thanks for your attention!"
> Thanks
## [1] "Thanks for your attention!"
> # Command to quit from R Console
> q()


More information

SISG/SISMID Module 3

SISG/SISMID Module 3 SISG/SISMID Module 3 Introduction to R Ken Rice Tim Thornton University of Washington Seattle, July 2018 Introduction: Course Aims This is a first course in R. We aim to cover; Reading in, summarizing

More information

Package tidystats. May 6, 2018

Package tidystats. May 6, 2018 Title Create a Tidy Statistics Output File Version 0.2 Package tidystats May 6, 2018 Produce a data file containing the output of statistical models and assist with a workflow aimed at writing scientific

More information

Mails : ; Document version: 14/09/12

Mails : ; Document version: 14/09/12 Mails : leslie.regad@univ-paris-diderot.fr ; gaelle.lelandais@univ-paris-diderot.fr Document version: 14/09/12 A freely available language and environment Statistical computing Graphics Supplementary

More information

Session 1 Nick Hathaway;

Session 1 Nick Hathaway; Session 1 Nick Hathaway; nicholas.hathaway@umassmed.edu Contents R Basics 1 Variables/objects.............................................. 1 Functions..................................................

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Financial Econometrics Practical

Financial Econometrics Practical Financial Econometrics Practical Practical 3: Plotting in R NF Katzke Table of Contents 1 Introduction 1 1.0.1 Install ggplot2................................................. 2 1.1 Get data Tidy.....................................................

More information

Tidy Evaluation. Lionel Henry and Hadley Wickham RStudio

Tidy Evaluation. Lionel Henry and Hadley Wickham RStudio Tidy Evaluation Lionel Henry and Hadley Wickham RStudio Tidy evaluation Our vision for dealing with a special class of R functions Usually called NSE but we prefer quoting functions Most interesting language

More information

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression OBJECTIVES 1. Prepare a scatter plot of the dependent variable on the independent variable 2. Do a simple linear regression

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant.

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant. BIMM 143 Data analysis with R Lecture 4 Barry Grant http://thegrantlab.org/bimm143 Recap From Last Time: Substitution matrices: Where our alignment match and mis-match scores typically come from Comparing

More information

R Basics / Course Business

R Basics / Course Business R Basics / Course Business We ll be using a sample dataset in class today: CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats

More information

Introduction to R and the tidyverse

Introduction to R and the tidyverse Introduction to R and the tidyverse Paolo Crosetto Paolo Crosetto Introduction to R and the tidyverse 1 / 58 Lecture 3: merging & tidying data Paolo Crosetto Introduction to R and the tidyverse 2 / 58

More information

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT NAVAL POSTGRADUATE SCHOOL LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT Statistics (OA3102) Lab #2: Sampling, Sampling Distributions, and the Central Limit Theorem Goal: Use R to demonstrate sampling

More information

Introduction to R Jason Huff, QB3 CGRL UC Berkeley April 15, 2016

Introduction to R Jason Huff, QB3 CGRL UC Berkeley April 15, 2016 Introduction to R Jason Huff, QB3 CGRL UC Berkeley April 15, 2016 Installing R R is constantly updated and you should download a recent version; the version when this workshop was written was 3.2.4 I also

More information

1 Introduction to Matlab

1 Introduction to Matlab 1 Introduction to Matlab 1. What is Matlab? Matlab is a computer program designed to do mathematics. You might think of it as a super-calculator. That is, once Matlab has been started, you can enter computations,

More information

Introduction to R. Andy Grogan-Kaylor October 22, Contents

Introduction to R. Andy Grogan-Kaylor October 22, Contents Introduction to R Andy Grogan-Kaylor October 22, 2018 Contents 1 Background 2 2 Introduction 2 3 Base R and Libraries 3 4 Working Directory 3 5 Writing R Code or Script 4 6 Graphical User Interface 4 7

More information

CS1114: Matlab Introduction

CS1114: Matlab Introduction CS1114: Matlab Introduction 1 Introduction The purpose of this introduction is to provide you a brief introduction to the features of Matlab that will be most relevant to your work in this course. Even

More information

Introduction to Statistics using R/Rstudio

Introduction to Statistics using R/Rstudio Introduction to Statistics using R/Rstudio R and Rstudio Getting Started Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations sent) R on Windows R on MACs

More information

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA Session 26 TS, Predictive Analytics: Moving Out of Square One Moderator: Jean-Marc Fix, FSA, MAAA Presenters: Jean-Marc Fix, FSA, MAAA Jeffery Robert Huddleston, ASA, CERA, MAAA Predictive Modeling: Getting

More information

CITS2401 Computer Analysis & Visualisation

CITS2401 Computer Analysis & Visualisation FACULTY OF ENGINEERING, COMPUTING AND MATHEMATICS CITS2401 Computer Analysis & Visualisation SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING Topic 3 Introduction to Matlab Material from MATLAB for

More information

Incident Response Programming with R. Eric Zielinski Sr. Consultant, Nationwide

Incident Response Programming with R. Eric Zielinski Sr. Consultant, Nationwide Incident Response Programming with R Eric Zielinski Sr. Consultant, Nationwide About Me? Cyber Defender for Nationwide Over 15 years in Information Security Speaker at various conferences FIRST, CEIC,

More information

Lab 1: Getting started with R and RStudio Questions? or

Lab 1: Getting started with R and RStudio Questions? or Lab 1: Getting started with R and RStudio Questions? david.montwe@ualberta.ca or isaacren@ualberta.ca 1. Installing R and RStudio To install R, go to https://cran.r-project.org/ and click on the Download

More information

social data science Introduction to R Sebastian Barfort August 07, 2016 University of Copenhagen Department of Economics 1/40

social data science Introduction to R Sebastian Barfort August 07, 2016 University of Copenhagen Department of Economics 1/40 social data science Introduction to R Sebastian Barfort August 07, 2016 University of Copenhagen Department of Economics 1/40 welcome Course Description The objective of this course is to learn how to

More information

Data Input/Output. Andrew Jaffe. January 4, 2016

Data Input/Output. Andrew Jaffe. January 4, 2016 Data Input/Output Andrew Jaffe January 4, 2016 Before we get Started: Working Directories R looks for files on your computer relative to the working directory It s always safer to set the working directory

More information

Exercise 1-Solutions TMA4255 Applied Statistics

Exercise 1-Solutions TMA4255 Applied Statistics Exercise 1-Solutions TMA4255 Applied Statistics January 16, 2017 Intro 0.1 Start MINITAB Start MINITAB on your laptop, or remote desktop to cauchy.math.ntnu.no and log in with win-ntnu-no\yourusername

More information

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.

More information

Lecture 12: Data carpentry with tidyverse

Lecture 12: Data carpentry with tidyverse http://127.0.0.1:8000/.html Lecture 12: Data carpentry with tidyverse STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University options(repr.plot.width=5, repr.plot.height=3)

More information

Introduction to R: Using R for Statistics and Data Analysis. BaRC Hot Topics

Introduction to R: Using R for Statistics and Data Analysis. BaRC Hot Topics Introduction to R: Using R for Statistics and Data Analysis BaRC Hot Topics http://barc.wi.mit.edu/hot_topics/ Why use R? Perform inferential statistics (e.g., use a statistical test to calculate a p-value)

More information

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Lecture 5. Essential skills for bioinformatics: Unix/Linux Lecture 5 Essential skills for bioinformatics: Unix/Linux UNIX DATA TOOLS Text processing with awk We have illustrated two ways awk can come in handy: Filtering data using rules that can combine regular

More information

Lecture 1: Getting Started and Data Basics

Lecture 1: Getting Started and Data Basics Lecture 1: Getting Started and Data Basics The first lecture is intended to provide you the basics for running R. Outline: 1. An Introductory R Session 2. R as a Calculator 3. Import, export and manipulate

More information

Лекция 4 Трансформация данных в R

Лекция 4 Трансформация данных в R Анализ данных Лекция 4 Трансформация данных в R Гедранович Ольга Брониславовна, старший преподаватель кафедры ИТ, МИУ volha.b.k@gmail.com 2 Вопросы лекции Фильтрация (filter) Сортировка (arrange) Выборка

More information

Statistics for Biologists: Practicals

Statistics for Biologists: Practicals Statistics for Biologists: Practicals Peter Stoll University of Basel HS 2012 Peter Stoll (University of Basel) Statistics for Biologists: Practicals HS 2012 1 / 22 Outline Getting started Essentials of

More information

The Average and SD in R

The Average and SD in R The Average and SD in R The Basics: mean() and sd() Calculating an average and standard deviation in R is straightforward. The mean() function calculates the average and the sd() function calculates the

More information

UAccess ANALYTICS Next Steps: Working with Bins, Groups, and Calculated Items: Combining Data Your Way

UAccess ANALYTICS Next Steps: Working with Bins, Groups, and Calculated Items: Combining Data Your Way UAccess ANALYTICS Next Steps: Working with Bins, Groups, and Calculated Items: Arizona Board of Regents, 2014 THE UNIVERSITY OF ARIZONA created 02.07.2014 v.1.00 For information and permission to use our

More information

SESSION 9: Data Entry

SESSION 9: Data Entry Data Entry 74 SESSION 9: Data Entry 9.1 Introduction and general principles for entering data using Excel Excel is a powerful tool to extract meaningful information and insights from the data you have

More information

SECTION 1: INTRODUCTION. ENGR 112 Introduction to Engineering Computing

SECTION 1: INTRODUCTION. ENGR 112 Introduction to Engineering Computing SECTION 1: INTRODUCTION ENGR 112 Introduction to Engineering Computing 2 Course Overview What is Programming? 3 Programming The implementation of algorithms in a particular computer programming language

More information

TUTORIAL. HCS- Tools + Scripting Integrations

TUTORIAL. HCS- Tools + Scripting Integrations TUTORIAL HCS- Tools + Scripting Integrations HCS- Tools... 3 Setup... 3 Task and Data... 4 1) Data Input Opera Reader... 7 2) Meta data integration Expand barcode... 8 3) Meta data integration Join Layout...

More information

Getting Started. Slides R-Intro: R-Analytics: R-HPC:

Getting Started. Slides R-Intro:   R-Analytics:   R-HPC: Getting Started Download and install R + Rstudio http://www.r-project.org/ https://www.rstudio.com/products/rstudio/download2/ TACC ssh username@wrangler.tacc.utexas.edu % module load Rstats %R Slides

More information

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0 Type Package Title Tidy Statistical Inference Version 0.3.0 Package infer July 11, 2018 The objective of this package is to perform inference using an epressive statistical grammar that coheres with the

More information

Tabular data management. Jennifer Bryan RStudio, University of British Columbia

Tabular data management. Jennifer Bryan RStudio, University of British Columbia Tabular data management Jennifer Bryan RStudio, University of British Columbia @JennyBryan @jennybc data cleaning data wrangling descriptive stats inferential stats reporting data cleaning data wrangling

More information

Getting started with ggplot2

Getting started with ggplot2 Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for

More information

STAT 113: R/RStudio Intro

STAT 113: R/RStudio Intro STAT 113: R/RStudio Intro Colin Reimer Dawson Last Revised September 1, 2017 1 Starting R/RStudio There are two ways you can run the software we will be using for labs, R and RStudio. Option 1 is to log

More information

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software. Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means

More information

Assignment 0. Nothing here to hand in

Assignment 0. Nothing here to hand in Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very

More information

Introduction to R. Introduction to Econometrics W

Introduction to R. Introduction to Econometrics W Introduction to R Introduction to Econometrics W3412 Begin Download R from the Comprehensive R Archive Network (CRAN) by choosing a location close to you. Students are also recommended to download RStudio,

More information

ECO375 Tutorial 1 Introduction to Stata

ECO375 Tutorial 1 Introduction to Stata ECO375 Tutorial 1 Introduction to Stata Matt Tudball University of Toronto Mississauga September 14, 2017 Matt Tudball (University of Toronto) ECO375H5 September 14, 2017 1 / 25 What Is Stata? Stata is

More information

UNIT 4. Research Methods in Business

UNIT 4. Research Methods in Business UNIT 4 Preparing Data for Analysis:- After data are obtained through questionnaires, interviews, observation or through secondary sources, they need to be edited. The blank responses, if any have to be

More information

Spectroscopic Analysis: Peak Detector

Spectroscopic Analysis: Peak Detector Electronics and Instrumentation Laboratory Sacramento State Physics Department Spectroscopic Analysis: Peak Detector Purpose: The purpose of this experiment is a common sort of experiment in spectroscopy.

More information

1 Introduction to Using Excel Spreadsheets

1 Introduction to Using Excel Spreadsheets Survey of Math: Excel Spreadsheet Guide (for Excel 2007) Page 1 of 6 1 Introduction to Using Excel Spreadsheets This section of the guide is based on the file (a faux grade sheet created for messing with)

More information

An Introduction to MATLAB

An Introduction to MATLAB An Introduction to MATLAB Day 1 Simon Mitchell Simon.Mitchell@ucla.edu High level language Programing language and development environment Built-in development tools Numerical manipulation Plotting of

More information

STATA 13 INTRODUCTION

STATA 13 INTRODUCTION STATA 13 INTRODUCTION Catherine McGowan & Elaine Williamson LONDON SCHOOL OF HYGIENE & TROPICAL MEDICINE DECEMBER 2013 0 CONTENTS INTRODUCTION... 1 Versions of STATA... 1 OPENING STATA... 1 THE STATA

More information

Python Programming Exercises 1

Python Programming Exercises 1 Python Programming Exercises 1 Notes: throughout these exercises >>> preceeds code that should be typed directly into the Python interpreter. To get the most out of these exercises, don t just follow them

More information

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS 24 January 2017 Stefan Breet breet@rsm.nl www.stefanbreet.com TODAY What is R? How to use R? The Basics How to use R? The Data Analysis Process WHAT IS R? AN

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3A Visualizing Data By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to visualize data. If you intend to

More information

JME Language Reference Manual

JME Language Reference Manual JME Language Reference Manual 1 Introduction JME (pronounced jay+me) is a lightweight language that allows programmers to easily perform statistic computations on tabular data as part of data analysis.

More information

Modeling in the Tidyverse. Max Kuhn (RStudio)

Modeling in the Tidyverse. Max Kuhn (RStudio) Modeling in the Tidyverse Max Kuhn (RStudio) Modeling in R R has always had a rich set of modeling tools that it inherited from S. For example, the formula interface has made it simple to specify potentially

More information

Introduction to R Programming

Introduction to R Programming Course Overview Over the past few years, R has been steadily gaining popularity with business analysts, statisticians and data scientists as a tool of choice for conducting statistical analysis of data

More information

Lecture 4 CSE July 1992

Lecture 4 CSE July 1992 Lecture 4 CSE 110 6 July 1992 1 More Operators C has many operators. Some of them, like +, are binary, which means that they require two operands, as in 4 + 5. Others are unary, which means they require

More information

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Contents 1 Suggested ahead activities 1 2 Introduction to R 2 2.1 Learning Objectives......................................... 2 3 Starting

More information

CSSS 512: Lab 1. Logistics & R Refresher

CSSS 512: Lab 1. Logistics & R Refresher CSSS 512: Lab 1 Logistics & R Refresher 2018-3-30 Agenda 1. Logistics Labs, Office Hours, Homeworks Goals and Expectations R, R Studio, R Markdown, L ATEX 2. Time Series Data in R Unemployment in Maine

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Computer lab 2 Course: Introduction to R for Biologists

Computer lab 2 Course: Introduction to R for Biologists Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient

More information

Source df SS MS F A a-1 [A] [T] SS A. / MS S/A S/A (a)(n-1) [AS] [A] SS S/A. / MS BxS/A A x B (a-1)(b-1) [AB] [A] [B] + [T] SS AxB

Source df SS MS F A a-1 [A] [T] SS A. / MS S/A S/A (a)(n-1) [AS] [A] SS S/A. / MS BxS/A A x B (a-1)(b-1) [AB] [A] [B] + [T] SS AxB Keppel, G. Design and Analysis: Chapter 17: The Mixed Two-Factor Within-Subjects Design: The Overall Analysis and the Analysis of Main Effects and Simple Effects Keppel describes an Ax(BxS) design, which

More information

Data types and structures

Data types and structures An introduc+on to Data types and structures Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 3 Review GeFng started with R Crea+ng Objects Data types in R Data structures in R

More information

TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA

TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA 1 TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA Notes adapted from Introduction to Computing and Programming with Java: A Multimedia Approach by M. Guzdial and B. Ericson, and instructor materials prepared

More information

An Introduction to Stata

An Introduction to Stata An Introduction to Stata Instructions Statistics 111 - Probability and Statistical Inference Jul 3, 2013 Lab Objective To become familiar with the software package Stata. Lab Procedures Stata gives us

More information

Stat 302 Statistical Software and Its Applications SAS: Data I/O

Stat 302 Statistical Software and Its Applications SAS: Data I/O Stat 302 Statistical Software and Its Applications SAS: Data I/O Yen-Chi Chen Department of Statistics, University of Washington Autumn 2016 1 / 33 Getting Data Files Get the following data sets from the

More information

An Introduction to MATLAB See Chapter 1 of Gilat

An Introduction to MATLAB See Chapter 1 of Gilat 1 An Introduction to MATLAB See Chapter 1 of Gilat Kipp Martin University of Chicago Booth School of Business January 25, 2012 Outline The MATLAB IDE MATLAB is an acronym for Matrix Laboratory. It was

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018 Lecture 09 Graphics::ggplot I 2018 R Teaching Team October 1, 2018 Acknowledgements 1. Mike Fliss & Sara Levintow! 2. stackoverflow (particularly user David for lecture styling - link) 3. R Markdown: The

More information

Instructions on Adding Zeros to the Comtrade Data

Instructions on Adding Zeros to the Comtrade Data Instructions on Adding Zeros to the Comtrade Data Required: An excel spreadshheet with the commodity codes for all products you want included. In this exercise we will want all 4-digit SITC Revision 2

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC

If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC sample). All examples use your Workshop directory (e.g. /Users/peggy/workshop)

More information

Package furniture. November 10, 2017

Package furniture. November 10, 2017 Package furniture November 10, 2017 Type Package Title Furniture for Quantitative Scientists Version 1.7.2 Date 2017-10-16 Maintainer Tyson S. Barrett Contains three main

More information

A QUICK INTRODUCTION TO MATLAB

A QUICK INTRODUCTION TO MATLAB A QUICK INTRODUCTION TO MATLAB Very brief intro to matlab Basic operations and a few illustrations This set is independent from rest of the class notes. Matlab will be covered in recitations and occasionally

More information

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Contents 1 Introduction to Using Excel Spreadsheets 2 1.1 A Serious Note About Data Security.................................... 2 1.2

More information

Welcome to Workshop: Introduction to R, Rstudio, and Data

Welcome to Workshop: Introduction to R, Rstudio, and Data Welcome to Workshop: Introduction to R, Rstudio, and Data I. Please sign in on the sign in sheet (this is so I can follow up to get feedback). II. If you haven t already, download R and Rstudio, install

More information