R Commander Tutorial

Introduction

R is a powerful, freely available software package for analyzing and graphing data. However, for somebody who does not use statistical software packages frequently, the big drawback of R is that it is command-line based and thus not very intuitive to use. For such users, R Commander might be a good alternative. R Commander is a software package that allows running R from a graphical user interface, which makes analyzing and graphing your data in R a lot easier.

Objective

The objective of this tutorial is to give you a basic introduction to R Commander and to show you how to use it to run basic statistics and create graphs.

1. Start the R Commander

Open R by either clicking on the R icon on your desktop or navigating to R in your programs folder. Once you have opened R, go to Packages/Load Packages on the R menu bar and find Rcmdr in the R packages list (R packages are similar to software programs and have been written for R by different contributors). Highlight Rcmdr by clicking on it and click OK. R might give you a warning message. If so, just ignore it and click No. The R Commander console should now appear on your screen, and you are ready to run some statistics and make some graphs in R.
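If you prefer typing to clicking, the same package can be loaded from the R Console. The lines below are only a minimal sketch: the helper name start_rcmdr is made up for this example, and it assumes Rcmdr has already been installed on your computer.

```r
# Hypothetical helper (not part of R or Rcmdr): load R Commander from the console.
start_rcmdr <- function() {
  # requireNamespace() only checks whether the package is installed.
  if (!requireNamespace("Rcmdr", quietly = TRUE)) {
    stop("Package 'Rcmdr' is not installed; run install.packages(\"Rcmdr\") first.")
  }
  # Attaching the package opens the R Commander window.
  library("Rcmdr")
}

# Only start the graphical interface when R is used interactively.
if (interactive()) start_rcmdr()
```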
2. Reading your data into R

After you come back from the field, your notebook shows the following data recordings:

Now you want to create a digital copy of your data. To do this, start your computer and type the data table into Excel. Very important: Make sure your column headings do not contain any spaces (e.g., write soil_moisture instead of Soil Moisture), since spaces confuse most statistics programs, including R.

To be able to read your dataset into R Commander for statistical analysis, you have to save the data table on your hard drive as either a comma-delimited file (.csv, see figure below) or a tab-delimited file (.txt, see figure below). To save the file as a .csv or .txt file in Excel, go to Save As/Other Formats and select CSV (Comma delimited) (*.csv) or Text (Tab delimited) (*.txt) from the Save as type pull-down menu. Make
sure you remember where you saved the data on your computer so you can navigate to the dataset later on.

Now we are ready to read our data into R Commander. On the R Commander menu bar, go to Data/Import data and select from text file, clipboard, or URL, which should bring up the window below. Make the same selections as shown in the window below: name your dataset cover_moisture and select either Commas (if you saved your file as a comma-delimited (*.csv) file) or Tabs (if you saved your file as a tab-delimited (*.txt) file). Click OK and a window appears that allows you to navigate to your data file. Once you have navigated to your data file, highlight it by clicking on it and click Open. You can now view your data by clicking on View data set on the R Commander menu bar.
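The import dialog essentially builds a read.table() call for you. The sketch below shows roughly what such a call might look like. To keep it self-contained it first writes a small temporary .csv file; the numbers in it are invented for illustration and are not the field data from the notebook.

```r
# Hypothetical example values, invented for illustration only.
example_data <- data.frame(
  location      = 1:5,
  cover         = c(10, 25, 40, 60, 80),
  soil_moisture = c(12, 15, 21, 26, 33)
)

# Write the table to a temporary comma-delimited file, standing in for
# the file you saved from Excel.
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Read the file back in. For a tab-delimited (.txt) file use sep = "\t".
cover_moisture <- read.table(csv_path, header = TRUE, sep = ",",
                             na.strings = "NA", dec = ".",
                             strip.white = TRUE)

# Inspect the imported dataset (similar to clicking View data set).
str(cover_moisture)
```

In your own work you would replace csv_path with the path to the file you saved.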
You can also enter your data directly into R by selecting Data from the R Commander menu bar and clicking on New dataset. This will bring up the Data Editor window, which allows you to type your data directly into R. By clicking on a column header, you can change the variable name of each column (e.g., change var1 to location, var2 to cover, and var3 to soil.moisture). The variable editor also allows you to select the type of the variables you are entering. Since you are entering numeric values, select numeric under variable type. Type in your data as shown below.

3. Summary statistics

To get some summary statistics of your data, go to Statistics/Summaries and select Numerical summaries. Now you should see the following window:
Pick cover and soil.moisture (Note: to select more than one variable you have to hold down the Ctrl key) and click OK. A summary table will appear that shows the mean, the standard deviation, and the 0, 0.25, 0.50, 0.75, and 1 quantiles of the cover and soil.moisture data.

4. Scatterplot

To see if there is a relationship between cover and soil moisture, it is a good idea to first look at a scatterplot of the data. To create a scatterplot, go to Graphs on the R Commander menu bar and select Scatterplot. This will bring up a dialog window. Select cover as your x-variable and soil.moisture as your y-variable. Under x-axis label and y-axis label, label your x- and y-axis Cover (%) and Soil Moisture (%), respectively. Under Options, deselect Marginal boxplots, Smooth line, and Show spread. Next, click OK and a scatterplot will appear (Important: Make sure you highlight the R Console by clicking on it to be able to see the scatterplot).

You can save the scatterplot (or any other plot you create) by clicking on the plot (Important: if you do not select the plot, you won't be able to save it), going to File/Save as/jpeg on the R menu bar (Note: the R menu bar, not the R Commander menu bar), and clicking on 100% quality. This will bring up a window that allows you to specify the location on your computer where you want to save the plot as a JPEG image.

5. Fitting a linear regression model

The scatterplot above shows us that there is a positive relationship between soil moisture and cover. However, the scatterplot does not tell us how strong the relationship is, whether the relationship is significant, and so on. To get this information, we have to fit a linear regression model. To fit a linear regression model, go to Statistics/Fit models on the R Commander menu bar and select Linear model. Select soil
moisture as your response variable (aka y-variable or dependent variable) and cover as your explanatory variable (aka x-variable or independent variable) and click OK. The following output will appear in the Output Window of the R Commander:

We will talk in class about how to interpret the output table (e.g., what those numbers mean).

To check the basic model diagnostics for the linear model you just fit, go to Models/Graphs on the R Commander menu bar and select Basic diagnostic plots. This brings up the following window (we will discuss in class how to interpret the model diagnostic plots):
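For reference, the menu steps from sections 3 to 5 correspond roughly to a handful of plain R commands. The sketch below is not the exact code that R Commander generates; it uses base R functions only, and the cover and soil moisture values are invented for illustration.

```r
# Hypothetical example values, invented for illustration only.
cover_moisture <- data.frame(
  cover         = c(5, 15, 25, 35, 50, 60, 75, 90),
  soil.moisture = c(11, 14, 15, 20, 22, 27, 29, 35)
)

# When run as a script, send plots to a null device instead of a file.
if (!interactive()) pdf(NULL)

# Section 3: mean, standard deviation, and quantiles for each variable.
numeric_summary <- sapply(cover_moisture, function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x))
})
print(numeric_summary)

# Section 4: scatterplot with labeled axes.
plot(soil.moisture ~ cover, data = cover_moisture,
     xlab = "Cover (%)", ylab = "Soil Moisture (%)")

# Section 5: linear regression of soil moisture on cover.
moisture_model <- lm(soil.moisture ~ cover, data = cover_moisture)
print(summary(moisture_model))

# Basic diagnostic plots, arranged in a 2 x 2 grid.
par(mfrow = c(2, 2))
plot(moisture_model)
par(mfrow = c(1, 1))
```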
6. Fitting multiple regression models

In this part of the tutorial you will learn how to fit a multiple regression model. Your hypothesis is that air temperature, solar radiation, and wind speed are significant predictors of ozone. To test this hypothesis, you collected the data called airquality, which are available for download from our class website (http://ecosensing.org/teaching/css-560/digital-library/data) (Note: The data were taken from Dalgaard, 2002). Let's import the data into R Commander and call the dataset airquality (if you can't remember how to import data, please refer to section 2 of this document). Let's take a look at the data to familiarize ourselves with them by selecting airquality from the Data set dropdown menu.

Next, let's plot the relationships between the different variables in the dataset. To do this, make the R Console active by clicking on it and type the following command at the R Console command line prompt:

    pairs(airquality)
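If you do not have the class file at hand, you can try the same command on the airquality data set that ships with R. This is an assumption made for the sketch: the built-in data set has the columns Ozone, Solar.R, Wind, and Temp used in this section, but it may not be identical to the file on the class website.

```r
# Load the airquality data set that ships with R (package "datasets").
data(airquality)

# Look at the structure of the data: column names, types, first values.
str(airquality)

# When run as a script, send plots to a null device instead of a file.
if (!interactive()) pdf(NULL)

# Scatterplot matrix of the four variables discussed in this section.
# pairs(airquality) would also include the Month and Day columns.
pairs(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
```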
Now you should see the following figure:

This is how you read the figure:

It looks like there is some sort of relationship between ozone and temperature and between ozone and wind. However, there seems to be no relationship between ozone and solar radiation.

OK - let's now fit a multiple regression model to test if solar radiation, wind, and temperature are significant predictors of ozone. To fit a multiple regression model, let's go to Statistics/Fit models... on
the R Commander menu bar and select Linear model.... A window appears that should be somewhat familiar to you from section 5 of this tutorial. The model you want to fit basically says that ozone is a function of solar radiation, air temperature, and wind. In R's model notation, we can write this model as follows:

    Ozone ~ Solar.R + Temp + Wind    [1]

After typing model [1] in the appropriate section of the linear model window (see above), click OK. You should now see the following output:
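Model [1] can also be fit from the R Console with lm(). The sketch below assumes the airquality data set that ships with R as a stand-in for the class file, so the numbers it prints may differ from the output shown in this tutorial.

```r
# Fit model [1]: ozone as a function of solar radiation, temperature, and wind.
# Rows with missing values are dropped automatically by lm().
ozone_model <- lm(Ozone ~ Solar.R + Temp + Wind, data = airquality)

# Coefficient table, R-squared, and overall F-test.
print(summary(ozone_model))

# When run as a script, send plots to a null device instead of a file.
if (!interactive()) pdf(NULL)

# Basic diagnostic plots for the fitted model, in a 2 x 2 grid.
par(mfrow = c(2, 2))
plot(ozone_model)
par(mfrow = c(1, 1))
```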
Let's also take a look at the model diagnostics:

We will discuss the interpretation of the model output, as well as the interpretation of the model diagnostics, in more detail in class.

7. Paired t-test

Next, we will conduct a paired t-test to see if there is a statistically significant difference in soil moisture before and after a rain event. The data for the paired t-test, called paired_t_test, are available for download from our class website (http://ecosensing.org/teaching/css-560/digital-library/data). Import the data into R by following the steps you learned at the beginning of this tutorial and name the dataset soil_moisture (Hint: Open the paired_t_test.txt file in a text editor. You will see that it is a tab-delimited file and not a comma-delimited file. You need that information to import the data into R properly).

Before conducting a paired t-test (or any other t-test), it is a good idea to look at a boxplot of the data first. To do this, you have to stack your data first (you are just rearranging the data so they are in a format that the computer can use to create a boxplot; for more detail on stacking, please refer to the Appendix of this tutorial). Go to Data/Active data set on the R Commander menu bar and click on Stack variables in active data set.
You should now see the Stack Variables window shown below. Select both the soil.moisture.after and soil.moisture.before variables and name the stacked dataset stacked_soil_moisture. Keep the rest of the default settings as shown below and click OK.

Next, go to Graphs/Boxplots on the R Commander menu bar. In the window that pops up, select Plot by groups, group your variables by factor, and click OK. Now you should see the following boxplot:

Based on the boxplot, do you think the soil moisture changed significantly after the rain event?

After visually inspecting the data, we are ready to run a paired t-test. To do this, let's go back to our original, unstacked dataset by going to Data set on the R Commander menu bar and selecting soil_moisture. Click OK.
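In plain R, the stacking, the boxplot, and the paired t-test on the unstacked data might look like the sketch below. The before and after values are invented for illustration, and the column names variable and factor are assumed to match the defaults in the Stack Variables window.

```r
# Hypothetical soil moisture readings, invented for illustration only.
soil_moisture <- data.frame(
  soil.moisture.before = c(12, 15, 11, 18, 14, 16),
  soil.moisture.after  = c(19, 22, 17, 25, 20, 24)
)

# Stack the two columns: one column of values, one column naming the source.
stacked_soil_moisture <- stack(soil_moisture)
names(stacked_soil_moisture) <- c("variable", "factor")

# When run as a script, send plots to a null device instead of a file.
if (!interactive()) pdf(NULL)

# Boxplot of the stacked data, grouped by the factor column.
boxplot(variable ~ factor, data = stacked_soil_moisture,
        ylab = "Soil moisture", xlab = "")

# Paired t-test on the original, unstacked columns.
paired_result <- t.test(soil_moisture$soil.moisture.before,
                        soil_moisture$soil.moisture.after,
                        paired = TRUE)
print(paired_result)
```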
Next, go to Statistics/Means on the R Commander menu bar and select Paired t-test. Select soil.moisture.before as your first variable and soil.moisture.after as your second variable. Keep the rest at the default settings as shown below. After clicking OK, you should get the following output. We will discuss in class how to interpret the output.

8. Two-sample t-test

In this section of the tutorial we will learn how to conduct a two-sample t-test. We want to test the following hypothesis: soil pH of the non-treated stand in the Ponderosa State Park is statistically
significantly different from the soil pH in the treated part of the Park. The hypothetical dataset called ph that was collected is available for download from our class website (http://ecosensing.org/teaching/css-560/digital-library/data). Let's import the data into R Commander and create a boxplot of the data as we learned in section 7 of this tutorial (remember: you first have to stack the data in order to create the boxplot below; for more details, please refer to section 7 of this tutorial).

OK - it looks like the soil pH in the non-treated part of the forest is lower than in the treated part. Let's now do a two-sample t-test to see if the soil pH values are statistically significantly different from each other. To do this, keep your stacked ph dataset active, go to Statistics/Means on the R Commander menu bar, and select Independent samples t-test... (if the Independent samples t-test... option is greyed out, make sure that i) you stacked the ph dataset and ii) the stacked ph dataset is the active dataset).
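The same comparison can be sketched in plain R with t.test() and a formula. The pH values and the column names treated and untreated below are invented for illustration, and var.equal = FALSE (the Welch test) is assumed to match the default setting of the dialog.

```r
# Hypothetical soil pH readings, invented for illustration only.
ph <- data.frame(
  treated   = c(6.4, 6.6, 6.3, 6.7, 6.5, 6.8),
  untreated = c(5.9, 6.1, 5.8, 6.2, 6.0, 5.7)
)

# Stack the data: one column of pH values, one column naming the group.
ph_stacked <- stack(ph)
names(ph_stacked) <- c("variable", "factor")

# Two-sample (independent samples) t-test comparing the two groups.
# var.equal = FALSE requests the Welch test, which does not assume
# equal variances in the two groups.
ph_test <- t.test(variable ~ factor, data = ph_stacked,
                  alternative = "two.sided", conf.level = 0.95,
                  var.equal = FALSE)
print(ph_test)
```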
The window that now appears should look similar to the one below:

Keep the default settings and click OK. Now you should see the following output:

We will discuss in class how to interpret the output.

9. Customize your graphs

If you want to customize your figures, you have to do a little bit of programming. For example, the boxplot you created in section 8 of this tutorial is associated with the following line of code in your R Commander script window:

    boxplot(variable ~ factor, ylab = "ph", xlab = "factor", data = ph_stacked)
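That line only runs if a stacked dataset named ph_stacked already exists in your R session. If you want to experiment with it outside R Commander, the sketch below builds a small stand-in dataset first; the pH values and the column names treated and untreated are invented for illustration.

```r
# Hypothetical soil pH readings, invented for illustration only.
ph <- data.frame(
  treated   = c(6.4, 6.6, 6.3, 6.7, 6.5, 6.8),
  untreated = c(5.9, 6.1, 5.8, 6.2, 6.0, 5.7)
)

# Stack the data so that the boxplot formula variable ~ factor works.
ph_stacked <- stack(ph)
names(ph_stacked) <- c("variable", "factor")

# The order of the boxes follows the order of the factor levels.
print(levels(ph_stacked$factor))

# When run as a script, send plots to a null device instead of a file.
if (!interactive()) pdf(NULL)

# The same boxplot call as in the script window.
bp <- boxplot(variable ~ factor, ylab = "ph", xlab = "factor",
              data = ph_stacked)
```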
We can now change this line of code a little to make the boxplot nicer. For example, we could type the following into the R Console:

    boxplot(variable ~ factor, ylab = "Soil pH", xlab = "", names = c("Treated Forest", "Untreated Forest"), data = ph_stacked)

If you type the code above into the R Console and hit Enter, you should see the following boxplot:

As you can see, you need some R programming experience and knowledge to change the appearance of a figure beyond what the R Commander allows you to do. If you want to learn more about how to program in R, the R website (http://www.r-project.org/) is a good starting point, as is Peter Dalgaard's book "Introductory Statistics in R".

10. Closing R Commander and R

To close the R Commander and R, go to File/Exit and select From Commander and R.
Next, the R Commander will ask you if you want to exit the program. Click OK. It will then ask you if you want to save the script file and the output file. Click No in both cases.

Congratulations - you successfully finished the R Commander tutorial.

Other resources

Getting started with the R Commander. You can find a PDF of this tutorial on our class website (http://ecosensing.org/teaching/css-560/digital-library/tutorials). If you want to learn more about the R Commander, I recommend working through it.

Literature cited

Dalgaard, Peter. 2002. Introductory Statistics in R. Springer Science and Business Media, Inc.

Important: If you used a MOSS computer for this tutorial, please make sure you delete all the files you created from the computer after you are done with the tutorial. Thanks!

Disclaimer

Always consult a trained statistician to validate the correctness of the statistical approach you are taking. Please e-mail any suggestions on how to improve this document to Jan Eitel (jeitel@uidaho.edu). Use of trade names does not constitute an official endorsement by the McCall Outdoor Science School.
Appendix

A) Data stacking: what's that?

When you stack the data in R Commander, you are simply rearranging the data so they can be read properly by R. So what happens when you stack the data? You rearrange the data (in the example below, dissolved oxygen; see Figure I) so that all your values are within a single column (see Figure II), and you create a second column with factors (e.g., 1, 2, etc.) that lets R know where each observation originated (in the example below, all data associated with 1 originated from plot 1, and all data associated with 2 originated from plot 2).

Figure I. Unstacked data. Column one (entitled DO_oxygen_plot1) shows dissolved oxygen values collected at plot 1 and column two (entitled DO_oxygen_plot2) shows dissolved oxygen values collected at plot 2.

Figure II. Stacked data. Column one (entitled Data) shows all the collected dissolved oxygen data (in this example from plots 1 and 2). Column two (entitled Factor) identifies the plot from which each value originated.
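To make the rearrangement concrete, the sketch below stacks a tiny two-column table by hand. The dissolved oxygen values are invented for illustration, and the second column name DO_oxygen_plot2 is an assumption; the Stack Variables menu item does the equivalent work for you.

```r
# Hypothetical dissolved oxygen readings, invented for illustration only.
do_data <- data.frame(
  DO_oxygen_plot1 = c(8.1, 7.9, 8.4),
  DO_oxygen_plot2 = c(6.5, 6.9, 6.2)
)

# Stack by hand: put all values into one column (Data) and record the
# plot each value came from in a second column (Factor).
stacked <- data.frame(
  Data   = c(do_data$DO_oxygen_plot1, do_data$DO_oxygen_plot2),
  Factor = factor(rep(c(1, 2), each = nrow(do_data)))
)

# The first three rows belong to plot 1, the last three to plot 2.
print(stacked)
```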