An Introductory Guide to R
By Claudia Mahler
Contents
Installing and Operating R
Basics
Importing Data
Types of Data
Basic Operations
Selecting and Specifying Data
Matrices
Simple Statistics
Correlation
Regression
T-tests
Basic Graphics

Please note: Everything in this guide is presented as I learned it or figured it out on my own. For many of the operations listed, there may be other techniques that are easier or more intuitive to use. I am writing this as someone who had previously used SAS and has used R for a little over two years now. As titled, this is a very introductory guide, and as such there are many things not covered. Other operations can easily be found in other guides, but I hope that what I provide here will be enough to get you started with R. Enjoy!
Installing and Operating R

The main website for R has links for downloading the program as well as links to manuals and other documentation. Once you've installed it, operating R is simple. The window into which you will type commands, the R Console, is the only window open once you start up the program* (if you've used SAS, you will notice the difference between the three windows used in that program and the one window used in this one). The only other windows that appear are the graphics window when you create a plot and a help window when you look something up.

*Note: you can also use the built-in script editor (File > New Script), into which you can type commands and send them to the workspace via Ctrl + R. However, I find that once you are comfortable enough with R to start using this, a separate code editor works better and is easier to use. (For more info, see the Optional R Editors section.)

Basic Operations

R essentially runs what you type. What you type is displayed in red, and what is output by R is blue. I'll use these colors throughout the rest of the guide to help show you what's going on when I use R examples. The drop-down menus are mainly for locating help, finding additional libraries, and customizing the layout of the R window. In this respect, it's pretty different from SPSS.

If you ever want to clear the window of all the commands typed so far, simply press Ctrl + L, or right click in the command space and select "clear window." Though everything typed so far will be cleared from your screen, pressing the up and down arrows will still let you scroll line by line through what you have previously typed in your R session.

When you quit R, you will be asked whether or not you would like to save your workspace. I've never found it especially helpful to do this; if I want to retrieve code, I either copy it into a Notepad file for later use or save it in an editor (to be discussed later).
Saving a workspace allows you to re-access everything you had typed in that workspace. When you restore a workspace in a later session, you should see the message "previous workspace restored" when you open R again.

Downloading libraries

Many useful statistical operations in R require the installation of additional libraries, or packages. For example, in order to perform more complex linear regression analyses, you will need to install the faraway library. To install a specific library, open R, go to the Packages menu at the top of the screen, then click on Install Package(s). You will need to select a mirror from which to download the package. After this, you will be shown a list of all available packages. Since many of the names are not self-explanatory, it helps to know which specific library you need. Once you have found the library you want, double click it and the download will begin automatically.

It is important to remember that in order to use a downloaded library, you have to load it in your R session. Suppose I had just downloaded the library sem (a library used for structural equation modeling). I would type library(sem) or require(sem). This has to be done every time you start a new R session and want to use that library, but it only has to be done once per session.

Built-In Datasets

R has quite a few built-in datasets available. I often use them to test out new functions or to practice creating graphics. I will use several of the built-in datasets in this guide, mainly for convenience and so that anyone who wants to replicate the results has easy access to the data used. If you ever want to view a list and brief descriptions of these built-in datasets, type ?datasets. This will bring up a new help window in which you can view the descriptions of these datasets.

Optional R Editors

If you choose to use R as your primary tool for analyses, it could be beneficial to download a supported code editor to make things easier for yourself. Code editors essentially function as a notepad in which you can enter and modify R code before actually running any analyses in the R workspace. Many editors color-code what you enter as well, which makes it easier to read and decipher your own code. My editor of choice is Tinn-R, which runs on Windows. It is free and is available from many places online. Once it is installed and running, you can open the R workspace directly from the editor and send code line by line from Tinn-R to the workspace. I highly recommend using an editor if you're going to be working with R frequently. Other editors for both Windows and Mac are available as well.
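The menu-based steps above also have command-line equivalents. The following is a minimal sketch, not part of the original guide: it checks whether a package is installed before loading it, and lists the built-in datasets by name. The package name "stats" is only a placeholder chosen because it ships with every R installation; install.packages( ) is left commented out because it needs an internet connection.

```r
# Load a package only if it is already installed.
pkg <- "stats"  # placeholder; substitute the library you need, e.g. "sem"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
  pkg_loaded <- TRUE
} else {
  # install.packages(pkg)  # uncomment to download from a CRAN mirror
  pkg_loaded <- FALSE
}

# List the names of the datasets that come with R.
dataset_names <- data(package = "datasets")$results[, "Item"]
head(dataset_names)
```

Typing code like this into an editor or script file has the advantage that the installation step is recorded alongside the rest of the analysis.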
Basics

Now we get into the basics of how to actually use R. Named storage, or assigning variable names to data, is the way R stores data and the results of calculations. For example:

> x <- 36
> x
[1] 36

Unless you reassign x to mean something else, every time you type x, it will represent 36.

> x + 73
[1] 109

You can also assign variable names to characters (for example, if you want to create a vector of names). To do this, simply put the characters in quotes.

> y <- "This is a character variable"
> y
[1] "This is a character variable"

Named storage isn't necessary if you're doing simple calculations that you don't need to keep (such as addition, subtraction, multiplication, etc.), but if you have variables or a matrix of variables, it is best to assign a variable name to it. While R supports both <- and = to assign variable names, it is recommended to use <- and reserve = for stating a relation, to eliminate confusion when code gets a little more complicated. This distinction will be more apparent later.

Reminders:
R is case sensitive. X and x are two different names.
Variable names can contain periods and underscores, but cannot contain spaces.
Anything to which you don't assign a variable name is non-retrievable in the workspace. In other words, if you wish to save a calculation and make it easily accessible later on during your work session, it's best to assign it a variable name.
Sometimes you may perform multiple operations in one line and use parentheses around certain components (example: 4 + 7*(3/.6)). If you enter such an expression and get a + at the start of the next line rather than an actual output, this means that you forgot to close a parenthesis. Double check your statement to make sure you close all parenthetical components!
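The reminders above can be checked directly at the console. This short sketch (the names X and total are illustrative and do not appear in the guide) shows that x and X are separate objects and that only named results can be retrieved later.

```r
x <- 36
X <- "a different object"   # upper-case X does not overwrite lower-case x

total <- x + 73             # stored, so it can be reused later
total

4 + 7 * (3 / 0.6)           # computed and printed, but not stored anywhere

ls()                        # lists the names currently in the workspace
```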
Importing Data

Clipboard

Importing data using the clipboard is an easy and fast way to get data into R, but it is less suitable than importing from a data file if you plan on using your data set more than once. To import using the clipboard, simply highlight the whole of your data from wherever you're getting it (a page on the internet, an Excel file, Notepad, etc.) and copy it to the clipboard. In R, type the following:

> read.table('clipboard')
[your data will appear after hitting ENTER]

If you wish to save your data, don't forget to assign it a name.

> x <- read.table('clipboard')
> x
[your data will appear here]

Suppose, in the file you're importing, that the columns of data are named. For example:

Subject Height Weight
[rows of data]

If you try the above method, you'll run into problems, since R will read the first line as data instead of as the names of the columns. To fix this, you have to specify that there is a header for your data.

> x <- read.table('clipboard', header = TRUE)
> x
Subject Height Weight
[rows of data]

Data File

Importing data from a data file ensures that, if you save your R code, you will easily be able to load the data file again, assuming you keep your data in the same place. Importing data this way is slightly different from importing from the clipboard.

> x <- read.table(file("c:\\....txt"))
> x
[your data will appear here]

The header argument works with this type of data import as well. Importing from a file is generally preferable to using the clipboard.
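To see what the header argument actually changes, here is a small self-contained sketch. It writes a made-up three-row table (the file path and the values are assumptions for illustration only) to a temporary file and reads it back both ways.

```r
# Write a small, made-up table to a temporary file.
data_file <- tempfile(fileext = ".txt")
writeLines(c("Subject Height Weight",
             "1 60 104",
             "2 62 113",
             "3 64 130"),
           data_file)

# Without header = TRUE the column names are read as a row of data,
# and the columns get the default names V1, V2, V3.
no_header <- read.table(data_file)

# With header = TRUE the first line supplies the column names.
with_header <- read.table(data_file, header = TRUE)

str(no_header)
str(with_header)
```

In the first result every column is stored as text, because the header row was mixed in with the numbers; in the second the columns are numeric and correctly named.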
Types of Data

Data Frame

Data frames are the most convenient way to store data in R, as they are compatible with most types of operations. R views data frame rows as cases and data frame columns as variables. Data frames can also include columns of different types, such as both numerical and character, and support column names (which can be included using the header argument as above).

> x <- read.table("clipboard", header = TRUE)
> x
  Height Weight Gender
1     60    104 Female
2     62    113 Female
3     64    130 Female
4     66    150   Male
5     68    155   Male
6     70    167   Male

This is a data frame. If a data set you import has both numerical and character columns, it is automatically imported as a data frame.

Matrix

Unlike a data frame, a matrix must consist of either all numerical or all character components. However, the matrix specification for a table of data is ideal if the data you're examining requires matrix operations to be performed. More will be said on this later.

> as.matrix(x)
     Height Weight Gender
[1,] "60"   "104"  "Female"
[2,] "62"   "113"  "Female"
[3,] "64"   "130"  "Female"
[4,] "66"   "150"  "Male"
[5,] "68"   "155"  "Male"
[6,] "70"   "167"  "Male"

As you can see, if we try to convert our x data into a matrix, all the data gets converted into character data, since there is a mixture of both character and numerical. However, if we import a set of data that is all numerical:

> as.matrix(y)
[output: a numeric matrix with six rows and the columns Height and Weight]

All the data remain numerical.

Viewing data

Suppose you have a large dataset that you want to make sure has been imported correctly. Rather than having to view the entire dataset, you can view either the first few observations of the set or the last few observations of the set. You can do this with head( ) or tail( ). For example, the built-in dataset beaver1 has 114 observations. What if I wanted to check if the column names were in the correct places?

> x <- beaver1
> head(x)
[output: the first six rows, with the columns day, time, temp, and activ]

What if I wanted to make sure that there were 114 observations?

> tail(x)
[output: the last six rows, numbered 109 through 114, with the columns day, time, temp, and activ]

You can also get a quick summary of the size of a dataset using dim( ).

> dim(x)
[1] 114   4

This tells you the number of rows and the number of columns in a dataset. These commands become even more useful as you start working with larger and larger datasets.
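The difference between the two conversions can be reproduced without the clipboard. The sketch below rebuilds the height, weight, and gender table by hand (values taken from the as.matrix( ) output above) and checks what type each matrix ends up with; the object names m_mixed and m_num are illustrative.

```r
# Rebuild the example table directly instead of reading the clipboard.
x <- data.frame(Height = c(60, 62, 64, 66, 68, 70),
                Weight = c(104, 113, 130, 150, 155, 167),
                Gender = c("Female", "Female", "Female", "Male", "Male", "Male"))

m_mixed <- as.matrix(x)                          # mixed columns
m_num   <- as.matrix(x[, c("Height", "Weight")]) # numeric columns only

typeof(m_mixed)   # "character": every cell was converted to text
typeof(m_num)     # "double": the numbers are preserved

# Quick checks on a built-in dataset.
dim(beaver1)          # number of rows, number of columns
head(beaver1, n = 3)  # head( ) and tail( ) accept the number of rows to show
```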
Basic Operations

R can perform many basic operations, including addition, subtraction, multiplication, division, exponentiation, square root, and logarithms.

> 6 + 4
[1] 10
> 6 - 4
[1] 2
> 6 * 4
[1] 24
> 6 / 4
[1] 1.5
> 6^4
[1] 1296
> sqrt(6)
[1] 2.44949
> log(10)
[1] 2.302585
> 6*pi
[1] 18.84956

Note that log( ) computes the natural logarithm. Also note that pi is a built-in number in R, so you don't have to define it, but the name pi WILL take on a different value if you assign it something other than its built-in value (approximately 3.141593).
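The warning about pi is easy to demonstrate. In this brief sketch, assigning to pi creates a new object that hides the built-in constant, and removing that object with rm( ) makes the built-in value visible again.

```r
pi <- 3          # creates a workspace object that masks the built-in constant
6 * pi           # now uses 3, not 3.141593

rm(pi)           # delete the workspace copy
6 * pi           # the built-in constant is used again

log(10)          # natural logarithm
log10(10)        # base-10 logarithm
```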
Selecting and Specifying Data

Unfortunately, the selection of specific data out of data frames and matrices in R is not necessarily intuitive. For these examples, I'll be using the x data frame of height, weight, and gender that was used in the above section.

> x
  Height Weight Gender
1     60    104 Female
2     62    113 Female
3     64    130 Female
4     66    150   Male
5     68    155   Male
6     70    167   Male

Columns and rows can be selected using the standard formula dataframename[,column] or dataframename[row,]. Note the placement of the commas. For example:

To select the first column of the x data set:

> x[,1]
[1] 60 62 64 66 68 70

To select the first row of the x data set:

> x[1,]
  Height Weight Gender
1     60    104 Female

If the data frame you're working with has column names, you can specify columns using them as well:

> x$Weight
[1] 104 113 130 150 155 167

To select a specific point in a data frame, specify the row and the column.

> x[3,2]
[1] 130

Don't forget that you can select a row or column and give it a separate name if you want to use that specific row or column multiple times and don't want to keep typing a long command.

> y <- x[,1]
> y
[1] 60 62 64 66 68 70

You can also select subsets of data if you want to break a large data frame into two or more parts. Suppose we wanted to select only the data from the females in the data frame above. To do so, we would use the command subset( ).

> sub1 <- subset(x, Gender == "Female")
> sub1
  Height Weight Gender
1     60    104 Female
2     62    113 Female
3     64    130 Female

What if you only wanted the height data for the females?

> sub2 <- subset(x, Gender == "Female", select = Height)
> sub2
  Height
1     60
2     62
3     64

This is another command you'll be using a lot once you start working with larger datasets in R.

Reminders:
It's dataframename[row,column].
Type the column names exactly as they appear in the data frame, capitalization included. If I had typed x$weight instead of x$Weight, R would not have found the column.
When using the subset( ) command, be sure to type the column names and the categorical values (if that's what you're using to create the subset) exactly as they appear in the data frame.
Also, remember the double equals sign.
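The bracket notation and subset( ) can express the same selections. The following sketch, which is not from the original guide, rebuilds the example data frame and picks out the females' heights both ways; the names females and female_heights are illustrative.

```r
x <- data.frame(Height = c(60, 62, 64, 66, 68, 70),
                Weight = c(104, 113, 130, 150, 155, 167),
                Gender = c("Female", "Female", "Female", "Male", "Male", "Male"))

# A logical condition in the row position keeps only the matching rows.
females <- x[x$Gender == "Female", ]

# Adding a column name in the column position returns a plain vector.
female_heights <- x[x$Gender == "Female", "Height"]

# subset( ) with select = returns a one-column data frame instead.
sub2 <- subset(x, Gender == "Female", select = Height)

female_heights
sub2
```

The vector form is convenient for passing to functions such as mean( ), while the data frame form keeps the column name attached.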
Matrices

Matrix operations are performed quite easily in R. I will use the same matrix z for all operations.

> z <- matrix(data = 1:9, nrow = 3, ncol = 3, byrow = FALSE)
> z
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> t(z)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

This transposes a matrix. This can also be performed on vectors.

> sum(diag(z))
[1] 15

This gives you the trace of a matrix.

> det(z)
[1] 0

This calculates the determinant of a matrix. Anyone who has had to do this by hand knows how convenient this command is!

> q <- matrix(data = 10:18, nrow = 3, ncol = 3, byrow = FALSE)
> q
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

To multiply matrices, you have to remember that matrices must be of compatible dimensions to be multiplied (if they are not, R will give you an error message).

> z%*%q
     [,1] [,2] [,3]
[1,]  138  174  210
[2,]  171  216  261
[3,]  204  258  312

> z%*%t(q)
     [,1] [,2] [,3]
[1,]  174  186  198
[2,]  213  228  243
[3,]  252  270  288

Reminders:
When multiplying matrices, order matters!
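Both reminders, that order matters and that dimensions must be compatible, can be verified with a few lines. In this sketch the matrix tall and the use of tryCatch( ) to capture the error are additions for illustration.

```r
z <- matrix(1:9, nrow = 3)
q <- matrix(10:18, nrow = 3)

zq <- z %*% q
qz <- q %*% z
identical(zq, qz)        # FALSE: matrix multiplication is not commutative

# A 3 x 2 matrix can be multiplied on the left by the 3 x 3 matrix z ...
tall <- matrix(1:6, nrow = 3)
dim(z %*% tall)

# ... but not the other way round; R stops with a "non-conformable" error.
result <- tryCatch(tall %*% z, error = function(e) "non-conformable")
result
```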
Simple Statistics

R can perform many simple statistics, including calculation of means, standard deviations, sums, and medians, with a single command. For this section, I will use the built-in dataset attitude, which contains survey responses from the clerical employees of a large financial organization, aggregated into 30 observations (one per department).

> x <- attitude

I will select the column of ratings to use as an example.

> w <- x$rating

Commands for the mean, median, standard deviation, variance, sum, and range are as follows:

> mean(w)
[1] 64.63333
> median(w)
[1] 65.5
> sd(w)
[1] ...
> var(w)
[1] ...
> sum(w)
[1] 1939
> range(w)
[1] 40 85

These commands are most useful if you only need to find the mean of a set of data, or the sum of a set of data, etc. However, if you want a general statistical summary, you can do that as well and save yourself a few lines of commands.

> summary(w)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  40.00     ...   65.50   64.63     ...   85.00

You can also apply these commands to entire data frames of information. In current versions of R, mean( ) and sd( ) no longer accept a whole data frame directly, so apply them column by column with sapply( ); summary( ) works on a data frame as is.

> sapply(x, mean)
[output: the mean of each of the columns rating, complaints, privileges, learning, raises, critical, and advance]
> sapply(x, sd)
[output: the standard deviation of each of the same seven columns]
> summary(x)
     rating        complaints     privileges      learning        raises
 Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00   Min.   : ...
 1st Qu.: ...    1st Qu.:58.5   1st Qu.: ...    1st Qu.: ...    1st Qu.:58.25
 Median :65.50   Median :65.0   Median :51.50   Median :56.50   Median :63.50
 Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37   Mean   : ...
 3rd Qu.: ...    3rd Qu.:77.0   3rd Qu.: ...    3rd Qu.: ...    3rd Qu.:71.00
 Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00   Max.   :88.00
    critical        advance
 Min.   :49.00   Min.   : ...
 1st Qu.: ...    1st Qu.:35.00
 Median :77.50   Median :41.00
 Mean   :74.77   Mean   : ...
 3rd Qu.: ...    3rd Qu.:47.75
 Max.   :92.00   Max.   :72.00

Obviously, many more statistical procedures can be performed with R, some of which will be discussed in the following pages. Also remember that many of these basic summary statistics can be better described and understood using graphics. Methods for creating basic graphics in R will be discussed at the end of this guide.
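For column-wise statistics on a whole data frame, a minimal sketch using functions from base R is shown below. colMeans( ) and sapply( ) are offered as one reasonable approach rather than the only one; the object names are illustrative.

```r
# Column means for every variable in the data frame.
col_means <- colMeans(attitude)

# sapply( ) applies any single-vector function to each column in turn.
col_sds     <- sapply(attitude, sd)
col_medians <- sapply(attitude, median)

round(col_means, 2)
round(col_sds, 2)
col_medians
```

Because sapply( ) takes the function as an argument, the same pattern works for var( ), range( ), or any function you write yourself.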
Correlation

For this section, I will use the included faithful data. R can perform several different types of correlation.

Pearson Product Moment Correlation r

> cor(faithful)
          eruptions   waiting
eruptions 1.0000000 0.9008112
waiting   0.9008112 1.0000000

This creates a correlation matrix for the two vectors. This short command is especially useful if you want to create a correlation matrix for a longer list of variables. You can also specify individual vectors if you want to compute a correlation between two vectors in any given data set.

> cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112

This will just give you the single correlation, not a correlation matrix.

Polychoric Correlation

Polychoric correlations are used to correlate two sets of forced polychotomous data. In order to perform this type of correlation in R, you first need to download and load the polycor library.

> polychor(x, y)

This will give you a polychoric correlation.

Polyserial Correlation

Polyserial correlations are used to correlate a polychotomous variable with a continuous variable. In R, this correlation also requires the polycor library.

> polyserial(x, y)

This will give you a polyserial correlation.
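Beyond the plain coefficient, base R can also report a rank-based correlation and a significance test. The sketch below goes slightly past what the guide covers: the method = "spearman" option and cor.test( ) are standard functions in the stats package, but their use here is an illustrative addition.

```r
# Pearson correlation (the default method).
r_pearson <- cor(faithful$eruptions, faithful$waiting)

# Rank-based alternative, selected with the method argument.
r_spearman <- cor(faithful$eruptions, faithful$waiting, method = "spearman")

# cor.test( ) adds a test statistic, p-value, and confidence interval.
test <- cor.test(faithful$eruptions, faithful$waiting)

r_pearson
r_spearman
test$conf.int
```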
Regression

R can perform multiple forms of regression, including basic and multiple linear regressions, logistic regression, and Poisson regression. I will use the built-in dataset attitude again for the first two examples.

Linear Regression

Linear regression can be performed with the lm command.

> q <- lm(x$rating ~ x$learning)

Here, I'm predicting the rating score based on the learning score. R reads regression input as lm(criterion variable ~ predictor variable(s)): the variable being predicted goes to the left of the ~ and the predictors go to the right.

> summary(q)
Call:
lm(formula = x$rating ~ x$learning)

Residuals:
Min 1Q Median 3Q Max
...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ... **
x$learning       ...        ...     ...      ... ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ... on 28 degrees of freedom
Multiple R-squared: 0.389, Adjusted R-squared: ...
F-statistic: ... on 1 and 28 DF, p-value: ...

Multiple Linear Regression

Multiple linear regression follows from simple linear regression.

> q <- lm(x$rating ~ x$learning + x$raises + x$critical)
> summary(q)
Call:
lm(formula = x$rating ~ x$learning + x$raises + x$critical)

Residuals:
Min 1Q Median 3Q Max
...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ...
x$learning       ...        ...     ...      ... *
x$raises         ...        ...     ...      ...
x$critical       ...        ...     ...      ...
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ... on 26 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: ... on 3 and 26 DF, p-value: ...

Logistic Regression

Logistic regression is appropriate when the variable you're predicting is binary. The glm function used for this type of regression is part of base R, so no additional library needs to be installed.

> q <- glm(y ~ x1 + x2 + x3, data = x, family = binomial)

Note the differences between this and linear regression. You have to use the function glm, and you have to specify that you are drawing from the binomial family of distributions. summary(q) will display a result layout similar to that of linear regression.

Diagnostic Plots

Four essential diagnostic plots can be displayed by typing one command.

> plot(q)

Reminders:
For the diagnostic plots, plot the name of the regression analysis, not the dataset itself!
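An alternative way of writing the same models is to name the columns in the formula and pass the data frame through the data argument. The sketch below is offered as one option, not as the guide's own method; the logistic example uses the built-in mtcars dataset (predicting the binary am column from wt) purely as a stand-in, since the guide's y, x1, x2, x3 are placeholders.

```r
# Same simple regression as above, written with the data argument.
fit <- lm(rating ~ learning, data = attitude)
coef(fit)
r2 <- summary(fit)$r.squared
r2

# With column names in the formula, predict( ) can score new cases.
new_scores <- data.frame(learning = c(40, 60))
predicted <- predict(fit, newdata = new_scores)
predicted

# Logistic regression with glm( ); mtcars is only an illustrative dataset.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(logit_fit)
```

A model fitted as lm(x$rating ~ x$learning) gives the same coefficients, but predict( ) cannot match new data to it, which is the main practical reason to prefer the data argument.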
T-Tests

Several types of t-tests can also be performed in R. I will be using the built-in dataset Puromycin for these examples. First, I split the data into treated and untreated so that I have two groups.

> x <- Puromycin
> sub1 <- subset(x, state == "treated", select = conc)
> sub2 <- subset(x, state == "untreated", select = conc)

If you examine the dataset, you will see that I'm just looking at the conc variable. Now I have two sets of data whose means on the conc variable can be compared. T-tests are performed with the t.test command, which can be modified to perform different types of t-tests based on your data.

> t.test(sub1, sub2, alternative = "greater", mu = 0, paired = FALSE, var.equal = TRUE, conf.level = 0.95)

Two Sample t-test

data: sub1 and sub2
t = ..., df = 21, p-value = ...
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 ... Inf
sample estimates:
mean of x mean of y
      ...       ...

To better understand this command, it helps to look at the individual components.

sub1, sub2
These indicate the two samples being tested. If you're only testing one sample against a hypothesized mu, type y = NULL for the second value.

alternative = "greater"
This indicates the alternative hypothesis you want: "greater", "less", or "two.sided".

mu = 0, paired = FALSE, var.equal = TRUE, conf.level = 0.95
The last four arguments are straightforward. mu is your hypothesized value: set it to zero when comparing two samples, or to whatever number you hypothesize when doing a one-sample t-test. Specify whether the values are paired or not, whether the variances are assumed equal or not, and the confidence level for the interval.
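The same two-sample comparison can be written without creating the subsets first, using the formula interface of t.test( ). This is a sketch of an alternative, not the guide's own approach; the one-sample test at the end uses a hypothesized mean of 0.5 chosen only for illustration.

```r
# Two-sample test: conc compared across the two levels of state.
# The first factor level ("treated") plays the role of the first sample.
res <- t.test(conc ~ state, data = Puromycin,
              alternative = "greater", var.equal = TRUE)
res$parameter   # degrees of freedom
res$estimate    # the two group means

# One-sample test against a hypothesized mean (0.5 is an arbitrary example).
treated <- Puromycin$conc[Puromycin$state == "treated"]
one_sample <- t.test(treated, mu = 0.5)
one_sample$parameter
```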
Basic Graphics

The graphics shown here use the graphics library, which comes with R and is normally loaded automatically when R starts. If you ever need to load it yourself, simply type library(graphics) and press enter.

Basic Plot

For this plot, I will use the cars dataset, which lists a speed and a distance required to stop for 50 cars. Suppose you just want a quick and easy way to visualize your data. A quick way to do so is simply typing:

> plot(cars)

[Figure: scatterplot of the cars dataset, with speed on the x-axis and dist on the y-axis]

Notice that the axis labels are automatically set to the column names of the dataset. If you have no column names, the axes will simply be named V1 and V2.

Prettying Up the Basic Plot

Suppose, instead of creating a graphic for quick visualization of your data, you wanted to create a graphic that looked nice enough to present to others. In base graphics, customizing your graphics is a little less than intuitive, but what I will show you here can be easily applied to almost every base graphic. I will give all my examples using a dot/line plot, but the code used is generalizable. I will also be using the built-in dataset Puromycin.

Here is the basic plot of conc versus rate (the state will be involved in plotting later).

> x <- Puromycin
> plot(x$conc, x$rate)
Axes Labels and Titles

xlab = changes the x-axis label to whatever you put into the quotes.
ylab = changes the y-axis label to whatever you put into the quotes.
main = changes the title to whatever you put into the quotes.

> plot(x$conc, x$rate, xlab = 'This is the x-axis', ylab = 'This is the y-axis', main = 'This is the title')

Modification of Points

col = changes the color of the plotting symbol. A list of acceptable color names that can be put into this specification can be found by typing colors(). However, this only lists the names; if you want to actually see what the colors look like, you will need to consult an R color chart.

pch = changes the plotting symbol shape. The possible values are given as bare numbers. Those that relate to symbols for this specification are:
pch=19: solid circle
pch=20: bullet (smaller circle)
pch=21: circle
pch=22: square
pch=23: diamond
pch=24: triangle point-up
pch=25: triangle point-down

> plot(x$conc, x$rate, col = 'red', pch = 19)

Grouping by Class

What if you wanted the color of the points to reflect the groups of observations (in this case, the two classes)? Setting col = to the column containing the class specifications will automatically color the points by group.

> plot(x$conc, x$rate, col = x$state)
Legends

When you color code the points by group, it's necessary to add a legend. This can be done by using the legend command, after creating your plot, to superimpose one onto the plot.

> plot(x$conc, x$rate, col = x$state)
> legend(.8, 75, c("treated","untreated"), col = c("black","red"), pch = 21)

Note that you don't have to specify what data you're working with if you've already plotted it and have left the graphics window open; the legend will simply appear on the already-existing graphic. The legend command requires some explanation, so here are its components one at a time.

.8, 75
These correspond to the coordinates of the left and top sides of the legend box. Note that they are in terms of the scales of the axes. If I had typed (0, 100), the legend would have been touching the left side of the plot and in the middle of a bunch of points. If I had typed (100, 0), it would have been placed outside the range of the axes of the graph, and you would not have been able to see it.

c("treated","untreated")
Whatever you put in quotes is what the legend displays as labels. If I had typed c("Black Point", "Red Point"), those would be what were listed next to the two points in the legend. You can have as many labels as you require; just remember to put quotation marks around the words you want displayed and commas in between the quoted statements.

col = c("black","red")
This assigns colors to the labels you created above. Putting black first assigns that color to the first label ("treated") and putting red second assigns it to the second label. Make sure to have as many colors as you have labels!

pch = 21
The pch specifies which type of symbol should be used in the legend; in this case, the circle symbol, since it was used in the default plot. If your symbol changes, you should match it in the legend.

This is what a graph with all of these new components (axis labels, color based on categories, and a legend) looks like:

> plot(x$conc, x$rate, xlab = 'This is the x-axis', ylab = 'This is the y-axis', main = 'This is the title', pch = 19, col = x$state)
> legend(.8, 75, c("treated","untreated"), col = c("black","red"), pch = 19)
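The pieces above can be combined into a script that writes the finished plot to a file. The sketch below makes a few choices that are not in the guide: it saves to a temporary PDF with pdf( ) and dev.off( ), places the legend with the keyword "bottomright" instead of coordinates, and derives the legend labels and colors from the factor levels so they cannot drift out of step with the plot.

```r
x <- Puromycin
groups <- levels(x$state)            # "treated", "untreated"

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                        # open a PDF graphics device

plot(x$conc, x$rate,
     col = x$state,                  # factor codes 1, 2 pick the first two palette colors
     pch = 19,
     xlab = "Concentration", ylab = "Rate",
     main = "Puromycin: rate by concentration")

legend("bottomright",                # keyword position instead of x, y coordinates
       legend = groups,
       col = seq_along(groups),      # same codes 1, 2 as used for the points
       pch = 19)

invisible(dev.off())                 # close the device so the file is written
out_file
```

Using the numeric codes for both the points and the legend means the colors match whatever palette is active, rather than relying on the names "black" and "red".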
Line plots can be constructed very similarly. The only addition you need is a command that says you want a line connecting the points. Here I use the built-in dataset women, whose first column (height) is plotted on the x-axis and whose second column (weight) is plotted on the y-axis.

> x <- women
> plot(x, xlab = 'Height', ylab = 'Weight', main = 'A Line Plot!', pch = 19, col = "red")
> lines(x, type = "l", col = "red", lwd = .5)

type = indicates the type of line, but for my purposes, I always just use "l", as it just draws the simple line.
lty = indicates what style of line you want: 1, 2, 3, and 4 for solid, dashed, dotted, and dot-dash, respectively. You can leave this argument out completely if you just want a solid line, though, as it defaults to that automatically.
lwd = indicates the width of the line. I usually find .5 to 1 to be a good line width.

Other Graphs

Various other types of graphs can be created as well. Many of these can be customized in similar ways as the ones above, when applicable. For more information on these types of graphs in R, type the name of the command with a ? in front of it.

hist(x) creates a histogram
barplot(x) creates a bar graph
plot(density(x)) creates a density plot
pie(x) creates a pie chart
stripchart(x) creates a univariate scatterplot
boxplot(x) creates a boxplot
pairs(x) creates pairwise scatterplot matrices
More informationSPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL
SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered
More informationFathom Dynamic Data TM Version 2 Specifications
Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other
More informationExample how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler
JMP in a nutshell 1 HR, 17 Apr 2018 The software JMP Pro 14 is installed on the Macs of the Phonetics Institute. Private versions can be bought from
More informationThis document is designed to get you started with using R
An Introduction to R This document is designed to get you started with using R We will learn about what R is and its advantages over other statistics packages the basics of R plotting data and graphs What
More informationSPSS. (Statistical Packages for the Social Sciences)
Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.
More information610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison
610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison R is very touchy about unbalanced designs, partly because
More informationCorrelation. January 12, 2019
Correlation January 12, 2019 Contents Correlations The Scattterplot The Pearson correlation The computational raw-score formula Survey data Fun facts about r Sensitivity to outliers Spearman rank-order
More informationYour Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression
Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using
More informationFurther Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables
Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables
More informationBluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition
Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created
More informationApplied Calculus. Lab 1: An Introduction to R
1 Math 131/135/194, Fall 2004 Applied Calculus Profs. Kaplan & Flath Macalester College Lab 1: An Introduction to R Goal of this lab To begin to see how to use R. What is R? R is a computer package for
More informationDepending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.
1 SPSS 11.5 for Windows Introductory Assignment Material covered: Opening an existing SPSS data file, creating new data files, generating frequency distributions and descriptive statistics, obtaining printouts
More informationMinitab 17 commands Prepared by Jeffrey S. Simonoff
Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More informationStatistical Software Camp: Introduction to R
Statistical Software Camp: Introduction to R Day 1 August 24, 2009 1 Introduction 1.1 Why Use R? ˆ Widely-used (ever-increasingly so in political science) ˆ Free ˆ Power and flexibility ˆ Graphical capabilities
More informationLab1: Use of Word and Excel
Dr. Fritz Wilhelm; physics 230 Lab1: Use of Word and Excel Page 1 of 9 Lab partners: Download this page onto your computer. Also download the template file which you can use whenever you start your lab
More informationOrientation Assignment for Statistics Software (nothing to hand in) Mary Parker,
Orientation to MINITAB, Mary Parker, mparker@austincc.edu. Last updated 1/3/10. page 1 of Orientation Assignment for Statistics Software (nothing to hand in) Mary Parker, mparker@austincc.edu When you
More informationR for IR. Created by Narren Brown, Grinnell College, and Diane Saphire, Trinity University
R for IR Created by Narren Brown, Grinnell College, and Diane Saphire, Trinity University For presentation at the June 2013 Meeting of the Higher Education Data Sharing Consortium Table of Contents I.
More informationInstall RStudio from - use the standard installation.
Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationThe goal of this handout is to allow you to install R on a Windows-based PC and to deal with some of the issues that can (will) come up.
Fall 2010 Handout on Using R Page: 1 The goal of this handout is to allow you to install R on a Windows-based PC and to deal with some of the issues that can (will) come up. 1. Installing R First off,
More informationAlgebra 2 Semester 1 (#2221)
Instructional Materials for WCSD Math Common Finals The Instructional Materials are for student and teacher use and are aligned to the 2016-2017 Course Guides for the following course: Algebra 2 Semester
More informationIntroductory Guide to SAS:
Introductory Guide to SAS: For UVM Statistics Students By Richard Single Contents 1 Introduction and Preliminaries 2 2 Reading in Data: The DATA Step 2 2.1 The DATA Statement............................................
More informationUsing Microsoft Excel
Using Microsoft Excel Introduction This handout briefly outlines most of the basic uses and functions of Excel that we will be using in this course. Although Excel may be used for performing statistical
More informationa. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.
Probability and Statistics Chapter 2 Notes I Section 2-1 A Steps to Constructing Frequency Distributions 1 Determine number of (may be given to you) a Should be between and classes 2 Find the Range a The
More informationSection 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions
More informationOpening a Data File in SPSS. Defining Variables in SPSS
Opening a Data File in SPSS To open an existing SPSS file: 1. Click File Open Data. Go to the appropriate directory and find the name of the appropriate file. SPSS defaults to opening SPSS data files with
More informationSPSS 11.5 for Windows Assignment 2
1 SPSS 11.5 for Windows Assignment 2 Material covered: Generating frequency distributions and descriptive statistics, converting raw scores to standard scores, creating variables using the Compute option,
More informationIntroduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data
Introduction About this Document This manual was written by members of the Statistical Consulting Program as an introduction to SPSS 12.0. It is designed to assist new users in familiarizing themselves
More informationStatistics with a Hemacytometer
Statistics with a Hemacytometer Overview This exercise incorporates several different statistical analyses. Data gathered from cell counts with a hemacytometer is used to explore frequency distributions
More informationLECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.
LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I Part Two Introduction to R Programming RStudio November 2016 Written by N.Nilgün Çokça Introduction to R Programming 5 Installing R & RStudio 5 The R Studio
More informationExcel Primer CH141 Fall, 2017
Excel Primer CH141 Fall, 2017 To Start Excel : Click on the Excel icon found in the lower menu dock. Once Excel Workbook Gallery opens double click on Excel Workbook. A blank workbook page should appear
More informationLab #9: ANOVA and TUKEY tests
Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for
More information1 Introduction to Matlab
1 Introduction to Matlab 1. What is Matlab? Matlab is a computer program designed to do mathematics. You might think of it as a super-calculator. That is, once Matlab has been started, you can enter computations,
More information4. Descriptive Statistics: Measures of Variability and Central Tendency
4. Descriptive Statistics: Measures of Variability and Central Tendency Objectives Calculate descriptive for continuous and categorical data Edit output tables Although measures of central tendency and
More informationaddition + =5+C2 adds 5 to the value in cell C2 multiplication * =F6*0.12 multiplies the value in cell F6 by 0.12
BIOL 001 Excel Quick Reference Guide (Office 2010) For your lab report and some of your assignments, you will need to use Excel to analyze your data and/or generate graphs. This guide highlights specific
More informationIntroduction to the workbook and spreadsheet
Excel Tutorial To make the most of this tutorial I suggest you follow through it while sitting in front of a computer with Microsoft Excel running. This will allow you to try things out as you follow along.
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationCreating a data file and entering data
4 Creating a data file and entering data There are a number of stages in the process of setting up a data file and analysing the data. The flow chart shown on the next page outlines the main steps that
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More informationMatlab Tutorial 1: Working with variables, arrays, and plotting
Matlab Tutorial 1: Working with variables, arrays, and plotting Setting up Matlab First of all, let's make sure we all have the same layout of the different windows in Matlab. Go to Home Layout Default.
More informationExcel for Gen Chem General Chemistry Laboratory September 15, 2014
Excel for Gen Chem General Chemistry Laboratory September 15, 2014 Excel is a ubiquitous data analysis software. Mastery of Excel can help you succeed in a first job and in your further studies with expertise
More informationBasics of Plotting Data
Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based
More informationExcel R Tips. is used for multiplication. + is used for addition. is used for subtraction. / is used for division
Excel R Tips EXCEL TIP 1: INPUTTING FORMULAS To input a formula in Excel, click on the cell you want to place your formula in, and begin your formula with an equals sign (=). There are several functions
More informationChapter One: Getting Started With IBM SPSS for Windows
Chapter One: Getting Started With IBM SPSS for Windows Using Windows The Windows start-up screen should look something like Figure 1-1. Several standard desktop icons will always appear on start up. Note
More informationRegression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationLet s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project
Let s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project Data Content: Example: Who chats on-line most frequently? This Technology Use dataset in
More informationAssignment 0. Nothing here to hand in
Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very
More informationData Analysis in Paleontology Using R. Looping Basics
Data Analysis in Paleontology Using R Session 4 26 Jan 2006 Gene Hunt Dept. of Paleobiology NMNH, SI Looping Basics Situation: you have a set of objects (sites, species, measurements, etc.) and want to
More informationAn introduction to plotting data
An introduction to plotting data Eric D. Black California Institute of Technology February 25, 2014 1 Introduction Plotting data is one of the essential skills every scientist must have. We use it on a
More information1 Pencil and Paper stuff
Spring 2008 - Stat C141/ Bioeng C141 - Statistics for Bioinformatics Course Website: http://www.stat.berkeley.edu/users/hhuang/141c-2008.html Section Website: http://www.stat.berkeley.edu/users/mgoldman
More informationAA BB CC DD EE. Introduction to Graphics in R
Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat
More informationLAB #1: DESCRIPTIVE STATISTICS WITH R
NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab
More informationLab 1 Intro to MATLAB and FreeMat
Lab 1 Intro to MATLAB and FreeMat Objectives concepts 1. Variables, vectors, and arrays 2. Plotting data 3. Script files skills 1. Use MATLAB to solve homework problems 2. Plot lab data and mathematical
More informationST Lab 1 - The basics of SAS
ST 512 - Lab 1 - The basics of SAS What is SAS? SAS is a programming language based in C. For the most part SAS works in procedures called proc s. For instance, to do a correlation analysis there is proc
More informationGraphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version):
Graphing on Excel Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version): The first step is to organize your data in columns. Suppose you obtain
More informationA. Using the data provided above, calculate the sampling variance and standard error for S for each week s data.
WILD 502 Lab 1 Estimating Survival when Animal Fates are Known Today s lab will give you hands-on experience with estimating survival rates using logistic regression to estimate the parameters in a variety
More informationChapter 2 Assignment (due Thursday, April 19)
(due Thursday, April 19) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should
More informationKey Strokes To make a histogram or box-and-whisker plot: (Using canned program in TI)
Key Strokes To make a histogram or box-and-whisker plot: (Using canned program in TI) 1. ing Data: To enter the variable, use the following keystrokes: Press STAT (directly underneath the DEL key) Leave
More informationHow to Do Everything We Need to Do on a TI Calculator in Algebra 2 for Now (Unless Davies Forgot Something)
How to Do Everything We Need to Do on a TI Calculator in Algebra 2 for Now (Unless Davies Forgot Something) 10.01.17 Before you do anything, set up your calculator so that it won t get in your way. Basics:
More informationMinitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.
Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see
More informationRegression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:
Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum
More informationVariable Definition and Statement Suppression You can create your own variables, and assign them values using = >> a = a = 3.
MATLAB Introduction Accessing Matlab... Matlab Interface... The Basics... 2 Variable Definition and Statement Suppression... 2 Keyboard Shortcuts... More Common Functions... 4 Vectors and Matrices... 4
More informationIntroduction to Stata: An In-class Tutorial
Introduction to Stata: An I. The Basics - Stata is a command-driven statistical software program. In other words, you type in a command, and Stata executes it. You can use the drop-down menus to avoid
More informationResearch Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel
Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement
More informationIntroduction to Excel Workshop
Introduction to Excel Workshop Empirical Reasoning Center September 9, 2016 1 Important Terminology 1. Rows are identified by numbers. 2. Columns are identified by letters. 3. Cells are identified by the
More informationExcel Basics Fall 2016
If you have never worked with Excel, it can be a little confusing at first. When you open Excel, you are faced with various toolbars and menus and a big, empty grid. So what do you do with it? The great
More informationOur Changing Forests Level 2 Graphing Exercises (Google Sheets)
Our Changing Forests Level 2 Graphing Exercises (Google Sheets) In these graphing exercises, you will learn how to use Google Sheets to create a simple pie chart to display the species composition of your
More informationSTENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015
STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................
More informationDealing with Data in Excel 2013/2016
Dealing with Data in Excel 2013/2016 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for
More informationIQR = number. summary: largest. = 2. Upper half: Q3 =
Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number
More informationDescriptive Statistics, Standard Deviation and Standard Error
AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.
More information= 3 + (5*4) + (1/2)*(4/2)^2.
Physics 100 Lab 1: Use of a Spreadsheet to Analyze Data by Kenneth Hahn and Michael Goggin In this lab you will learn how to enter data into a spreadsheet and to manipulate the data in meaningful ways.
More information