R Workshop Guide
- Angela Burke
This guide reviews the examples we will cover in today's workshop. It should be a helpful introduction to R; for more details, you can access a more extensive user guide for R on the ERC website.

1 Some Programming Basics

You should always write code in a script that you can save and modify as necessary. To start a new script, open the File menu, choose New File, and then choose R Script.

It's always a good idea to start by clearing your workspace.

    rm(list=ls(all=TRUE)) # clear all objects in memory

1.1 Writing and executing code in R

Basic calculations

    4
    [1] 4
    "yes"
    [1] "yes"
    2+3
    [1] 5
    /49
    [1]
    ^700
    [1] Inf
    ( )/(900*2)
    [1]

Assignment operator

    x <- 3
    x
    [1] 3
    y <- "this is a string"
    y
    [1] "this is a string"
    z <- 2
    z
    [1] 2
    x+z
    [1] 5
    x==5 # this is a logical operator
    [1] FALSE
    x
    [1] 3
    x <- TRUE # assign logical values to variables
    x+z # explain this output: the numeric value of TRUE is 1, so x+z is 1+2
    [1] 3
    # clear your workspace again
    rm(list=ls(all=TRUE))

1.2 Data objects in R

Vectors

The function c() allows you to concatenate multiple items into a vector.

    x <- c(1,2,3,4)
    x
    [1] 1 2 3 4
    x[2]
    [1] 2
    y <- c(5,6,7,8,9)
    y
    [1] 5 6 7 8 9
    y[5]
    [1] 9

You can append one vector to another.

    z <- c(x,y)
    z
    [1] 1 2 3 4 5 6 7 8 9

Another way to produce a vector containing a sequence of integers:
    q <- 1:5
    q
    [1] 1 2 3 4 5

You can repeat vectors multiple times.

    ab <- rep(1:5, times=3)
    ab
     [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
    ab <- rep(1:5, 3) # you do not need the "times" with rep; also, notice that R lets you overwrite
    cd <- rep(c(1,3,7,9), times=2)
    cd
    [1] 1 3 7 9 1 3 7 9
    a <- seq(from=2, to=100, by=2)
    a
     [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42
    [22]  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84
    [43]  86  88  90  92  94  96  98 100

2 Performing Basic Tasks

2.1 Setting up your work space

See the objects currently in memory.

    ls()
    [1] "a" "ab" "cd" "q" "x" "y" "z"

Clear your workspace. Many R users include this as the first line in any script.

    rm(list=ls(all=TRUE))

Working Directory

The working directory is the location on your computer where R will access and save files. You can see your working directory, and you can set your working directory.

    getwd()
    [1] "/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop"
    setwd("/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop")
    getwd() # check again
    [1] "/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop"
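The working-directory steps above can be tried without touching your own folders. The following is a minimal sketch, assuming a throwaway folder under tempdir(); the names workshop_dir and note.txt are made up for illustration. It relies on the fact that setwd() invisibly returns the previous directory, which makes it easy to switch back.

```r
# Sketch: switch the working directory temporarily, then restore it.
# `workshop_dir` is a hypothetical folder created under tempdir().
workshop_dir <- file.path(tempdir(), "r_workshop_demo")
dir.create(workshop_dir, showWarnings = FALSE)

old_dir <- setwd(workshop_dir)   # setwd() invisibly returns the previous directory
current_dir <- getwd()           # now points at the demo folder

# A file written with a bare file name lands in the working directory
writeLines("hello", "note.txt")
note_exists <- file.exists(file.path(workshop_dir, "note.txt"))

setwd(old_dir)                   # go back to where we started
```

Building paths with file.path() rather than pasting strings together keeps the script portable across operating systems.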
2.2 Installing and loading packages

You will need to install packages to handle certain tasks. You only need to install a package once, but you will need to load it any time you want to use it.

    # install.packages("dplyr", dependencies=TRUE)
    # install.packages("ggplot2", dependencies=TRUE)
    # install.packages("foreign", dependencies=TRUE)
    # install.packages("xtable", dependencies = TRUE)
    # install.packages("stargazer", dependencies = TRUE)
    # install.packages("arm", dependencies = TRUE)

    # load packages
    library(foreign)
    library(xtable)
    library(arm)
    Loading required package: Matrix
    Loading required package: lme4
    arm (Version 1.8-6, built: )
    Working directory is /Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop
    Attaching package: 'arm'
    The following object is masked from 'package:xtable':
        display
    library(ggplot2)
    library(dplyr)
    library(stargazer)
    Please cite as:
    Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version

Some useful packages:

    foreign    load data formatted for other software
    xtable     export code to produce tables in LaTeX
    arm        applied regression and multi-level modeling
    ggplot2    make plots and figures
    dplyr      user-friendly data cleaning & manipulation

more packages:
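Because installation is a one-time step while loading happens in every session, it can help to check which packages are still missing before calling install.packages(). The sketch below is one possible way to do that; missing_packages is a hypothetical helper written for this example, not a base R function. Note that requireNamespace() loads a package's namespace as a side effect when the package is present.

```r
# Sketch: report which packages from a list are not yet installed.
# `missing_packages` is a hypothetical helper, not part of base R.
missing_packages <- function(pkgs) {
  # TRUE for each package whose namespace can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("foreign", "xtable", "arm", "ggplot2", "dplyr", "stargazer")
to_install <- missing_packages(wanted)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```

Keeping the install.packages() call commented out, as the guide does, avoids reinstalling packages every time the script runs.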
2.3 Read in data

R can read data files in a variety of formats. Today, we will use a .csv file, but see below for code to read other types of data files.

Note: If the data file is stored in your working directory, you need only specify the file name. However, if the file is stored somewhere else on your computer, you will need to include the file path.

    # csv file
    data <- read.csv("teachingratingsexcel.csv", header=TRUE)
    # .dta file (Stata)
    # dtafile <- read.dta("fakedata.dta")
    # dtafile
    # .RData file
    # load("fakedata1.rdata")
    # data

2.4 Looking at data: basic info, printing objects, and generating basic summary stats

See variable names and dimensions of the data.

    names(data)
    [1] "minority" "age" "female" "onecredit" "beauty" "course_eval"
    [7] "intro" "nnenglish"
    dim(data)
    [1] 463   8
    dim(data)[1]
    [1] 463
    dim(data)[2]
    [1] 8

You can refer to specific rows or columns in a data frame by row or column number(s). This allows you to see a subset of your data. You could even assign it to a new object, and you would have effectively subset your data.

    data[1,] # row 1 only
    minority age female onecredit beauty course_eval intro nnenglish
    data[1:3,] # rows 1 to 3 only
    minority age female onecredit beauty course_eval intro nnenglish
    # data[,1] # column 1 only
    # data[,2:4] # columns 2 to 4 only

Print some or all of the data to the console.

    # data
    # data[1:5,]
    # data[,3]
    head(data)
    minority age female onecredit beauty course_eval intro nnenglish
    # data$course_eval
    # data$female
    # data$beauty
    # course_eval # error! why?

Find out the classification or type of an object such as a data frame or a variable.

    class(data)
    [1] "data.frame"
    class(data$course_eval)
    [1] "numeric"
    class(data$female)
    [1] "integer"

Summarize your dataset or a specific variable.

    # summary() function
    summary(data)
    (Min., 1st Qu., Median, Mean, 3rd Qu., and Max. are reported for each variable)
    age:          Min. 29.00   Median 48.00   Mean 48.37   Max. 73.00
    course_eval:  Min. 2.100   Median 4.000
    course_eval:  Mean 3.998   Max. 5.000
    summary(data$beauty)
    Min. 1st Qu. Median Mean 3rd Qu. Max.

Tables

    # table() function
    table(data$female, useNA="always")
    0 1 <NA>
    crosstab <- table(data$female, data$minority, useNA="always", dnn=c("Gender", "Race or Ethnicity"))
    crosstab <- crosstab[c(2, 1, 3), c(2, 1, 3)]
    row.names(crosstab) <- c("Female", "Male", "NA")
    colnames(crosstab) <- c("Minority", "White", "NA")
    mytable <- table(data$female, data$minority, useNA="always", dnn=c("Female", "Minority"))
    margin.table(mytable, 1)
    Female
    0 1 <NA>
    margin.table(mytable, 2)
    Minority
    0 1 <NA>
    prop.table(mytable)
    Minority
    Female 0 1 <NA>
    <NA>
    prop.table(mytable, 1)
    Minority
    Female 0 1 <NA>
    <NA>
    prop.table(mytable, 2)
    Minority
    Female 0 1 <NA>
    <NA>
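The table functions above are easier to follow on a data set small enough to check by hand. The following sketch uses a made-up data frame called toy (not the workshop data) to show what margin.table() and prop.table() compute. Unlike the workshop code it leaves useNA at its default, so rows with a missing value are dropped from the counts.

```r
# Made-up data: 0/1 indicators with one missing value in each column
toy <- data.frame(
  female   = c(1, 0, 1, 1, 0, NA, 0, 1),
  minority = c(0, 0, 1, 0, 1, 1, NA, 0)
)

# Two-way table of counts; dnn labels the row and column dimensions
tab <- table(toy$female, toy$minority, dnn = c("Female", "Minority"))

row_totals <- margin.table(tab, 1)  # counts by row (Female)
col_totals <- margin.table(tab, 2)  # counts by column (Minority)

cell_props <- prop.table(tab)       # each cell divided by the grand total
row_props  <- prop.table(tab, 1)    # each cell divided by its row total
col_props  <- prop.table(tab, 2)    # each cell divided by its column total
```

With a margin of 1 every row of the result sums to 1, and with a margin of 2 every column does, which is a quick way to confirm you picked the margin you intended.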
2.5 Basic histograms and scatterplots

A few easy ways to see the distribution of your data. We will look at some more complex figures later.

Histogram

    # hist()
    hist(data$course_eval, breaks=25, main="Histogram of Outcome Variable - Course Evaluation", xlab="outcom

[Figure: Histogram of Outcome Variable - Course Evaluation; x-axis: Outcome Variable Y; y-axis: Frequency]

Scatterplot

    # plot()
    plot(data$beauty, data$course_eval, main="Scatterplot of Beauty and Course Evaluations", pch=16)
    abline(v=0, col="red")
    abline(h=3.5, col="grey80", lty=2, lwd=3)
[Figure: Scatterplot of Beauty and Course Evaluations; x-axis: data$beauty; y-axis: data$course_eval]

You can save a plot in PDF format. R will save the file to your working directory unless you specify a different file path.

    # save to disk
    pdf("basic_plot.pdf")
    plot(data$beauty, data$course_eval, main="Scatterplot of Beauty and Course Evaluations", pch=16)
    abline(v=0, col="red")
    abline(h=3.5, col="grey80", lty=2, lwd=3)
    dev.off()
    pdf
    2
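The open-draw-close pattern for saving a plot can be tested on simulated data before running it on the real file. This is a minimal sketch under stated assumptions: the data are random numbers generated for illustration, and out_file is a hypothetical path under tempdir() so nothing is written to your working directory.

```r
# Sketch with made-up data; `out_file` is a hypothetical temporary path
set.seed(1)
beauty      <- rnorm(50)
course_eval <- 4 + 0.2 * beauty + rnorm(50, sd = 0.5)

out_file <- file.path(tempdir(), "demo_plot.pdf")

pdf(out_file, width = 6, height = 4)    # open the PDF device (size in inches)
plot(beauty, course_eval, pch = 16,
     main = "Made-up data", xlab = "Beauty", ylab = "Course Evaluation")
abline(h = mean(course_eval), lty = 2)  # dashed horizontal line at the mean
dev.off()                               # close the device so the file is written

saved <- file.exists(out_file) && file.size(out_file) > 0
```

Everything drawn between pdf() and dev.off() goes into the file rather than the plot window. If dev.off() is forgotten, the PDF usually cannot be opened until the device is closed.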
2.6 Basic operators

Arithmetic/Math/Numeric Operators

    +    addition
    -    subtraction
    *    multiplication
    /    division

An example: make a new variable, age_squared.

    data$age_squared <- data$age^2

2.7 Logical Operators

Logical operators test conditions. For example, you might want a subset of data that includes observations for which a specific variable exceeds some value, or you may want to find observations with missing values. You can also use these operators to generate variables and data, often using the if() or ifelse() function.

    <          less than
    <=         less than or equal to
    >          greater than
    >=         greater than or equal to
    ==         exactly equal to
    !=         not equal to
    !x         Not x
    x | y      x OR y
    x & y      x AND y
    isTRUE(x)  test if x is TRUE

Example: make a new variable using a logical test to determine which subjects are minorities who are non-native English speakers.

    data$nnenglish_minority <- data$minority == 1 & data$nnenglish
    # data$nnenglish_minority
    data$nnenglish_minority <- as.numeric(data$nnenglish_minority)
    # data$nnenglish_minority

Now make a new variable to indicate whether a subject is older than the average age. We can use the ifelse() function.
    data$older <- ifelse(data$age > mean(data$age), 1, 0)

2.8 Subsetting data

    data[4,3]
    [1] 1
    data[4,]
    minority age female onecredit beauty course_eval intro nnenglish age_squared nnenglish_minority older
    data[,3]
    [output: the 463 values of column 3 (female), ending with [463] 1]
    data[4:10, 2:3]
    age female

You can also subset by variables. Designate variables to keep or exclude.

    select.vars <- c("course_eval", "female")
    # data[select.vars]
    # data[ data$female==1,]

Make a new data frame that includes only women.

    female <- data[ data$female==1,]

Here is another way to make a new data frame that includes only women:
    female2 <- subset(data, female==1)

2.9 Writing data to disk

    write.csv(data, "evaluation_data.csv", row.names=FALSE)
    write.dta(data, "evaluation_data.dta")
    save(data, file="evaluation_data.rdata") # save just a data frame
    save.image(file="course_evaluations.rdata") # save your current workspace

2.10 Regression

We could simply proceed, but let's clear the workspace and load the .rdata file we just saved.

    rm(list=ls(all=TRUE)) # clear all objects in memory
    load("evaluation_data.rdata") # load the data

Specify a regression model. The following examples are OLS models. See the more detailed user guide for more information on other classes of models.

    fit_1 <- lm(course_eval ~ female, data=data)
    summary(fit_1)

    Call:
    lm(formula = course_eval ~ female, data = data)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    female **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 461 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 1 and 461 DF, p-value:

To include additional independent variables...

    fit_2 <- lm(course_eval ~ female + beauty + age + minority + nnenglish, data=data)
    summary(fit_2)
    Call:
    lm(formula = course_eval ~ female + beauty + age + minority + nnenglish, data = data)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    female e-05 ***
    beauty e-05 ***
    age
    minority
    nnenglish **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 457 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 8.95 on 5 and 457 DF, p-value: 4.001e-08

To add fixed effects...

    fit_3 <- lm(course_eval ~ factor(intro) + female + beauty + age + minority + nnenglish, data=data)
    summary(fit_3)

    Call:
    lm(formula = course_eval ~ factor(intro) + female + beauty + age + minority + nnenglish, data = data)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    factor(intro)
    female ***
    beauty e-05 ***
    age
    minority
    nnenglish *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 456 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 6 and 456 DF, p-value: 2.836e-08

To include an interaction...

    fit_4 <- lm(course_eval ~ female*beauty + age + minority + nnenglish, data=data)
    summary(fit_4)

    Call:
    lm(formula = course_eval ~ female * beauty + age + minority + nnenglish, data = data)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    female ***
    beauty e-05 ***
    age
    minority
    nnenglish **
    female:beauty
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 456 degrees of freedom
    Multiple R-squared: 0.095, Adjusted R-squared:
    F-statistic: on 6 and 456 DF, p-value: 3.372e-08

You can also export regression results. There are multiple packages you could use, but the example below uses stargazer.

    # stargazer(fit_1, fit_2, fit_3, fit_4, omit=("intro"),
    #           omit.stat=("n"),
    #           add.lines=list(c("Fixed Effects", "No", "intro", "No")),
    #           notes=c("OLS Regression models."),
    #           notes.align="l",
    #           notes.append=TRUE,
    #           covariate.labels = c(),
    #           float=FALSE, dep.var.labels = "Course Evaluation",
    #           out = "course_eval_regressions")

3 More Plots & Figures

We can start by creating factors. Factors designate groups or categories (this is optional, depending on the figures you need).
    data$gender <- factor(data$female, levels=c(0, 1), labels=c("Male","Female"))
    data$minority_status <- factor(data$minority, levels=c(0,1), labels=c("Non-minority","Minority"))
    data$age_status <- factor(data$older, levels=c(0, 1), labels=c("Younger","Older"))

You need the ggplot2 package to use the qplot() & ggplot() functions.

    # Kernel density plots for course evaluations
    # grouped by gender (indicated by color)
    qplot(course_eval, data=data, geom="density", fill=gender, alpha=I(.5),
          main="Distribution of Course Evaluations", xlab="Evaluation Score", ylab="Density")
[Figure: Distribution of Course Evaluations (kernel density); x-axis: Evaluation Score; y-axis: Density; fill: gender (Male, Female)]

    # Histogram for course evaluations
    # grouped by gender (indicated by color)
    qplot(course_eval, data=data, geom="histogram", fill=gender, alpha=I(.75),
          main="Distribution of Course Evaluations", xlab="Evaluation Score", ylab="Density")
    stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
[Figure: Distribution of Course Evaluations (histogram); x-axis: Evaluation Score; y-axis: Density; fill: gender (Male, Female)]

    # Scatterplot of course evaluations vs. beauty for each combination of age_status and minority_status
    # in each facet, gender is represented by shape and color
    # (x = beauty, y = course_eval, so the data match the axis labels)
    qplot(beauty, course_eval, data=data, shape=gender, color=gender,
          facets=age_status~minority_status, size=I(3),
          xlab="Beauty", ylab="Course Evaluation")
[Figure: scatterplots faceted by age_status (rows: Younger, Older) and minority_status (columns: Non-minority, Minority); x-axis: Beauty; y-axis: Course Evaluation; shape and color: gender (Male, Female)]

    # Separate regressions of course evaluations on beauty for each gender
    qplot(beauty, course_eval, data=data, geom=c("point", "smooth"),
          method="lm", formula=y~x, color=gender,
          main="Regression of Evaluations on Beauty", xlab="Beauty", ylab="Course Evaluation")
[Figure: Regression of Evaluations on Beauty; x-axis: Beauty; y-axis: Course Evaluation; color: gender (Male, Female)]

    # Boxplots of course evaluations by gender
    # observations (points) are overlaid and jittered
    qplot(gender, course_eval, data=data, geom=c("boxplot", "jitter"), fill=gender,
          main="Course Evaluations by Gender", xlab="", ylab="Course Evaluations")
[Figure: Course Evaluations by Gender (boxplots with jittered points); x-axis: Male, Female; y-axis: Course Evaluations; fill: gender]

    plot <- ggplot(data, aes(beauty, course_eval)) + geom_point(alpha=.5) + geom_smooth()
    plot
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Figure: scatterplot with loess smooth; x-axis: beauty; y-axis: course_eval]

    plot <- ggplot(data, aes(beauty, course_eval)) +
      geom_point(colour="green", alpha=1) +
      geom_smooth(method="lm", colour="black", se=FALSE) +
      scale_y_continuous(limits=c(0, 10)) +
      scale_x_continuous(limits=c(-2, 2.5)) +
      theme_bw() +
      xlab("Beauty") +
      ylab("Course Evaluations") +
      ggtitle("Course Evaluations & Beauty") +
      geom_vline(xintercept = 0, colour="grey")
    plot
[Figure: Course Evaluations & Beauty; x-axis: Beauty; y-axis: Course Evaluations]

    plot_2 <- plot + theme_bw() +
      ylab("Course Evaluations") + xlab("Beauty") +
      ggtitle("Course Evaluations & Beauty") +
      scale_y_continuous(limits=c(0, 6), breaks=seq(1, 6, 1.5)) +
      scale_x_continuous(limits=c(-2, 2), breaks=seq(-2, 2, .5))
    Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
    Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
    plot_2
[Figure: Course Evaluations & Beauty; x-axis: Beauty; y-axis: Course Evaluations]

    plot_3 <- ggplot(data, aes(beauty, course_eval)) +
      geom_point(alpha=.5) +
      geom_smooth(se=FALSE) +
      theme_bw() +
      ylab("Course Evaluation") + xlab("Beauty") +
      ggtitle("Course Evaluations & Beauty") +
      scale_y_continuous(limits=c(1, 6), breaks=seq(1.5, 6, 1.5)) +
      scale_x_continuous(limits=c(-1, 2.5), breaks=seq(-1.5, 2.5, 1))
    plot_3
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
    Warning: Removed 46 rows containing missing values (stat_smooth).
    Warning: Removed 46 rows containing missing values (geom_point).
[Figure: Course Evaluations & Beauty; x-axis: Beauty; y-axis: Course Evaluation]

    plot_4 <- plot_3 %+% aes(age, course_eval) +
      ylab("Course Evaluation") + xlab("Age") +
      ggtitle("Course Evaluations & Age") +
      scale_x_continuous(limits=c(25, 75), breaks=seq(25, 75, 15))
    Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
    plot_4
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Figure: Course Evaluations & Age; x-axis: Age; y-axis: Course Evaluation]

To save a plot as a PDF...

name the file

    pdf("plot_evals_age.pdf")

print the object (plot)

    print(plot_4)
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

close the figure file

    dev.off()
    pdf
    2
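As an alternative to the pdf() / print() / dev.off() sequence, ggplot2 provides ggsave(), which writes a plot object to a file in one call. The sketch below is illustrative only: it assumes ggplot2 is installed, uses simulated data (the toy data frame) rather than the workshop data, and writes to a hypothetical file under tempdir(). It is guarded so that it does nothing if ggplot2 is missing.

```r
# Sketch: save a ggplot object with ggsave(); assumes ggplot2 is installed.
# The data and the file name are made up for illustration.
saved <- NA  # stays NA if ggplot2 is not available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  toy <- data.frame(age = runif(40, 25, 75))
  toy$course_eval <- 4 - 0.005 * toy$age + rnorm(40, sd = 0.4)

  # Scatterplot with a straight-line fit, specified explicitly
  p <- ggplot(toy, aes(age, course_eval)) +
    geom_point(alpha = .5) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

  out_file <- file.path(tempdir(), "toy_evals_age.pdf")
  ggsave(out_file, plot = p, width = 6, height = 4)  # size in inches
  saved <- file.exists(out_file)
}
```

ggsave() picks the output format from the file extension, so the same call with a .png name would produce a PNG instead. Naming the smoothing method explicitly, as done here, also avoids the message about the default method.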
More informationUsing Built-in Plotting Functions
Workshop: Graphics in R Katherine Thompson (katherine.thompson@uky.edu Department of Statistics, University of Kentucky September 15, 2016 Using Built-in Plotting Functions ## Plotting One Quantitative
More informationfile:///users/williams03/a/workshops/2015.march/final/intro_to_r.html
Intro to R R is a functional programming language, which means that most of what one does is apply functions to objects. We will begin with a brief introduction to R objects and how functions work, and
More informationLab #13 - Resampling Methods Econ 224 October 23rd, 2018
Lab #13 - Resampling Methods Econ 224 October 23rd, 2018 Introduction In this lab you will work through Section 5.3 of ISL and record your code and results in an RMarkdown document. I have added section
More informationGraphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created
Graphics in R There are three plotting systems in R base Convenient, but hard to adjust after the plot is created lattice Good for creating conditioning plot ggplot2 Powerful and flexible, many tunable
More informationIntroduction to Graphics with ggplot2
Introduction to Graphics with ggplot2 Reaction 2017 Flavio Santi Sept. 6, 2017 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 1 / 28 Graphics with ggplot2 ggplot2 [... ] allows you to
More informationIntroduction to Statistics using R/Rstudio
Introduction to Statistics using R/Rstudio R and Rstudio Getting Started Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations sent) R on Windows R on MACs
More informationPractical 2: Plotting
Practical 2: Plotting Complete this sheet as you work through it. If you run into problems, then ask for help - don t skip sections! Open Rstudio and store any files you download or create in a directory
More informationIntroduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW
Introduction to R for Beginners, Level II Jeon Lee Bio-Informatics Core Facility (BICF), UTSW Basics of R Powerful programming language and environment for statistical computing Useful for very basic analysis
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationContents 1 Admin 2 Testing hypotheses tests 4 Simulation 5 Parallelization Admin
magrittr t F F dplyr lfe readr magrittr parallel parallel auto.csv y x y x y x y x # Setup ---- # Settings options(stringsasfactors = F) # Packages library(dplyr) library(lfe) library(magrittr) library(readr)
More informationQuick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont
Quick introduction to descriptive statistics and graphs in R Commander Written by: Robin Beaumont e-mail: robin@organplayers.co.uk http://www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html Date
More informationAdvanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel
Advanced Plotting with ggplot2 Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel Today s Lecture Objectives 1 Distinguishing different types of plots and their purpose 2 Learning
More informationR For Sql Developers. Kiran Math
R For Sql Developers Kiran Math - kiranmath@outlook.com 2017-09-22 R Figure 1: R Download R CRAN The Comprehensive R Archive Network ~ OpenSource R MRAN Micrsoft R Open ~ Enhanced R Distribution Microsoft
More informationVisualizing Data: Customization with ggplot2
Visualizing Data: Customization with ggplot2 Data Science 1 Stanford University, Department of Statistics ggplot2: Customizing graphics in R ggplot2 by RStudio s Hadley Wickham and Winston Chang offers
More informationIntroduction to R Jason Huff, QB3 CGRL UC Berkeley April 15, 2016
Introduction to R Jason Huff, QB3 CGRL UC Berkeley April 15, 2016 Installing R R is constantly updated and you should download a recent version; the version when this workshop was written was 3.2.4 I also
More informationAfter opening Stata for the first time: set scheme s1mono, permanently
Stata 13 HELP Getting help Type help command (e.g., help regress). If you don't know the command name, type lookup topic (e.g., lookup regression). Email: tech-support@stata.com. Put your Stata serial
More informationTo complete the computer assignments, you ll use the EViews software installed on the lab PCs in WMC 2502 and WMC 2506.
An Introduction to EViews The purpose of the computer assignments in BUEC 333 is to give you some experience using econometric software to analyse real-world data. Along the way, you ll become acquainted
More informationProperties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013
Properties of Data Digging into Data: Jordan Boyd-Graber University of Maryland February 11, 2013 Digging into Data: Jordan Boyd-Graber (UMD) Properties of Data February 11, 2013 1 / 43 Roadmap Munging
More informationGetting started with ggplot2
Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for
More informationReferences R's single biggest strenght is it online community. There are tons of free tutorials on R.
Introduction to R Syllabus Instructor Grant Cavanaugh Department of Agricultural Economics University of Kentucky E-mail: gcavanugh@uky.edu Course description Introduction to R is a short course intended
More informationA set of rules describing how to compose a 'vocabulary' into permissible 'sentences'
Lecture 8: The grammar of graphics STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University Grammar? A set of rules describing how to compose a 'vocabulary'
More informationApplied Statistics and Econometrics Lecture 6
Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,
More information1 Standard Errors on Different Models
1 Standard Errors on Different Models Math 158, Spring 2018 Jo Hardin Regression Splines & Smoothing/Kernel Splines R code First we scrape some weather data from NOAA. The resulting data we will use is
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationStat 5303 (Oehlert): Response Surfaces 1
Stat 5303 (Oehlert): Response Surfaces 1 > data
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationDemo yeast mutant analysis
Demo yeast mutant analysis Jean-Yves Sgro February 20, 2018 Contents 1 Analysis of yeast growth data 1 1.1 Set working directory........................................ 1 1.2 List all files in directory.......................................
More informationSalary 9 mo : 9 month salary for faculty member for 2004
22s:52 Applied Linear Regression DeCook Fall 2008 Lab 3 Friday October 3. The data Set In 2004, a study was done to examine if gender, after controlling for other variables, was a significant predictor
More informationStatistics Lab #7 ANOVA Part 2 & ANCOVA
Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More informationData visualization with ggplot2
Data visualization with ggplot2 Visualizing data in R with the ggplot2 package Authors: Mateusz Kuzak, Diana Marek, Hedi Peterson, Dmytro Fishman Disclaimer We will be using the functions in the ggplot2
More informationStat 4510/7510 Homework 4
Stat 45/75 1/7. Stat 45/75 Homework 4 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader
More informationggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017
ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017 Introduction Know your data: data exploration is an important part of research Data visualization is an excellent way to explore data ggplot2
More informationData Visualization in R
Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 216 Introduction Motivation for Data Visualization Humans are outstanding at detecting
More informationIntroduction to R and the tidyverse. Paolo Crosetto
Introduction to R and the tidyverse Paolo Crosetto Lecture 1: plotting Before we start: Rstudio Interactive console Object explorer Script window Plot window Before we start: R concatenate: c() assign:
More informationAn R Package for the Panel Approach Method for Program Evaluation: pampe by Ainhoa Vega-Bayo
CONTRIBUTED RESEARCH ARTICLES 105 An R Package for the Panel Approach Method for Program Evaluation: pampe by Ainhoa Vega-Bayo Abstract The pampe package for R implements the panel data approach method
More informationGetting Started in R
Getting Started in R Giles Hooker May 28, 2007 1 Overview R is a free alternative to Splus: a nice environment for data analysis and graphical exploration. It uses the objectoriented paradigm to implement
More informationA Short Guide to R with RStudio
Short Guides to Microeconometrics Fall 2013 Prof. Dr. Kurt Schmidheiny Universität Basel A Short Guide to R with RStudio 2 1 Introduction A Short Guide to R with RStudio 1 Introduction 3 2 Installing R
More informationYou will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables
Jennie Murack You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables How to conduct basic descriptive statistics
More informationOVERVIEW OF ESTIMATION FRAMEWORKS AND ESTIMATORS
OVERVIEW OF ESTIMATION FRAMEWORKS AND ESTIMATORS Set basic R-options upfront and load all required R packages: > options(prompt = " ", digits = 4) setwd('c:/klaus/aaec5126/test/')#r Sweaves to the default
More informationPython for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT
Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.
More information