R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R


This guide reviews the examples we will cover in today's workshop. It should be a helpful introduction to R; for more detail, see the more extensive R user guide on the ERC website.

1 Some Programming Basics

You should always write code in a script that you can save and modify as necessary. To start a new script, open the File menu, choose New File, and then choose R Script.

It's always a good idea to start by clearing your workspace.

    rm(list=ls(all=TRUE))   # clear all objects in memory

1.1 Writing and executing code in R

Basic calculations

    4
    [1] 4
    "yes"
    [1] "yes"
    2+3
    [1] 5
    1039/49
    [1] 21.20408
    46^700
    [1] Inf
    (3.5+2.7)/(900*2)
    [1] 0.003444444

Assignment operator

    x <- 3
    x
    [1] 3
    y <- "this is a string"
    y
    [1] "this is a string"
    z <- 2
    z
    [1] 2
    x+z
    [1] 5
    x==5   # this is a logical operator
    [1] FALSE
    x
    [1] 3
    x <- TRUE   # assign logical values to variables
    x+z   # explain this output: the numeric value of TRUE is 1, so 1 + 2
    [1] 3
    # clear your workspace again
    rm(list=ls(all=TRUE))

1.2 Data objects in R

Vectors

The function c() allows you to concatenate multiple items into a vector.

    x <- c(1,2,3,4)
    x
    [1] 1 2 3 4
    x[2]
    [1] 2
    y <- c(5,6,7,8,9)
    y
    [1] 5 6 7 8 9
    y[5]
    [1] 9

You can append one vector to another.

    z <- c(x,y)
    z
    [1] 1 2 3 4 5 6 7 8 9

Another way to produce a vector containing a sequence of integers:

    q <- 1:5
    q
    [1] 1 2 3 4 5

You can repeat vectors multiple times.

    ab <- rep(1:5, times=3)
    ab
    [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
    ab <- rep(1:5, 3)   # you do not need "times" with rep(); also, notice that R lets you overwrite objects
    cd <- rep(c(1,3,7,9), times=2)
    cd
    [1] 1 3 7 9 1 3 7 9
    a <- seq(from=2, to=100, by=2)
    a
     [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42
    [22]  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84
    [43]  86  88  90  92  94  96  98 100

2 Performing Basic Tasks

2.1 Setting up your work space

See the objects currently in memory.

    ls()
    [1] "a"  "ab" "cd" "q"  "x"  "y"  "z"

Clear your workspace. Many R users include this as the first line in any script.

    rm(list=ls(all=TRUE))

Working Directory

The working directory is the location on your computer where R will access and save files. You can see your working directory, and you can set your working directory.

    getwd()
    [1] "/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop"
    setwd("/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop")
    getwd()   # check again
    [1] "/Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop"
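As a small illustration of how the working directory affects file names, the sketch below switches to a scratch folder, writes a file using only its name, and then switches back. The scratch folder (R's per-session temporary directory) and the file name note.txt are assumptions chosen for the example, not part of the workshop materials.

```r
# Sketch: change the working directory temporarily, then restore it.
old_wd  <- getwd()     # remember the current working directory
scratch <- tempdir()   # a folder that is guaranteed to exist in this session

setwd(scratch)                      # relative file names now resolve inside 'scratch'
writeLines("hello", "note.txt")     # only the file name is needed here
found <- file.exists(file.path(scratch, "note.txt"))
unlink("note.txt")                  # remove the example file again

setwd(old_wd)                       # go back to where we started
back_home <- identical(getwd(), old_wd)
```

Storing the old directory before calling setwd() makes it easy to return to it, which matters when a script is run from several different places.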

2.2 Installing and loading packages

You will need to install packages to handle certain tasks. You only need to install a package once, but you will need to load it any time you want to use it.

    # install.packages("dplyr", dependencies=TRUE)
    # install.packages("ggplot2", dependencies=TRUE)
    # install.packages("foreign", dependencies=TRUE)
    # install.packages("xtable", dependencies=TRUE)
    # install.packages("stargazer", dependencies=TRUE)
    # install.packages("arm", dependencies=TRUE)

    # load packages
    library(foreign)
    library(xtable)
    library(arm)
    Loading required package: Matrix
    Loading required package: lme4
    arm (Version 1.8-6, built: 2015-7-7)
    Working directory is /Users/patriciakirkland/Dropbox/Empiprical Reasoning Center/R Workshop
    Attaching package: 'arm'
    The following object is masked from 'package:xtable':
        display
    library(ggplot2)
    library(dplyr)
    library(stargazer)
    Please cite as:
    Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
    R package version 5.2. http://cran.r-project.org/package=stargazer

Some useful packages:

    foreign    load data formatted for other software
    xtable     export code to produce tables in LaTeX
    arm        applied regression and multi-level modeling
    ggplot2    make plots and figures
    dplyr      user-friendly data cleaning & manipulation

More packages: http://cran.r-project.org/web/packages/
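One common way to combine the "install once, load every time" steps is a small helper that installs a package only when it is missing. The sketch below is one possible version; the function name use_package is hypothetical, and the example call uses the stats package, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not available, then load it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)   # runs only when the package is missing
  }
  library(pkg, character.only = TRUE)            # character.only lets us pass the name as a string
  invisible(TRUE)
}

ok <- use_package("stats")   # 'stats' is part of every R installation
```

The same call could be repeated for each package in the list above, for example use_package("dplyr").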

2.3 Read in data

R can read data files in a variety of formats. Today, we will use a .csv file, but see below for code to read other types of data files.

Note: If the data file is stored in your working directory, you need only specify the file name. However, if the file is stored somewhere else on your computer, you will need to include the file path.

    # csv file
    data <- read.csv("teachingratingsexcel.csv", header=TRUE)

    # .dta file (Stata)
    # dtafile <- read.dta("fakedata.dta")
    # dtafile

    # .RData file
    # load("fakedata1.rdata")
    # data

2.4 Looking at data: basic info, printing objects, and generating basic summary stats

See variable names and dimensions of the data.

    names(data)
    [1] "minority"    "age"         "female"      "onecredit"   "beauty"      "course_eval"
    [7] "intro"       "nnenglish"
    dim(data)
    [1] 463   8
    dim(data)[1]
    [1] 463
    dim(data)[2]
    [1] 8

You can refer to specific rows or columns in a data frame by row or column number(s). This allows you to see a subset of your data. You could even assign the result to a new object, which would effectively subset your data.

    data[1,]   # row 1 only
      minority age female onecredit     beauty course_eval intro nnenglish
    1        1  36      1         0  0.2899157         4.3     0         0
    data[1:3,]   # rows 1 to 3 only
      minority age female onecredit     beauty course_eval intro nnenglish
    1        1  36      1         0  0.2899157         4.3     0         0
    2        0  59      0         0 -0.7377322         4.5     0         0
    3        0  51      0         0 -0.5719836         3.7     0         0
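The workshop data file is not needed to try these commands. The sketch below builds a three-row toy data frame (using the age, female, and course_eval values from the first three rows shown above), writes it to a temporary .csv file, and reads it back. The object names toy, csv_path, and toy_in are made up for the example.

```r
# Sketch: write a small data frame to a temporary .csv file and read it back.
toy <- data.frame(
  age         = c(36, 59, 51),
  female      = c(1, 0, 0),
  course_eval = c(4.3, 4.5, 3.7)
)

csv_path <- tempfile(fileext = ".csv")        # full path, so the working directory does not matter
write.csv(toy, csv_path, row.names = FALSE)   # row.names = FALSE avoids an extra index column

toy_in <- read.csv(csv_path, header = TRUE)

names(toy_in)               # variable names
dim(toy_in)                 # number of rows, number of columns
first_two <- toy_in[1:2, ]  # rows 1 to 2, all columns
```

Because csv_path is a full file path, read.csv() finds the file no matter what the working directory is, which is the situation described in the note above.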

    # data[,1]     # column 1 only
    # data[,2:4]   # columns 2 to 4 only

Print some or all of the data to the console.

    # data
    # data[1:5,]
    # data[,3]
    head(data)
      minority age female onecredit     beauty course_eval intro nnenglish
    1        1  36      1         0  0.2899157         4.3     0         0
    2        0  59      0         0 -0.7377322         4.5     0         0
    3        0  51      0         0 -0.5719836         3.7     0         0
    4        0  40      1         0 -0.6779634         4.3     0         0
    5        0  31      1         0  1.5097940         4.4     0         0
    6        0  62      0         0  0.5885687         4.2     0         0
    # data$course_eval
    # data$female
    # data$beauty
    # course_eval   # error! why?

Find out the classification or type of an object such as a data frame or a variable.

    class(data)
    [1] "data.frame"
    class(data$course_eval)
    [1] "numeric"
    class(data$female)
    [1] "integer"

Summarize your dataset or a specific variable.

    # summary() function
    summary(data)
        minority           age            female         onecredit           beauty
     Min.   :0.0000   Min.   :29.00   Min.   :0.0000   Min.   :0.00000   Min.   :-1.4504940
     1st Qu.:0.0000   1st Qu.:42.00   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:-0.6562689
     Median :0.0000   Median :48.00   Median :0.0000   Median :0.00000   Median :-0.0680143
     Mean   :0.1382   Mean   :48.37   Mean   :0.4212   Mean   :0.05832   Mean   : 0.0000001
     3rd Qu.:0.0000   3rd Qu.:57.00   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.: 0.5456024
     Max.   :1.0000   Max.   :73.00   Max.   :1.0000   Max.   :1.00000   Max.   : 1.9700230
      course_eval        intro          nnenglish
     Min.   :2.100   Min.   :0.0000   Min.   :0.00000
     1st Qu.:3.600   1st Qu.:0.0000   1st Qu.:0.00000
     Median :4.000   Median :0.0000   Median :0.00000
     Mean   :3.998   Mean   :0.3391   Mean   :0.06048
     3rd Qu.:4.400   3rd Qu.:1.0000   3rd Qu.:0.00000
     Max.   :5.000   Max.   :1.0000   Max.   :1.00000
    summary(data$beauty)
          Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
    -1.4500000 -0.6563000 -0.0680100  0.0000001  0.5456000  1.9700000

Tables

    # table() function
    table(data$female, useNA="always")
       0    1 <NA>
     268  195    0

    crosstab <- table(data$female, data$minority, useNA="always",
                      dnn=c("Gender", "Race or Ethnicity"))
    crosstab <- crosstab[c(2, 1, 3), c(2, 1, 3)]
    row.names(crosstab) <- c("Female", "Male", "NA")
    colnames(crosstab) <- c("Minority", "White", "NA")

    mytable <- table(data$female, data$minority, useNA="always",
                     dnn=c("Female", "Minority"))
    margin.table(mytable, 1)
    Female
       0    1 <NA>
     268  195    0
    margin.table(mytable, 2)
    Minority
       0    1 <NA>
     399   64    0
    prop.table(mytable)
          Minority
    Female          0          1       <NA>
      0    0.51835853 0.06047516 0.00000000
      1    0.34341253 0.07775378 0.00000000
      <NA> 0.00000000 0.00000000 0.00000000
    prop.table(mytable, 1)
          Minority
    Female         0         1      <NA>
      0    0.8955224 0.1044776 0.0000000
      1    0.8153846 0.1846154 0.0000000
      <NA>
    prop.table(mytable, 2)
          Minority
    Female         0         1 <NA>
      0    0.6015038 0.4375000
      1    0.3984962 0.5625000
      <NA> 0.0000000 0.0000000
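The second argument of margin.table() and prop.table() is easy to mix up, so here is a minimal sketch on made-up toy vectors (five observations, not the workshop data). With margin 1 the proportions are computed within each row, with margin 2 within each column, and with no margin over the whole table.

```r
# Toy data (made up for illustration): 3 men, 2 women
female   <- c(0, 0, 0, 1, 1)
minority <- c(0, 0, 1, 0, 1)

tab <- table(female, minority, dnn = c("Female", "Minority"))

row_totals <- margin.table(tab, 1)   # counts by Female: 3 and 2
cell_share <- prop.table(tab)        # each cell as a share of all observations
row_share  <- prop.table(tab, 1)     # shares within each row (each row sums to 1)
col_share  <- prop.table(tab, 2)     # shares within each column (each column sums to 1)
```

In the workshop output above, prop.table(mytable, 1) therefore answers "what share of each gender is minority?", while prop.table(mytable, 2) answers "what share of each minority group is female?".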

2.5 Basic histograms and scatterplots

Here are a few easy ways to see the distribution of your data. We will look at some more complex figures later.

Histogram

    # hist()
    hist(data$course_eval, breaks=25, main="Histogram of Outcome Variable - Course Evaluation", xlab="outcom

[Figure: "Histogram of Outcome Variable - Course Evaluation"; x-axis: Outcome Variable Y, y-axis: Frequency]

Scatterplot

    # plot()
    plot(data$beauty, data$course_eval, main="Scatterplot of Beauty and Course Evaluations", pch=16)
    abline(v=0, col="red")
    abline(h=3.5, col="grey80", lty=2, lwd=3)

[Figure: "Scatterplot of Beauty and Course Evaluations"; x-axis: data$beauty, y-axis: data$course_eval]

You can save a plot in PDF format. R will save the file to your working directory unless you specify a different file path.

    # save to disk
    pdf("basic_plot.pdf")
    plot(data$beauty, data$course_eval, main="Scatterplot of Beauty and Course Evaluations", pch=16)
    abline(v=0, col="red")
    abline(h=3.5, col="grey80", lty=2, lwd=3)
    dev.off()
    pdf
      2
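The open-draw-close pattern for saving a figure can be tried without the workshop data. The sketch below draws a toy scatterplot (four made-up points) into a PDF in R's temporary directory; the file name toy_plot.pdf and the width and height values are arbitrary choices for the example.

```r
# Sketch: save a base R plot to a PDF file using toy data.
out_file <- file.path(tempdir(), "toy_plot.pdf")   # explicit path instead of the working directory

x <- c(-1, 0, 1, 2)
y <- c(3.5, 4.0, 4.2, 4.6)

pdf(out_file, width = 6, height = 4)   # 1. open the PDF device (size in inches)
plot(x, y, pch = 16, main = "Toy scatterplot")
abline(h = 4, lty = 2)                 # 2. everything drawn now goes into the file
invisible(dev.off())                   # 3. close the device so the file is written

saved <- file.exists(out_file)
```

Nothing appears in the plot window while the PDF device is open; the file is only complete once dev.off() has been called.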

2.6 Basic operators

Arithmetic/Math/Numeric Operators

    +    addition
    -    subtraction
    *    multiplication
    /    division

An example: make a new variable, age_squared.

    data$age_squared <- data$age^2

2.7 Logical Operators

Logical operators test conditions. For example, you might want a subset of data that includes observations for which a specific variable exceeds some value, or you may want to find observations with missing values. You can also use these operators to generate variables and data, often using the if() or ifelse() function.

    <            less than
    <=           less than or equal to
    >            greater than
    >=           greater than or equal to
    ==           exactly equal to
    !=           not equal to
    !x           not x
    x | y        x OR y
    x & y        x AND y
    isTRUE(x)    test if x is TRUE

Example: make a new variable using a logical test to determine which subjects are minorities who are non-native English speakers.

    data$nnenglish_minority <- data$minority == 1 & data$nnenglish
    # data$nnenglish_minority
    data$nnenglish_minority <- as.numeric(data$nnenglish_minority)
    # data$nnenglish_minority

Now make a new variable to indicate whether a subject is older than the average age. We can use the ifelse() function.
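Before applying these tools to the workshop data, here is a minimal sketch of the same two ideas on toy vectors. The ages are the first five values from the data shown earlier; the minority and nnenglish values are made up for illustration.

```r
# Toy vectors (minority and nnenglish are made-up values)
age       <- c(36, 59, 51, 40, 31)
minority  <- c(1, 0, 0, 1, 0)
nnenglish <- c(1, 0, 1, 0, 0)

# A logical test combined with & gives TRUE/FALSE for each element ...
both_logical <- minority == 1 & nnenglish == 1
# ... and as.numeric() turns TRUE/FALSE into 1/0
both <- as.numeric(both_logical)          # 1 0 0 0 0

# ifelse(test, value_if_true, value_if_false) works element by element
older <- ifelse(age > mean(age), 1, 0)    # mean(age) is 43.4, so 0 1 1 0 0
```

Both approaches produce a 0/1 indicator; ifelse() is the more flexible of the two because the two outcome values can be anything, not just 1 and 0.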

    data$older <- ifelse(data$age > mean(data$age), 1, 0)

2.8 Subsetting data

    data[4,3]
    [1] 1
    data[4,]
      minority age female onecredit     beauty course_eval intro nnenglish age_squared
    4        0  40      1         0 -0.6779634         4.3     0         0        1600
      nnenglish_minority older
    4                  0     0
    data[,3]
    [1] 1 0 0 1 1 0 1 1 1 0 0 0 0 0 1 0 1 0 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
    [43] 1 0 0 0 1 1 1 0 1 0 1 1 0 0 1 1 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 0 1 1
    [85] 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1
    [127] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1
    [169] 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1
    [211] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0
    [253] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1
    [295] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1
    [337] 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    [379] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
    [421] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1
    [463] 1
    data[4:10, 2:3]
       age female
    4   40      1
    5   31      1
    6   62      0
    7   33      1
    8   51      1
    9   33      1
    10  47      0

You can also subset by variables. Designate variables to keep or exclude.

    select.vars <- c("course_eval", "female")
    # data[select.vars]
    # data[data$female==1,]

Make a new data frame that includes only women.

    female <- data[data$female==1,]

Here is another way to make a new data frame that includes only women.

    female2 <- subset(data, female==1)

2.9 Writing data to disk

    write.csv(data, "evaluation_data.csv", row.names=FALSE)
    write.dta(data, "evaluation_data.dta")
    save(data, file="evaluation_data.rdata")        # save just a data frame
    save.image(file="course_evaluations.rdata")     # save your current workspace

2.10 Regression

We could simply proceed, but let's clear the workspace and load the .rdata file we just saved.

    rm(list=ls(all=TRUE))           # clear all objects in memory
    load("evaluation_data.rdata")   # load the data

Specify a regression model. The following examples are OLS models. See the more detailed user guide for more information on other classes of models.

    fit_1 <- lm(course_eval ~ female, data=data)
    summary(fit_1)

    Call:
    lm(formula = course_eval ~ female, data = data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.96903 -0.36903  0.03097  0.43097  0.99897

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.06903    0.03355  121.29  < 2e-16 ***
    female      -0.16800    0.05169   -3.25  0.00124 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.5492 on 461 degrees of freedom
    Multiple R-squared:  0.0224,  Adjusted R-squared:  0.02028
    F-statistic: 10.56 on 1 and 461 DF,  p-value: 0.001239

To include additional independent variables...

    fit_2 <- lm(course_eval ~ female + beauty + age + minority + nnenglish, data=data)
    summary(fit_2)

    Call:
    lm(formula = course_eval ~ female + beauty + age + minority + nnenglish, data = data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.87797 -0.35784  0.04323  0.37956  1.02073

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.241680   0.143283  29.604  < 2e-16 ***
    female      -0.207452   0.052563  -3.947 9.17e-05 ***
    beauty       0.140942   0.032938   4.279 2.29e-05 ***
    age         -0.002707   0.002750  -0.984  0.32545
    minority    -0.044374   0.075725  -0.586  0.55817
    nnenglish   -0.313490   0.108630  -2.886  0.00409 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.5324 on 457 degrees of freedom
    Multiple R-squared:  0.08919,  Adjusted R-squared:  0.07922
    F-statistic: 8.95 on 5 and 457 DF,  p-value: 4.001e-08

To add fixed effects...

    fit_3 <- lm(course_eval ~ factor(intro) + female + beauty + age + minority + nnenglish, data=data)
    summary(fit_3)

    Call:
    lm(formula = course_eval ~ factor(intro) + female + beauty + age + minority + nnenglish, data = data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.84713 -0.35266  0.04673  0.38961  1.05248

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)     4.182596   0.146567  28.537  < 2e-16 ***
    factor(intro)1  0.098401   0.054097   1.819 0.069570 .
    female         -0.197257   0.052730  -3.741 0.000207 ***
    beauty          0.140213   0.032858   4.267 2.41e-05 ***
    age            -0.002238   0.002756  -0.812 0.417182
    minority       -0.070909   0.076930  -0.922 0.357154
    nnenglish      -0.274246   0.110484  -2.482 0.013415 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.5311 on 456 degrees of freedom
    Multiple R-squared:  0.09575,  Adjusted R-squared:  0.08385
    F-statistic: 8.047 on 6 and 456 DF,  p-value: 2.836e-08

To include an interaction...

    fit_4 <- lm(course_eval ~ female*beauty + age + minority + nnenglish, data=data)
    summary(fit_4)

    Call:
    lm(formula = course_eval ~ female * beauty + age + minority + nnenglish, data = data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.84616 -0.34549  0.04303  0.39253  1.05515

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)    4.217031   0.143705  29.345  < 2e-16 ***
    female        -0.204181   0.052488  -3.890 0.000115 ***
    beauty         0.193681   0.045052   4.299  2.1e-05 ***
    age           -0.002169   0.002763  -0.785 0.432713
    minority      -0.017367   0.077195  -0.225 0.822097
    nnenglish     -0.330643   0.108864  -3.037 0.002525 **
    female:beauty -0.111446   0.065107  -1.712 0.087627 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.5313 on 456 degrees of freedom
    Multiple R-squared:  0.095,  Adjusted R-squared:  0.08309
    F-statistic: 7.978 on 6 and 456 DF,  p-value: 3.372e-08

You can also export regression results. There are multiple packages you could use, but the example below uses stargazer.

    # stargazer(fit_1, fit_2, fit_3, fit_4, omit=("intro"),
    #           omit.stat=("n"),
    #           add.lines=list(c("Fixed Effects", "No", "intro", "No")),
    #           notes=c("OLS Regression models."),
    #           notes.align="l",
    #           notes.append=TRUE,
    #           covariate.labels = c(),
    #           float=FALSE, dep.var.labels = "Course Evaluation",
    #           out = "course_eval_regressions")

3 More Plots & Figures

We can start by creating factors. Factors designate groups or categories (this step is optional, depending on the figures you need).
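As a minimal sketch of what factor() does, the example below converts a made-up 0/1 vector into a labeled factor. The vector values are invented for illustration; the levels and labels arguments are the same ones used for the workshop data.

```r
# Toy 0/1 indicator (made-up values)
female <- c(0, 1, 1, 0, 1)

# levels = the values that occur in the data, labels = the names to show instead
gender <- factor(female, levels = c(0, 1), labels = c("Male", "Female"))

group_names <- levels(gender)       # "Male" "Female"
group_sizes <- table(gender)        # Male: 2, Female: 3
codes       <- as.integer(gender)   # internal codes: 1 2 2 1 2
```

The order of the labels must match the order of the levels, so that 0 is shown as "Male" and 1 as "Female". Plotting functions then use these labels in legends and facet titles.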

    data$gender <- factor(data$female, levels=c(0, 1), labels=c("Male", "Female"))
    data$minority_status <- factor(data$minority, levels=c(0, 1), labels=c("Non-minority", "Minority"))
    data$age_status <- factor(data$older, levels=c(0, 1), labels=c("Younger", "Older"))

You need the ggplot2 package to use the qplot() and ggplot() functions.

    # Kernel density plots for course evaluations
    # grouped by gender (indicated by color)
    qplot(course_eval, data=data, geom="density", fill=gender, alpha=I(.5),
          main="Distribution of Course Evaluations", xlab="Evaluation Score",
          ylab="Density")

[Figure: "Distribution of Course Evaluations", density curves by gender (Male, Female); x-axis: Evaluation Score, y-axis: Density]

    # Histogram for course evaluations
    # grouped by gender (indicated by color)
    qplot(course_eval, data=data, geom="histogram", fill=gender, alpha=I(.75),
          main="Distribution of Course Evaluations", xlab="Evaluation Score",
          ylab="Density")
    stat_bin: binwidth defaulted to range/30. Use binwidth = x to adjust this.

[Figure: "Distribution of Course Evaluations", histogram by gender (Male, Female); x-axis: Evaluation Score, y-axis: Density]

    # Scatterplot of course evaluations vs. beauty for each combination of age_status and minority_status
    # in each facet, gender is represented by shape and color
    qplot(beauty, course_eval, data=data, shape=gender, color=gender,
          facets=age_status~minority_status, size=I(3),
          xlab="Beauty", ylab="Course Evaluation")

[Figure: scatterplots in four facets (rows: Younger, Older; columns: Non-minority, Minority), points marked by gender (Male, Female); axis labels: Beauty and Course Evaluation]

    # Separate regressions of course evaluations on beauty for each gender
    qplot(beauty, course_eval, data=data, geom=c("point", "smooth"),
          method="lm", formula=y~x, color=gender,
          main="Regression of Evaluations on Beauty",
          xlab="Beauty", ylab="Course Evaluation")

[Figure: "Regression of Evaluations on Beauty", points and fitted lines by gender (Male, Female); x-axis: Beauty, y-axis: Course Evaluation]

    # Boxplots of course evaluations by gender
    # observations (points) are overlaid and jittered
    qplot(gender, course_eval, data=data, geom=c("boxplot", "jitter"),
          fill=gender, main="Course Evaluations by Gender",
          xlab="", ylab="Course Evaluations")

[Figure: "Course Evaluations by Gender", boxplots with jittered points for Male and Female; y-axis: Course Evaluations]

    plot <- ggplot(data, aes(beauty, course_eval)) +
      geom_point(alpha=.5) +
      geom_smooth()
    plot
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use method = x to change the smoothing method.

[Figure: scatterplot with a loess smooth; x-axis: beauty, y-axis: course_eval]

    plot <- ggplot(data, aes(beauty, course_eval)) +
      geom_point(colour="green", alpha=1) +
      geom_smooth(method="lm", colour="black", se=FALSE) +
      scale_y_continuous(limits=c(0, 10)) +
      scale_x_continuous(limits=c(-2, 2.5)) +
      theme_bw() +
      xlab("Beauty") + ylab("Course Evaluations") +
      ggtitle("Course Evaluations & Beauty") +
      geom_vline(xintercept = 0, colour="grey")
    plot

[Figure: "Course Evaluations & Beauty", scatterplot with a linear fit; x-axis: Beauty, y-axis: Course Evaluations (0 to 10)]

    plot_2 <- plot + theme_bw() +
      ylab("Course Evaluations") + xlab("Beauty") +
      ggtitle("Course Evaluations & Beauty") +
      scale_y_continuous(limits=c(0, 6), breaks=seq(1, 6, 1.5)) +
      scale_x_continuous(limits=c(-2, 2), breaks=seq(-2, 2, .5))
    Scale for y is already present. Adding another scale for y, which will replace the existing scale.
    Scale for x is already present. Adding another scale for x, which will replace the existing scale.
    plot_2

[Figure: "Course Evaluations & Beauty" with the new axis breaks; x-axis: Beauty (-2 to 2), y-axis: Course Evaluations (1.0 to 5.5)]

    plot_3 <- ggplot(data, aes(beauty, course_eval)) +
      geom_point(alpha=.5) +
      geom_smooth(se=FALSE) +
      theme_bw() +
      ylab("Course Evaluation") + xlab("Beauty") +
      ggtitle("Course Evaluations & Beauty") +
      scale_y_continuous(limits=c(1, 6), breaks=seq(1.5, 6, 1.5)) +
      scale_x_continuous(limits=c(-1, 2.5), breaks=seq(-1.5, 2.5, 1))
    plot_3
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use method = x to change the smoothing method.
    Warning: Removed 46 rows containing missing values (stat_smooth).
    Warning: Removed 46 rows containing missing values (geom_point).
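The "Removed 46 rows" warnings arise because limits set through scale_x_continuous() or scale_y_continuous() discard observations that fall outside the range before anything is drawn or smoothed. The base R sketch below does not use ggplot2; it simply counts how many values an x range of -1 to 2.5 would discard, using made-up beauty scores rather than the workshop data.

```r
# Toy beauty scores (made-up values); the limits match the plot_3 example
beauty   <- c(-1.45, -1.20, -0.70, 0.00, 0.60, 1.97)
x_limits <- c(-1, 2.5)

# TRUE for every value below the lower limit or above the upper limit
outside <- beauty < x_limits[1] | beauty > x_limits[2]

n_dropped <- sum(outside)       # 2 values (-1.45 and -1.20) fall outside the limits
kept      <- beauty[!outside]   # the values that would still be plotted
```

If the goal is only to zoom in on part of the plot without dropping observations from the smoother, ggplot2's coord_cartesian(xlim=..., ylim=...) is the usual alternative to setting limits on the scales.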

[Figure: "Course Evaluations & Beauty", scatterplot with a loess smooth; x-axis: Beauty (-1 to 2.5), y-axis: Course Evaluation (1.5 to 6.0)]

    plot_4 <- plot_3 %+% aes(age, course_eval) +
      ylab("Course Evaluation") + xlab("Age") +
      ggtitle("Course Evaluations & Age") +
      scale_x_continuous(limits=c(25, 75), breaks=seq(25, 75, 15))
    Scale for x is already present. Adding another scale for x, which will replace the existing scale.
    plot_4
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use method = x to change the smoothing method.

[Figure: "Course Evaluations & Age", scatterplot with a loess smooth; x-axis: Age (25 to 70), y-axis: Course Evaluation (1.5 to 6.0)]

To save a plot as a PDF...

Name the file:

    pdf("plot_evals_age.pdf")

Print the object (plot):

    print(plot_4)
    geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use method = x to change the smoothing method.

Close the figure file:

    dev.off()
    pdf
      2