Common Sta 101 Commands for R. 1 One quantitative variable. 2 One categorical variable. 3 Two categorical variables. Summary statistics

Similar documents
Common R commands used in Data Analysis and Statistical Inference

Statistics 251: Statistical Methods

Applied Regression Modeling: A Business Approach

Minitab 17 commands Prepared by Jeffrey S. Simonoff

R Users Guide. to accompany. Statistics: Unlocking the Power of Data by Lock, Lock, Lock, Lock, and Lock

Univariate Data - 2. Numeric Summaries

8. MINITAB COMMANDS WEEK-BY-WEEK

Stat 290: Lab 2. Introduction to R/S-Plus

Brief Guide on Using SPSS 10.0

Using R. Liang Peng Georgia Institute of Technology January 2005

DSCI 325: Handout 18 Introduction to Graphics in R

Fathom Dynamic Data TM Version 2 Specifications

Excel 2010 with XLSTAT

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Visualizing univariate data 1

Statistical Graphics

Appendix A: A Sample R Session

MINITAB 17 BASICS REFERENCE GUIDE

Univariate Data - 2. Numeric Summaries

Applied Regression Modeling: A Business Approach

Package visualizationtools

IST 3108 Data Analysis and Graphics Using R Week 9

Introductory Applied Statistics: A Variable Approach TI Manual

Descriptive Statistics, Standard Deviation and Standard Error

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

Appendix A: A Sample R Session

Az R adatelemzési nyelv

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

DoE with Visual-XSel 13.0

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

An Introductory Guide to R

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13

Quantitative - One Population

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

UP School of Statistics Student Council Education and Research

Solution to Tumor growth in mice

Multiple Linear Regression

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

Package GLDreg. February 28, 2017

CARTWARE Documentation

Index. Bar charts, 106 bartlett.test function, 159 Bottles dataset, 69 Box plots, 113

Package areaplot. October 18, 2017

Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA

TI-83 Users Guide. to accompany. Statistics: Unlocking the Power of Data by Lock, Lock, Lock, Lock, and Lock

Package sciplot. February 15, 2013

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Statistics Lecture 6. Looking at data one variable

Chapter 5snow year.notebook March 15, 2018

Package cgh. R topics documented: February 19, 2015

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

Statistical Programming Camp: An Introduction to R

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.

Making Science Graphs and Interpreting Data

Lecture 25: Review I

R Bootcamp Part I (B)

IQR = number. summary: largest. = 2. Upper half: Q3 =

Correlation. January 12, 2019

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

STA215 Inference about comparing two populations

The RcmdrPlugin.HH Package

Package uclaboot. June 18, 2003

Nonparametric Approaches to Regression

Package nplr. August 1, 2015

Release notes for StatCrunch mid-march 2015 update

The simpleboot Package

Instructions for Using ABCalc James Alan Fox Northeastern University Updated: August 2009

Statistical Programming with R

CS&s/STAT 566 Class Lab 3 January 22, 2016

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition

Introduction to R, Github and Gitlab

The Power and Sample Size Application

STATA 13 INTRODUCTION

Handling Missing Values

MINITAB Release Comparison Chart Release 14, Release 13, and Student Versions

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

Minitab 18 Feature List

Basic Statistical Graphics in R. Stem and leaf plots 100,100,100,99,98,97,96,94,94,87,83,82,77,75,75,73,71,66,63,55,55,55,51,19

R for IR. Created by Narren Brown, Grinnell College, and Diane Saphire, Trinity University

Technical Support Minitab Version Student Free technical support for eligible products

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business

An Introduction to R 2.2 Statistical graphics

This document is designed to get you started with using R

Graph tool instructions and R code

Package SmoothHazard

Using Built-in Plotting Functions

Chapter 2 Modeling Distributions of Data

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

Package condir. R topics documented: February 15, 2017

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

R version ( ) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Appendix: The Analysis of Questionnaire Data using R: Memory Card

Package MatchingFrontier

Chapter 4: Analyzing Bivariate Data with Fathom

Courtesy :

R Graphics. SCS Short Course March 14, 2008

Transcription:

Common Sta 101 Commands for R 1 One quantitative variable summary(x) # most summary statitstics at once mean(x) median(x) sd(x) hist(x) boxplot(x) # horizontal = TRUE for horizontal plot qqnorm(x) qqline(x) # for normal probability plot and straight line 2 One categorical variable table(x) barplot(table(x)) 3 Two categorical variables 1

table(x,y) barplot(table(x,y)) # beside = TRUE for side-by-side barplot # legend = TRUE to include a color legend mosaicplot(table(x,y)) 4 One categorical and one quantitative variable y = quantitative x = categorical by(y, x, summary) # summary by group by(y, x, mean) # mean by group by(y, x, sd) # sd by group boxplot(y ~ x) 5 Two quantitative variables, Simple linear regression Note: Out of scope for project 1. cor(x,y) # use = "complete.obs" to get rid of NA values slr = lm(y ~ x) summary(slr) 2

# linear model and the model output plot(y ~ x) 6 Multiple linear regression mlr = lm(y ~ x1 + x2 +...) summary(mlr) # linear model and the model output 7 Regression diagnostics # in the code below m is the regression model plot(m$residuals ~ x) # residuals vs. an explanatory variable plot(m$residuals ~ m$fitted) # residuals vs. fitted (predicted) values of y from the model plot(m$residuals) # residuals vs. order of data collection hist(m$residuals) # histogram of residuals qqnorm(m$residuals) qqline(m$residuals) # normal probability plot of residuals 8 Subsetting subset(dataname,!is.na(x)) # the data set "dataname", but only cases for which x is not NA subset(dataname, x == "levela") # the data set "dataname", but only cases for which x is equal to "levela" x[!is.na(x)] # the variable x, but only cases for which x is not NA y[!is.na(x)] # the variable y, but only cases for which x is not NA x[x < 30] # the variable x, but only cases for which x is less than 30 3

x[x!= "levela"] # the variable x, but only cases for which x does not equal "levela" droplevels(x) # drops empty levels if you have removed all the cases from one level 9 Probability distributions pnorm(q, mean, sd) # calculate area under the normal curve below q # for a normal distribution with given mean and sd dnorm(x, mean, sd) # calculate the normal probability density at x (can be a vector) # for a normal distribution with given mean and sd, # useful for plotting a normal curve over a histogram dbinom(x, size, prob) # calculate the probability for x successes in size trials, # where probability of success is prob 10 Plotting lines abline(h = value) # add a horizontal line to an existing plot abline(v = value) # add a vertical line to an existing plot abline(lm(y~x)) # overlays linear regression line on the scatterplot of y vs. x, # only works if plot(y ~ x) ran first 11 Sampling sample(x, size, replace = FALSE) # sample from x size number of elements without replacement (default) # to sample with replacement replace = TRUE 12 Plotting options These arguments can be passed to the plot, or hist, or other similar functions. To learn more about all plotting parameters, type?par. 4

main = "main title" # title of plot, to be placed in the top center xlab = "x-axis label" # x-axis label ylab = "y-axis label" # y-axis label xlim = c(min,max) # x-axis limits ylim = c(min,max) # y-axis limits 13 inference function Use the following command to load the inference function: source("http://stat.duke.edu/~mc301/r/inference.r") inference(y, x, est, type, method, null, alternative, success, order, conflevel, siglevel, nsim, eda_plot, inf_plot, sum_stats) # y = response variable, categorical or numerical variable # x = explanatory variable, categorical (optional) # est = "mean", "median", or "proportion" # type = "ci" for confidence interval, or "ht" for hypothesis test # method = "theoretical" or "simulation" # null = (optional) null value, does not need to be defined for chi-square or ANOVA # alternative = (optional) "less", "greater", or "twosided" # success = (optional) if data is categorical, the name of the level that is # defined as success # order = (optional) if group is defined, the order in which to subtract the groups # conflevel = (optional) for confidence intervals, 0.95 by default, # can be any number between 0 and 1 # siglevel = (optional) for hypothesis testing, 0.05 by default, # can be any number between 0 and 1 # nsim = (optional) number of simulations, 10000 by default, # use lower if sample size is too high and simulations take a long time # eda_plot = (optional) FALSE to hide the exploratory analysis plot # inf_plot = (optional) FALSE to hide the inference plot # sum_stats = (optional) FALSE to hide summary statistics 5