Stat 579: More Preliminaries, Reading from Files

Ranjan Maitra
2220 Snedecor Hall
Department of Statistics
Iowa State University
Phone: 515-294-7757
maitra@iastate.edu
September 1, 2011

Some more introductory examples I

Let us make a vector containing the sequence 1 through 20:
> x <- 1:20
How do we display this object? We simply type its name:
> x
Let us try a simple operation on this object:
> w <- 1 + sqrt(x)/2
This operation takes the element-wise square root of the vector x, halves it, and adds 1 to each element.
Moving on, can we work out what this does?
> dummy <- data.frame(x = x, y = x + rnorm(x)*w)
> dummy
Here rnorm(x) draws one standard normal value per element of x, so we make a data frame of two columns, x and y, and look at it.
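To make the element-wise behaviour concrete, the following sketch compares the vectorized expression with an explicit loop. The call to set.seed() and the seed value 579 are assumptions added here so that the random data frame is the same on every run; they are not part of the original slides.

```r
# Vectorized arithmetic: each operation is applied element by element.
x <- 1:20
w <- 1 + sqrt(x) / 2

# The same values computed one element at a time, for comparison.
w_loop <- numeric(length(x))
for (i in seq_along(x)) {
  w_loop[i] <- 1 + sqrt(x[i]) / 2
}
all.equal(w, w_loop)

# set.seed() is an addition (not in the slides): it fixes the random
# noise so that the data frame can be reproduced exactly.
set.seed(579)
dummy <- data.frame(x = x, y = x + rnorm(x) * w)
head(dummy, 3)
```

The loop is shown only for comparison; the vectorized form is the idiomatic one in R.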

Some more introductory examples II

Consider the following:
> fm <- lm(formula = y ~ x, data = dummy)
> summary(fm)

Call:
lm(formula = y ~ x, data = dummy)

Residuals:
    Min      1Q  Median      3Q     Max
-3.6315 -0.8137  0.2134  0.8470  5.0178

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.63569    0.97234   1.682     0.11
x            0.84072    0.08117  10.358 5.19e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.093 on 18 degrees of freedom
Multiple R-squared:  0.8563,  Adjusted R-squared:  0.8483
F-statistic: 107.3 on 1 and 18 DF,  p-value: 5.187e-09

We fit a simple linear regression of y on x, store the fitted model in the object fm, and look at the results.
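Besides summary(), individual pieces of the fit can be pulled out of the model object. The sketch below rebuilds the data with an assumed seed (579, chosen here only for reproducibility), so its numbers will not match the output printed on the slide.

```r
# Rebuild the example data; the seed is an assumption, not from the slides.
set.seed(579)
x <- 1:20
w <- 1 + sqrt(x) / 2
dummy <- data.frame(x = x, y = x + rnorm(x) * w)

# Fit the regression and inspect the stored model object.
fm <- lm(y ~ x, data = dummy)
class(fm)                    # the fit is an object of class "lm"
coef(fm)                     # estimated intercept and slope
confint(fm, level = 0.95)    # 95% confidence intervals for both
summary(fm)$r.squared        # multiple R-squared as a single number
df.residual(fm)              # residual degrees of freedom (20 - 2)
```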

Some more introductory examples III

> attach(dummy)
Make the columns in the data frame visible as variables.
> plot(x = x, y = y)
> abline(a = 0, b = 1, lty = 3)   # The true regression line (intercept 0, slope 1).
> abline(coef(fm))                # The fitted simple linear regression line.
> detach()
Remove the data frame from the search path.
> plot(x = fitted(fm), y = resid(fm), xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs Fitted")
A standard regression diagnostic plot to check for heteroscedasticity. Can you see it?
> rm(fm, x, w, dummy)
> q()
Clean up the workspace and quit.
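As an alternative to attach() and detach(), the columns of a data frame can be reached with with() or with a data = argument, which leaves the search path untouched. The sketch below also gives a rough numerical counterpart to the residual plot by comparing the residual spread in the lower and upper halves of x. The seed and the names x_vals, res and low are assumptions introduced for illustration.

```r
# Rebuild the example data; the seed is an assumption, not from the slides.
set.seed(579)
x_vals <- 1:20
dummy <- data.frame(x = x_vals,
                    y = x_vals + rnorm(20) * (1 + sqrt(x_vals) / 2))
fm <- lm(y ~ x, data = dummy)

# with() evaluates an expression inside the data frame, so nothing is
# added to the search path and no detach() is needed afterwards.
with(dummy, cor(x, y))

# Residual standard deviation for x <= 10 and for x > 10.
res <- resid(fm)
low <- dummy$x <= 10
c(sd_low = sd(res[low]), sd_high = sd(res[!low]))
```

In the same spirit, plot(y ~ x, data = dummy) draws the scatterplot without attaching the data frame.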

Getting help with functions and features

R has an inbuilt help facility similar to the man facility of UNIX. To get more information on any specific named function, for example solve, the command is
> help(solve)
An alternative is
> ?solve
For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a character string. This is also necessary for a few words with syntactic meaning, including if, for and function.
> help("[[")
Either form of quote mark may be used to escape the other, as in the string "It's important". Our convention is to use double quote marks for preference.

Additional Help Features

The help.search command allows searching for help in various ways: try ?help.search for details and examples.
The examples on a help topic can normally be run by
> example(topic)
Windows versions of R have other optional help systems: use
> ?help
for further details.
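Two further lookups are handy when the exact function name is not known. The sketch below uses apropos() and args(), both standard R functions that the slides do not mention; the interactive help calls from the slides are left as comments so that the script can run unattended.

```r
# apropos() lists objects on the search path whose names match a pattern;
# here, everything whose name starts with "read.".
hits <- apropos("^read\\.")
print(hits)

# args() shows a function's argument list without opening its help page.
print(args(solve))

# Interactive lookups from the slides, commented out for unattended runs:
# help("[[")
# help.search("linear model")
# example(solve)
```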

Additional Resources

- The R-help mailing list: subscribe to R-help from the CRAN webpage.
  - The best way to get help there is to isolate the problem we are having, then create a simple self-contained example containing the problematic code and post it.
  - Post no questions on the class, homework, etc.! (I monitor the list.)
- The R function RSiteSearch lets us search the archives of this mailing list.
- Online fora: http://cos.name/en/ or our TA's website: http://yihui.name/en/

Remember to make use of these resources.

Reading Data from Files

For reading data files, we need to know a few things:
- R's input facilities are fairly simple. The requirements are fairly strict and rather inflexible.
- There is a clear presumption by the designers of R that we are able to modify input files to satisfy R's input requirements. In many cases, this is straightforward using tools such as file editors, perl or awk.
- If variables are to be held mainly in data frames, an entire data frame can be read directly with the read.table() function.
- There is also a more primitive input function, scan(), that can be called directly.
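The difference between the two input functions can be seen on a tiny in-memory example. The values below are made up for illustration, and the text = argument is used in place of a real file so that the sketch is self-contained.

```r
# A small in-memory "file" (made-up values, for illustration only).
lines <- c("1.5 2.5 3.5",
           "4.0 5.0 6.0")

# scan() returns one flat numeric vector; the row structure is lost.
v <- scan(text = lines)
v

# read.table() keeps rows and columns and returns a data frame,
# with default column names V1, V2, V3.
df <- read.table(text = lines)
df
```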

An Example: Housing Data I

     Price  Floor  Area  Rooms  Age  Cent.heat
01   52.00  111.0   830      5  6.2         no
02   54.75  128.0   710      5  7.5         no
03   57.50  101.0  1000      5  4.2         no
04   57.50  131.0   690      6  8.8         no
05   59.75   93.0   900      5  1.9        yes

The first line names the variables; each following line starts with a row label, so it has one more field than the first line.
By default numeric items (except row labels) are read as numeric variables and non-numeric variables, such as Cent.heat in the example, as factors. This can be changed if necessary. (Since R 4.0.0 the default is to read non-numeric variables as character vectors; use stringsAsFactors = TRUE to get factors.)
The function read.table() can then be used to read the data frame directly.
> HousePrice <- read.table(file = "http://maitra.public.iastate.edu/stat579/houses.dat")
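The same read can be tried without network access by passing the five rows shown above as text. This is only a sketch: the variable name houses_lines is hypothetical, and stringsAsFactors = TRUE is set explicitly so that Cent.heat becomes a factor regardless of the R version.

```r
# The five rows from the slide, held in memory instead of a file.
houses_lines <- c(
  "     Price  Floor  Area  Rooms  Age  Cent.heat",
  "01   52.00  111.0   830      5  6.2         no",
  "02   54.75  128.0   710      5  7.5         no",
  "03   57.50  101.0  1000      5  4.2         no",
  "04   57.50  131.0   690      6  8.8         no",
  "05   59.75   93.0   900      5  1.9        yes"
)

# The heading line has one field fewer than the data lines, so
# read.table() treats it as a header and the first column as row labels.
HousePrice <- read.table(text = houses_lines, stringsAsFactors = TRUE)
str(HousePrice)
levels(HousePrice$Cent.heat)
```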

An Example: Housing Data II

Often we may want to omit the row labels and use the default labels instead. In this case the file may omit the row label column. The data frame may then be read as
> HousePrice <- read.table(file = "http://maitra.public.iastate.edu/stat579/houses.dat", header = TRUE)
where the header = TRUE option specifies that the first line is a line of headings, and hence, by implication from the form of the file, that no explicit row labels are given.
Reading from a local file?
> HousePrice <- read.table(file = "houses.dat", header = TRUE)
In Windows, giving the full path to a file takes extra care (see next slide).
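A local read with a header line and no row labels can be sketched with a temporary file. The file written below is a hypothetical stand-in for houses.dat that contains three of the rows from the previous slide with the row-label column removed.

```r
# Write a small stand-in file (three rows, no row-label column).
path <- tempfile(fileext = ".dat")
writeLines(c("Price Floor Area Rooms Age Cent.heat",
             "52.00 111.0 830 5 6.2 no",
             "54.75 128.0 710 5 7.5 no",
             "59.75 93.0 900 5 1.9 yes"),
           path)

# header = TRUE marks the first line as column names. TRUE is spelled
# out because T is an ordinary variable that can be reassigned.
HousePrice <- read.table(file = path, header = TRUE)
rownames(HousePrice)    # default labels "1", "2", "3"
HousePrice$Price

# Remove the temporary file.
unlink(path)
```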

Reading Local Files on Windows

Get the path name of the local file. Let us say it is:
C:\Documents and Settings\stat579\houses.dat
Then we use:
> Houses <- read.table(file = "C:\\Documents and Settings\\stat579\\houses.dat", header = TRUE)
Note the extra backslash before each backslash: inside a quoted string the backslash is R's escape character, so a doubled backslash is needed for R to read a single literal backslash.
More ways of reading in data files will be addressed later.
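The escaping rule can be checked without touching any file. The sketch below only builds and compares path strings; it also shows forward slashes, which R accepts in Windows paths and which avoid the doubling altogether. The names p_back and p_fwd are hypothetical.

```r
# The same Windows path written two ways.
p_back <- "C:\\Documents and Settings\\stat579\\houses.dat"
p_fwd  <- "C:/Documents and Settings/stat579/houses.dat"

# "\\" in source code is a single backslash character in the string.
nchar("\\")
cat(p_back, "\n")

# file.path() joins components with forward slashes on every platform.
built <- file.path("C:", "Documents and Settings", "stat579", "houses.dat")
identical(built, p_fwd)

# Converting backslashes to forward slashes yields the same path string.
identical(gsub("\\", "/", p_back, fixed = TRUE), p_fwd)
```

Either p_back or p_fwd could then be passed as the file argument of read.table().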