Binary Regression in S-Plus

Fall 2000    STA 216    September 7, 2000

1 Getting Started in UNIX

Create a class working directory and a .Data directory for S-Plus 5.0. If you have used S-Plus 3.x before, you must create a new .Data directory for S-Plus 5.0, because the two versions store data and objects in incompatible ways.

okeeffe% mkdir sta216
okeeffe% cd sta216
okeeffe% Splus5 CHAPTER
okeeffe% ls -a
./  ../  .Data/

To start S-Plus 5.0, enter

Splus5 -e

(the -e option allows you to edit your commands on the command line).

Alternatively, I prefer to run S-Plus under emacs. The advantage of running under emacs is that you can easily edit your history, commands, and functions, and create scripts to automate procedures. Start emacs in your class directory. Then enter M-x (M is the meta or Esc key) followed by S+5. You will be prompted for the starting directory (edit it or hit return), and the buffer will show

S-PLUS : Copyright (c) 1988, 1999 MathSoft, Inc.
S : Copyright Lucent Technologies, Inc.
Version 5.1 Release 1 for DEC alpha, Digital UNIX (OSF/1) V4.0 : 1999
Working data will be in .Data
>

If you are using the Windows version in the clusters, go to the Start button on the task bar, then select Programs. S-Plus 2000 should be under the Statistics programs. Under the Windows menu at the top, select Command Window to open the window where commands will be issued. While many of the functions we will use are available from the menus, I will cover the command version so that the same syntax can be used on both the PC and Unix platforms.

2 Reading in Data

The following data comprise temperature readings (in degrees F) and indicators of O-ring failure of the space shuttle for 24 launches prior to the Challenger disaster in 1986. Only eight of the 24 rows appear here:

temp  failure
  53        1
  56        1
  57        1
  63        0
  66        0
  67        0
  80        0
  81        0

S-Plus stores data in objects called dataframes. To read the datafile orings.dat into a dataframe, use the command read.table:

> orings <- read.table("orings.dat", header=T)

The option header=T is used when the first line of the file contains the column or variable names. (In the Windows version you can use the Import option under the File menu. Hint: for text files, rename them with the ending .asc, for ASCII, rather than .dat.)

To refer to a variable in a dataframe, you can use matrix notation: the first temperature observation is orings[1,1], and the entire vector is orings[,1]. Dataframes in S-Plus are also lists, so you can refer to columns by name, as in orings$temp. If you wish to refer to the variables by name without using the dataframe name, you may attach the dataframe:

> attach(orings)

To create a scatter plot of the data with a title, enter:

> plot(temp, failure, xlab="Temperature", ylab="Failure Indicator")
> title("O-ring Failures")

[Figure: scatter plot titled "O-ring Failures"; x-axis: Temperature, y-axis: Failure Indicator]

How should we model failures as a function of temperature? Do failures depend on temperature? What is the failure probability at 31°F?
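The ways of referring to a variable described in this section can be tried without the data file. The sketch below is only an illustration: it builds a small dataframe, under the hypothetical name orings.sub, from the eight rows printed in the listing, so it is not the full 24-launch data set and should not be used for fitting. It is written in R syntax, which closely follows S-Plus for these operations.

```r
# Illustrative subset only: the eight (temp, failure) pairs printed in the
# listing, NOT the full 24-launch orings.dat file.
# "orings.sub" is a hypothetical name chosen for this sketch.
orings.sub <- data.frame(
  temp    = c(53, 56, 57, 63, 66, 67, 80, 81),
  failure = c(1, 1, 1, 0, 0, 0, 0, 0)
)

# Matrix-style indexing: row 1, column 1 is the first temperature
first.temp <- orings.sub[1, 1]

# An empty row index selects the whole first column as a vector
all.temps <- orings.sub[, 1]

# List-style indexing by column name returns the same vector
same.temps <- orings.sub$temp

print(first.temp)
print(identical(all.temps, same.temps))
```

All three forms read from the same stored column, which is why the matrix-style and list-style results agree.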

3 Models

Random component: each observation Y_i has a Bernoulli distribution with probability of failure π_i, i = 1, ..., 24 (independent?).

Systematic component: linear predictor η_i = β_0 + β_1 temp_i. Quadratic temperature term?

Link between π and η. Which link?

  identity: π = η
  canonical or logit: logit(π) = log(π / (1 - π)) = θ = η
  probit: Φ^{-1}(π) = η
  Student-t or other inverse cdf: F^{-1}(π) = η
  complementary log-log: log(-log(1 - π)) = η

3.1 Estimation

To fit a GLM in S-Plus, we will use the function glm. To fit a logit model with temperature plus an intercept as the linear predictor, use:

> oring.logit <- glm(failure ~ temp, family=binomial(link=logit), data=orings)

The form of the linear predictor is determined by the model expression, the first argument. By default an intercept is included. The output is a glm object, assigned to oring.logit. To summarize the output, use the function summary:

> summary(oring.logit)

Call: glm(formula = failure ~ temp, family = binomial(link = logit), data = orings)

Deviance Residuals:
       Min         1Q    Median        3Q      Max
 -1.212493 -0.8252676 -0.470546 0.5907502 2.051237

Coefficients:
                 Value Std. Error   t value
(Intercept) 10.8753321 5.69793801  1.908643
       temp -0.1713202 0.08336339 -2.055102

(Dispersion Parameter for Binomial family taken to be 1 )

    Null Deviance: 28.97459 on 23 degrees of freedom
Residual Deviance: 23.03045 on 22 degrees of freedom

Number of Fisher Scoring Iterations: 4

Correlation of Coefficients:
     (Intercept)
temp  -0.9958713
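The earlier question about the failure probability at 31°F can be explored by hand from the logit coefficients reported above. The following is a minimal sketch rather than part of the original session: the names b0, b1, inv.logit, eta.31, and p.31 are introduced here for illustration, and the coefficients are simply copied from the summary output. Keep in mind that 31°F lies well below the coldest temperature shown in the data listing (53°F), so this is an extrapolation.

```r
# Coefficients copied from the summary(oring.logit) output
b0 <- 10.8753321   # (Intercept)
b1 <- -0.1713202   # temp

# Inverse of the logit link: pi = 1 / (1 + exp(-eta))
inv.logit <- function(eta) 1 / (1 + exp(-eta))

# Linear predictor and fitted failure probability at 31 degrees F
eta.31 <- b0 + b1 * 31
p.31 <- inv.logit(eta.31)

print(eta.31)
print(p.31)
```

With these coefficients the linear predictor at 31°F is about 5.56, which the inverse logit maps to a failure probability of about 0.996. With the fitted object available, the same number should come from predict with a new dataframe containing temp = 31 and type="response".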

For the complementary log-log (cloglog) link:

> oring.cloglog <- glm(failure ~ temp, family=binomial(link=cloglog), data=orings)
> summary(oring.cloglog)

Call: glm(formula = failure ~ temp, family = binomial(link = cloglog), data = orings)

Deviance Residuals:
       Min         1Q    Median        3Q      Max
 -1.215259 -0.7975805 -0.468455 0.3467605 2.062026

Coefficients:
                 Value Std. Error   t value
(Intercept)  8.9361729 3.72824308  2.396886
       temp -0.1466572 0.05662127 -2.590144

(Dispersion Parameter for Binomial family taken to be 1 )

    Null Deviance: 28.97459 on 23 degrees of freedom
Residual Deviance: 22.4359 on 22 degrees of freedom

Number of Fisher Scoring Iterations: 5

Correlation of Coefficients:
     (Intercept)
temp  -0.9940828

To obtain estimates of the probabilities, use the predict function (see help for its other options):

> predict(oring.logit, type="response")
> predict(oring.cloglog, type="response")

The plotting commands below also use a probit fit, oring.probit, whose fitting command is not shown in these notes. Presumably it is fitted in the same way with the probit link:

> oring.probit <- glm(failure ~ temp, family=binomial(link=probit), data=orings)

To add the fitted curves to the graph:

> lines(temp, predict(oring.logit, type="response"), lwd=2, lty=1)
> lines(temp, predict(oring.cloglog, type="response"), lwd=2, lty=2)
> lines(temp, predict(oring.probit, type="response"), lwd=2, lty=3)
> legend(70, 0.8, c("logit", "cloglog", "probit"), lty=c(1,2,3))
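To see how the two fitted links differ without access to the data file, the reported coefficients can be pushed through each inverse link on a small grid of temperatures. This is a hedged sketch, not the original analysis: the names logit.coef, cloglog.coef, inv.logit, inv.cloglog, temps, and fitted.probs are assumptions made for illustration, and the grid of temperatures is arbitrary.

```r
# Coefficients copied from the two summary outputs: (Intercept), temp
logit.coef   <- c(10.8753321, -0.1713202)
cloglog.coef <- c(8.9361729, -0.1466572)

# Inverse link functions, mapping the linear predictor eta to a probability
inv.logit   <- function(eta) 1 / (1 + exp(-eta))
inv.cloglog <- function(eta) 1 - exp(-exp(eta))

# Arbitrary grid of temperatures (degrees F) chosen for illustration;
# 31 is far outside the range of the listed data
temps <- c(31, 55, 65, 75)

# Fitted failure probabilities under each link
fitted.probs <- data.frame(
  temp    = temps,
  logit   = inv.logit(logit.coef[1] + logit.coef[2] * temps),
  cloglog = inv.cloglog(cloglog.coef[1] + cloglog.coef[2] * temps)
)

print(fitted.probs)
```

Both columns decrease as temperature rises, because the temp coefficient is negative under either link. The cloglog curve is asymmetric, so it approaches 1 faster at low temperatures than the logit curve does.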

[Figure: "O-ring Failures" scatter plot with fitted logit, cloglog, and probit curves; x-axis: Temperature, y-axis: Failure Indicator]