Package sure. September 19, 2017

Similar documents
Package vip. June 15, 2018

Package PCADSC. April 19, 2017

Package lcc. November 16, 2018

Package ConvergenceClubs

Package spark. July 21, 2017

Package intccr. September 12, 2017

Package ArCo. November 5, 2017

Package vinereg. August 10, 2018

Package GLDreg. February 28, 2017

Package optimus. March 24, 2017

Package fastdummies. January 8, 2018

Package gtrendsr. October 19, 2017

Package gains. September 12, 2017

Package quickreg. R topics documented:

Package rereg. May 30, 2018

Package CausalImpact

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package gppm. July 5, 2018

Package datasets.load

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package TVsMiss. April 5, 2018

Package ECctmc. May 1, 2018

Package nprotreg. October 14, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Package harrypotter. September 3, 2018

Package simsurv. May 18, 2018

Package MIICD. May 27, 2017

Package bayesdp. July 10, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package texteffect. November 27, 2017

Package gameofthrones

Package gtrendsr. August 4, 2018

Package misclassglm. September 3, 2016

Package OLScurve. August 29, 2016

Package esabcv. May 29, 2015

Package queuecomputer

Package sparsereg. R topics documented: March 10, Type Package

Package OutlierDC. R topics documented: February 19, 2015

Package nonnest2. January 23, 2018

Package basictrendline

Package complmrob. October 18, 2015

Package rpst. June 6, 2017

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package raker. October 10, 2017

Package MatchIt. April 18, 2017

Package geogrid. August 19, 2018

Package messaging. May 27, 2018

Package FWDselect. December 19, 2015

Package citools. October 20, 2018

Package splines2. June 14, 2018

Package bisect. April 16, 2018

Package addhaz. September 26, 2018

Package mgc. April 13, 2018

Package docxtools. July 6, 2018

Package MixSim. April 29, 2017

Package balance. October 12, 2018

Package strat. November 23, 2016

Package SparseFactorAnalysis

Package ClusterBootstrap

Package OptimaRegion

Package fitur. March 11, 2018

Package canvasxpress

Package panelview. April 24, 2018

Package areaplot. October 18, 2017

Package GFD. January 4, 2018

Package sparsenetgls

Package PSTR. September 25, 2017

Package CvM2SL2Test. February 15, 2013

Package EMDomics. November 11, 2018

Package nima. May 23, 2018

Package ParetoPosStable

Package milr. June 8, 2017

Package lvm4net. R topics documented: August 29, Title Latent Variable Models for Networks

Package caretensemble

Package coga. May 8, 2018

Package simr. April 30, 2018

Package Mondrian. R topics documented: March 4, Type Package

Package qualmap. R topics documented: September 12, Type Package

Package SFtools. June 28, 2017

Package svalues. July 15, 2018

Package rfsa. R topics documented: January 10, Type Package

Package WordR. September 7, 2017

Package DALEX. August 6, 2018

Package glmmml. R topics documented: March 25, Encoding UTF-8 Version Date Title Generalized Linear Models with Clustering

Package modmarg. R topics documented:

Package voxel. R topics documented: May 23, 2017

Package psda. September 5, 2017

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package BiDimRegression

Package endogenous. October 29, 2016

R topics documented: 2 genbernoullidata

Package glmnetutils. August 1, 2017

Package validara. October 19, 2017

Package iccbeta. January 28, 2019

Package dbx. July 5, 2018

Package breakdown. June 14, 2018

Package ecoseries. R topics documented: September 27, 2017

Package oasis. February 21, 2018

Transcription:

Type Package Package sure September 19, 2017 Title Surrogate Residuals for Ordinal and General Regression Models An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available. Version 0.2.0 Depends R (>= 3.1) Imports ggplot2 (>= 2.2.1), goftest, gridextra, stats Suggests MASS, ordinal, rms, testthat, VGAM License GPL (>= 2) URL https://github.com/afit-r/sure BugReports https://github.com/afit-r/sure/issues Encoding UTF-8 LazyData true RoxygenNote 6.0.1 NeedsCompilation no Author Brandon Greenwell [aut, cre], Andrew McCarthy [aut], Brad Boehmke [aut], Dungang Liu [ctb] Maintainer Brandon Greenwell <greenwell.brandon@gmail.com> Repository CRAN Date/Publication 2017-09-19 18:04:46 UTC 1

2 autoplot.resid R topics documented: autoplot.resid........................................ 2 df1.............................................. 5 df2.............................................. 5 df3.............................................. 6 df4.............................................. 7 df5.............................................. 7 gof.............................................. 8 resids............................................ 9 sure............................................. 12 surrogate.......................................... 13 Index 15 autoplot.resid Residual Plots for Cumulative Link and General Regression Models Residual-based diagnostic plots for cumulative link and general regression models using ggplot2 graphics. autoplot.resid(object, what = c("qq", "fitted", "covariate"), x = NULL, fit = NULL, distribution = qnorm, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.clm(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.glm(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.lrm(object, what = c("qq", "fitted", "covariate"), x = NULL,

autoplot.resid 3 alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.orm(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.polr(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.vgam(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) autoplot.vglm(object, what = c("qq", "fitted", "covariate"), x = NULL, alpha = 1, xlab = NULL, color = "444444", shape = 19, size = 2, qqpoint.color = "444444", qqpoint.shape = 19, qqpoint.size = 2, qqline.color = "888888", qqline.linetype = "dashed", qqline.size = 1, smooth = TRUE, smooth.color = "red", smooth.linetype = 1, smooth.size = 1, fill = NULL,...) Arguments object An object of class clm, glm, lrm, orm, polr, or vglm. what Character string specifying what to plot. Default is "qq" which produces a quantile-quantile plots of the residuals. x fit distribution A vector giving the covariate values to use for residual-by- covariate plots (i.e., when what = "covariate"). The fitted model from which the residuals were extracted. (Only required if what = "fitted" and object inherits from class "resid".) Function that computes the quantiles for the reference distribution to use in the quantile-quantile plot. Default is qnorm which is only appropriate for models using a probit link function. When jitter.scale = "probability", the reference distribution is always U(-0.5, 0.5). (Only required if object inherits from class "resid".)

4 autoplot.resid Value alpha xlab color shape size qqpoint.color qqpoint.shape qqpoint.size A single values in the interval [0, 1] controlling the opacity alpha of the plotted points. Only used when nsim > 1. Character string giving the text to use for the x-axis label in residual-by-covariate plots. Default is NULL. Character string or integer specifying what color to use for the points in the residual vs fitted value/covariate plot. Default is "black". Integer or single character specifying a symbol to be used for plotting the points in the residual vs fitted value/covariate plot. Numeric value specifying the size to use for the points in the residual vs fitted value/covariate plot. Character string or integer specifying what color to use for the points in the quantile-quantile plot. Integer or single character specifying a symbol to be used for plotting the points in the quantile-quantile plot. Numeric value specifying the size to use for the points in the quantile-quantile plot. qqline.color Character string or integer specifying what color to use for the points in the quantile-quantile plot. qqline.linetype Integer or character string (e.g., "dashed") specifying the type of line to use in the quantile-quantile plot. qqline.size smooth Numeric value specifying the thickness of the line in the quantile-quantile plot. Logical indicating whether or not too add a nonparametric smooth to certain plots. Default is TRUE. smooth.color Character string or integer specifying what color to use for the nonparametric smooth. smooth.linetype Integer or character string (e.g., "dashed") specifying the type of line to use for the nonparametric smooth. smooth.size fill Numeric value specifying the thickness of the line for the nonparametric smooth. Character string or integer specifying the color to use to fill the boxplots for residual-by-covariate plots when x is of class "factor". Default is NULL which colors the boxplots according to the factor levels.... Additional optional arguments to be passed onto resids. A "ggplot" object. See?resids for an example?resids

df1 5 df1 Simulated Quadratic Data Format Data simulated from a probit model with a quadratic trend. The data are described in Example 2 of Liu and Zhang (2017). data(df1) A data frame with 2000 rows and 2 variables. References y The response variable; an ordered factor. x The predictor variable. Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). head(df1) df2 Simulated Heteroscedastic Data Format Data simulated from a probit model with heteroscedasticity. The data are described in Example 4 of Liu and Zhang (2017). data(df2) A data frame with 2000 rows and 2 variables. y The response variable; an ordered factor. x The predictor variable.

6 df3 References Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). head(df2) df3 Simulated Gumbel Data Data simulated from a log-log model with a quadratic trend. The data are described in Example 3 of Liu and Zhang (2017). data(df3) Format A data frame with 2000 rows and 2 variables. y The response variable; an ordered factor. x The predictor variable. References Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). head(df3)

df4 7 df4 Simulated Proportionality Data Format Data simulated from two separate ordered probit models with different coefficients. The data are described in Example 5 of Liu and Zhang (2017). data(df4) A data frame with 2000 rows and 2 variables. References y The response variable; an ordered factor. x The predictor variable. Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). head(df4) df5 Simulated Interaction Data Data simulated from an ordered probit model with an interaction term. data(df5) Format A data frame with 2000 rows and 3 variables. y The response variable; an ordered factor. x1 A continuous predictor. x2 A factor with two levels: Control and Treatment.

8 gof head(df5) gof Goodness-of-Fit Simulation Simulate p-values from a goodness-of-fit test. gof(object, nsim = 10, test = c("ks", "ad", "cvm"),...) Default S3 method: gof(object, nsim = 10, test = c("ks", "ad", "cvm"),...) S3 method for class 'gof' plot(x,...) Arguments object nsim test An object of class clm, glm, lrm, orm, polr, or vglm. Integer specifying the number of bootstrap replicates to use. Character string specifying which goodness-of-fit test to use. Current options include: "ks" for the Kolmogorov-Smirnov test, "ad" for the Anderson-Darling test, and "cvm" for the Cramer-Von Mises test. Default is "ks".... Additional optional arguments. (Currently ignored.) x Details An object of class "gof". Under the null hypothesis, the distribution of the p-values should appear uniformly distributed on the interval [0, 1]. This can be visually investigated using the plot method. A 45 degree line is indicative of a "good" fit. Value A numeric vector of class "gof", "numeric" containing the simulated p-values. See?resids for an example?resids

resids 9 resids Surrogate Residuals Surrogate-based residuals for cumulative link and general regression models. resids(object,...) Default S3 method: resids(object, method = c("latent", "jitter"), jitter.scale = c("probability", "response"), nsim = 1L,...) Arguments object An object of class clm, glm, lrm, orm, polr, vgam (jittering only), or vglm.... Additional optional arguments. (Currently ignored.) method jitter.scale nsim Character string specifying the type of surrogate to use; for details, see Liu and Zhang (2017). Can be one of "latent" or "jitter". Character string specifying the scale on which to perform the jittering. Should be one of "probability" or "response". (Currently ignored for cumulative link models.) Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples. Value A numeric vector of class c("numeric", "resid") containing the residuals. Additionally, if nsim > 1, then the result will contain the attributes: boot.reps A matrix with nsim columns, one for each bootstrap replicate of the residuals. Note, these are random and do not correspond to the original ordering of the data; boot.id A matrix with nsim columns. Each column contains the observation number each residual corresponds to in boot.reps. (This is used for plotting purposes.) Note Surrogate residuals require sampling from a continuous distribution; consequently, the result will be different with every call to resids. The internal functions used for sampling from truncated distributions when method = "latent" are based on modified versions of rtrunc and qtrunc.

10 resids References Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10 Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02. Residuals for binary GLMs using the jittering method Load the MASS package (for the polr function) library(mass) Simulated probit data with quadratic trend data(df1) Fit logistic regression models (with and without quadratic trend) fit1 <- polr(y ~ x + I(x ^ 2), data = df1, method = "probit") fit2 <- polr(y ~ x, data = df1, method = "probit") Construct residuals set.seed(102) for reproducibility res1 <- resids(fit1) res2 <- resids(fit2) Residual-vs-covariate plots par(mfrow = c(1, 2)) scatter.smooth(df1$x, res1, lpars = list(lwd = 2, col = "red"), xlab = expression(x), ylab = "Surrogate residual", main = "Correct model") scatter.smooth(df1$x, res2, lpars = list(lwd = 2, col = "red"), xlab = expression(x), ylab = "Surrogate residual", main = "Incorrect model") Not run: Residuals for cumulative link models using the latent method Load required packages library(ggplot2) for autoplot function library(mass) for polr function library(ordinal) for clm function Detecting a misspecified mean structure Data simulated from a probit model with a quadratic trend

resids 11 data(df1)?df1 Fit a probit model with/without a quadratic trend fit.bad <- polr(y ~ x, data = df1, method = "probit") fit.good <- polr(y ~ x + I(x ^ 2), data = df1, method = "probit") Some residual plots p1 <- autoplot(fit.bad, what = "covariate", x = df1$x) p2 <- autoplot(fit.bad, what = "qq") p3 <- autoplot(fit.good, what = "covariate", x = df1$x) p4 <- autoplot(fit.good, what = "qq") Display all four plots together (top row corresponds to bad model) grid.arrange(p1, p2, p3, p4, ncol = 2) Detecting heteroscedasticity Data simulated from a probit model with heteroscedasticity. data(df2)?df2 Fit a probit model with/without a quadratic trend fit <- polr(y ~ x, data = df2, method = "probit") Some residual plots p1 <- autoplot(fit, what = "covariate", x = df1$x) p2 <- autoplot(fit, what = "qq") p3 <- autoplot(fit, what = "fitted") Display all three plots together grid.arrange(p1, p2, p3, ncol = 3) Detecting a misspecified link function Data simulated from a log-log model with a quadratic trend. data(df3)?df3 Fit models with correctly specified link function clm.loglog <- clm(y ~ x + I(x ^ 2), data = df3, link = "loglog") polr.loglog <- polr(y ~ x + I(x ^ 2), data = df3, method = "loglog") Fit models with misspecified link function clm.probit <- clm(y ~ x + I(x ^ 2), data = df3, link = "probit") polr.probit <- polr(y ~ x + I(x ^ 2), data = df3, method = "probit") Q-Q plots of the residuals (with bootstrapping) p1 <- autoplot(clm.probit, nsim = 50, what = "qq") +

12 sure ggtitle("clm: probit link") p2 <- autoplot(clm.loglog, nsim = 50, what = "qq") + ggtitle("clm: log-log link (correct link function)") p3 <- autoplot(polr.probit, nsim = 50, what = "qq") + ggtitle("polr: probit link") p4 <- autoplot(polr.loglog, nsim = 50, what = "qq") + ggtitle("polr: log-log link (correct link function)") grid.arrange(p1, p2, p3, p4, ncol = 2) We can also try various goodness-of-fit tests par(mfrow = c(1, 2)) plot(gof(clm.probit, nsim = 50)) plot(gof(clm.loglog, nsim = 50)) End(Not run) sure sure: An R package for constructing surrogate-based residuals and diagnostics for ordinal and general regression models. The sure package provides surrogate-based residuals for fitted ordinal and general (e.g., binary) regression models of class clm, glm, lrm, orm, polr, or vglm. Details The development version can be found on GitHub: https://github.com/afit-r/sure. As of right now, sure exports the following functions: resids - construct (surrogate-based) residuals; autoplot - plot diagnostics using ggplot2-based graphics; gof - simulate p-values from a goodness-of-fit test. References Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted).

surrogate 13 surrogate Surrogate Response Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017). surrogate(object, method = c("latent", "jitter"), jitter.scale = c("probability", "response"), nsim = 1L,...) Arguments object method jitter.scale nsim An object of class clm, lrm, orm, polr, or vglm. Character string specifying the type of surrogate to use; for details, see Liu and Zhang (2017). For cumulative link models, the latent variable method is used. For binary GLMs, the jittering approach is employed. (Currently ignored.) Character string specifying the scale on which to perform the jittering. Should be one of "probability" or "response". (Currently ignored for cumulative link models.) Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples.... Additional optional arguments. (Currently ignored.) Value A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes: boot.reps A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data; boot.id A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot.reps. (This is used for plotting purposes.) Note Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to surrogate. The internal functions used for sampling from truncated distributions are based on modified versions of rtrunc and qtrunc.

14 surrogate References Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10 Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.

Index Topic datasets df1, 5 df2, 5 df3, 6 df4, 7 df5, 7 sure-package (sure), 12 surrogate, 13 vgam, 9 vglm, 3, 8, 9, 12, 13 autoplot.clm (autoplot.resid), 2 autoplot.glm (autoplot.resid), 2 autoplot.lrm (autoplot.resid), 2 autoplot.orm (autoplot.resid), 2 autoplot.polr (autoplot.resid), 2 autoplot.resid, 2 autoplot.vgam (autoplot.resid), 2 autoplot.vglm (autoplot.resid), 2 clm, 3, 8, 9, 12, 13 df1, 5 df2, 5 df3, 6 df4, 7 df5, 7 ggplot2, 2, 12 glm, 3, 8, 9, 12 gof, 8 lrm, 3, 8, 9, 12, 13 orm, 3, 8, 9, 12, 13 plot.gof (gof), 8 polr, 3, 8, 9, 12, 13 qtrunc, 9, 13 resids, 4, 9 rtrunc, 9, 13 sure, 12 15