The gss Package. September 25, Description A comprehensive package for structural multivariate function estimation using smoothing splines.

Size: px
Start display at page:

Download "The gss Package. September 25, Description A comprehensive package for structural multivariate function estimation using smoothing splines."

Transcription

1 The gss Package September 25, 2004 Version Depends R (>= 1.7.0) Title General Smoothing Splines Author Chong Gu <chong@stat.purdue.edu> Maintainer Chong Gu <chong@stat.purdue.edu> A comprehensive package for structural multivariate function estimation using smoothing splines. License GPL R topics documented: LakeAcidity aids bacteriuria buffalo cdssden drkpk dssden family fitted.ssanova gastric gauss.quad gssanova gssanova hzdrate.sshzd mkfun.poly mkfun.tp mkran mkrk.factor mkterm.ssanova nlm nox ozone predict.ssanova print.ssanova

2 2 LakeAcidity project rkpk rkpk smolyak ssanova ssanova ssden sshzd stan summary.gssanova summary.gssanova summary.ssanova wesdr Index 47 LakeAcidity Water Acidity in Lakes Data extracted from the Eastern Lake Survey of 1984 conducted by the United States Environmental Protection Agency, concerning 112 lakes in the Blue Ridge. data(lakeacidity) Format A list containing 112 observations on the following variables. ph cal lat lon geog Surface ph. Calcium concentration. Latitude. Longitude. Geographic location, derived from lat and lon Details geog was generated from lat and lon using the code given in the Example section. Source Douglas, A. and Delampady, M. (1990), Eastern Lake Survey Phase I: Documentation for the Data Base and the Derived Data sets. Tech Report 160 (SIMS), Dept. Statistics, University of British Columbia. References Gu, C. and Wahba, G. (1993), Semiparametric analysis of variance with tensor product thin plate splines. Journal of the Royal Statistical Society Ser. B, 55,

3 bacteriuria 3 Examples ## Converting latitude and longitude to x-y coordinates ## Not run: convert <- function(lat, lon) { lat <- lat/180*pi lon <- lon/180*pi m.lat <- (max(lat)+min(lat))/2 m.lon <- (max(lon)+min(lon))/2 x <- cos(m.lat)*sin(m.lon-lon) y <- sin(lat-m.lat) cbind(x,y) } data(lakeacidity) convert(lakeacidity$lat,lakeacidity$lon) ## Clean up rm(lakeacidity,convert) ## End(Not run) aids AIDS Incubation A data set collected by Centers for Disease Control and Prevention concerning AIDS patients who were infected with the HIV virus through blood transfusion. data(aids) Format A data frame containing 295 observations on the following variables. incu Time from HIV infection to AIDS diagnosis. infe Time from HIV infection to end of data collection (July 1986). age Age at time of blood transfusion. Source Wang, M.-C. (1989), A semiparametric model for randomly truncated data. Journal of the American Statistical Association, 84, bacteriuria Treatment of Bacteriuria Bacteriuria patients were randomly assigned to two treatment groups. Weekly binary indicator of bacteriuria was recorded for every patient over 4 to 16 weeks. A total of 72 patients were represented in the data, with 36 each in the two treatment groups.

4 4 buffalo data(bacteriuria) Format A data frame containing 820 observations on the following variables. id trt time infect Identification of patients, a factor. Treatments 1 or 2, a factor. Weeks after randomization. Binary indicator of bacteriuria (bacteria in urine). Source Joe, H. (1997), Multivariate Models and Dependence Concepts. Hall. London: Chapman and References Gu and Ma (2003b), Generalized Nonparametric Mixed-Effect Models: Computation and Smoothing Parameter Selection. Available at html. buffalo Buffalo Annual Snowfall Annual snowfall accumulations in Buffalo, NY from 1910 to data(buffalo) Format A vector of 64 numerical values. Source Scott, D. W. (1985), Average shifted histograms: Effective nonparametric density estimators in several dimensions. The Annals of Statistics, 13,

5 cdssden 5 cdssden Evaluating Conditional PDF, CDF, and Quantiles of Smoothing Spline Density Estimates Evaluate conditional pdf, cdf, and quantiles for smoothing spline density estimates. cdssden(object, x, cond, int) cpssden(object, q, cond, int) cqssden(object, p, cond, int) object x cond int q p Object of class "ssden". Data frame or vector of points on which conditional density is to be evaluated. One row data frame of conditioning variables. Normalizing constant. Vector of points on which conditional cdf is to be evaluated. Vector of probabilities for which conditional quantiles are to be calculated. Details The argument x in cdssden is of the same form as the argument newdata in predict.lm, but can take a vector for 1-D conditional densities. cpssden and cqssden naturally only work for 1-D conditional densities. Value cdssden returns a list object with the following components. pdf int Vector of conditional pdf. Normalizing constant. cpssden and cpssden return a vector of conditional cdf or quantiles. Note cpssden and cqssden can be very slow. See Also ssden and dssden.

6 6 drkpk drkpk Numerical Engine for ssden and sshzd Calculate penalized likelihood density estimates and hazard estimates via the Newton iteration and evaluate the cross-validation score, as implemented in the RATFOR routine dnewton.r and hzdnewton.r, and minimize the cross-validation score using nlm and nlm0. sspdsty(s,r,q,cnt,qd.s,qd.r,qd.wt,prec,maxiter,alpha) mspdsty(s,r,q,cnt,qd.s,qd.r,qd.wt,prec,maxiter,alpha) msphzd(s,r,q,nobs,cnt,qd.s,qd.r,qd.wt,prec,maxiter,alpha) s r q Nobs cnt qd.s qd.r qd.wt prec maxiter alpha Unpenalized terms evaluated at data points. Basis of penalized terms evaluated at data points. Penalty matrix. Total number of lifetime observations. Bin-counts for histogram data. Unpenalized terms evaluated at quadrature nodes. Basis of penalized terms evaluated at quadrature nodes. Quadrature weights. Precision requirement for internal iterations. Maximum number of iterations allowed for internal iterations. Parameter defining cross-validation score for smoothing parameter selection. Details sspdsty is used by ssden to compute cross-validated density estimate with a single smoothing parameter. mspdsty is used by ssden to compute cross-validated density estimate with multiple smoothing parameters. msphzd is used by sshzd to compute cross-validated hazard estimate with single or multiple smoothing parameters. References Gu, C. and Wang, J. (2002), Penalized Likelihood Density Estimation: Direct Cross- Validation and Scalable Approximation. Available at stat.purdue.edu/~chong/manu. html. Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.

7 dssden 7 dssden Evaluating PDF, CDF, and Quantiles of Smoothing Spline Density Estimates Evaluate pdf, cdf, and quantiles for smoothing spline density estimates. dssden(object, x) pssden(object, q) qssden(object, p) object x q p Object of class "ssden". Data frame or vector of points on which density is to be evaluated. Vector of points on which cdf is to be evaluated. Vector of probabilities for which quantiles are to be calculated. Details The argument x in dssden is of the same form as the argument newdata in predict.lm, but can take a vector for 1-D densities. pssden and qssden naturally only work for 1-D densities. Value A vector of pdf, cdf, or quantiles. See Also ssden and cdssden. family Utility Functions for Error Families Utility functions for fitting Smoothing Spline ANOVA models with non-gaussian responses.

8 8 family mkdata.binomial(y, eta, wt, offset) dev.resid.binomial(y, eta, wt) dev.null.binomial(y, wt, offset) cv.binomial(y, eta, wt, hat, alpha) y0.binomial(y, eta0, wt) proj0.binomial(y0, eta, offset) kl.binomial(eta0, eta1, wt) cfit.binomial(y, wt, offset) mkdata.poisson(y, eta, wt, offset) dev.resid.poisson(y, eta, wt) dev.null.poisson(y, wt, offset) cv.poisson(y, eta, wt, hat, alpha, sr, q) y0.poisson(eta0) proj0.poisson(y0, eta, wt, offset) kl.poisson(eta0, eta1, wt) cfit.poisson(y, wt, offset) mkdata.gamma(y, eta, wt, offset) dev.resid.gamma(y, eta, wt) dev.null.gamma(y, wt, offset) cv.gamma(y, eta, wt, hat, rss, alpha) y0.gamma(eta0) proj0.gamma(y0, eta, wt, offset) kl.gamma(eta0, eta1, wt) cfit.gamma(y, wt, offset) mkdata.inverse.gaussian(y, eta, wt, offset) dev.resid.inverse.gaussian(y, eta, wt) dev.null.inverse.gaussian(y, wt, offset) mkdata.nbinomial(y, eta, wt, offset, nu) dev.resid.nbinomial(y, eta, wt) dev.null.nbinomial(y, wt, offset) cv.nbinomial(y, eta, wt, hat, alpha) y0.nbinomial(y,eta0,nu) proj0.nbinomial(y0, eta, wt, offset) kl.nbinomial(eta0, eta1, wt, nu) cfit.nbinomial(y, wt, offset, nu) mkdata.weibull(y, eta, wt, offset, nu) dev.resid.weibull(y, eta, wt, nu) dev.null.weibull(y, wt, offset, nu) cv.weibull(y, eta, wt, hat, nu, alpha) y0.weibull(y, eta0, nu) proj0.weibull(y0, eta, wt, offset, nu) kl.weibull(eta0, eta1, wt, nu, int) cfit.weibull(y, wt, offset, nu) mkdata.lognorm(y, eta, wt, offset, nu) dev.resid.lognorm(y, eta, wt, nu)

9 fitted.ssanova 9 dev0.resid.lognorm(y, eta, wt, nu) dev.null.lognorm(y, wt, offset, nu) cv.lognorm(y, eta, wt, hat, nu, alpha) y0.lognorm(y, eta0, nu) proj0.lognorm(y0, eta, wt, offset, nu) kl.lognorm(eta0, eta1, wt, nu, y0) cfit.lognorm(y, wt, offset, nu) mkdata.loglogis(y, eta, wt, offset, nu) dev.resid.loglogis(y, eta, wt, nu) dev0.resid.loglogis(y, eta, wt, nu) dev.null.loglogis(y, wt, offset, nu) cv.loglogis(y, eta, wt, hat, nu, alpha) y0.loglogis(y, eta0, nu) proj0.loglogis(y0, eta, wt, offset, nu) kl.loglogis(eta0, eta1, wt, nu, y0) cfit.loglogis(y, wt, offset, nu) y eta wt offset nu Model response. Fitted values on link scale. Model weights. Model offset. Size for nbinomial. Inverse scale for log life time. Details These are not to be called by the user. mkdata.x create the pseudo data to be used in iterated penalized least squares fitting. dev.resid.x calculate the deviance residuals. dev.null.x calculate the deviance of the constant null model. See Also gssanova. fitted.ssanova Fitted Values and Residuals from Smoothing Spline ANOVA Fits Methods for extracting fitted values and residuals from smoothing spline ANOVA fits. fitted.ssanova(object,...) residuals.ssanova(object,...) fitted.gssanova(object,...) residuals.gssanova(object, type="working",...)

10 10 gauss.quad object type Details... Ignored. Object of class "ssanova" or "gssanova". Type of residuals desired, with two alternatives "working" (default) or "deviance". The fitted values for "gssanova" objects are on the link scale, the same scale the "working" residuals are on. gastric Gastric Cancer Data Survival of gastric cancer patients under chemotherapy and chemotherapy-radiotherapy combination. data(gastric) Format A data frame containing 90 observations on the following variables. futime status trt Follow-up time, in days. Censoring status. Factor indicating the treatments: 1 chemothrapy, 2 combination. Source Moreau, T., O Quigley, J., and Mesbah, M. (1985), A global goodness-of-fit statistic for the proportional hazards model. Applied Statistics, 34, gauss.quad Generating Gauss-Legendre Quadrature Generate Gauss-Legendre quadratures using the FORTRAN routine gaussq.f found at NETLIB. gauss.quad(size,interval)

11 gssanova 11 size interval Size of quadrature. Interval to be covered. Value gauss.quad returns a list object with the following components. pt wt Quadrature nodes. Quadrature weights. gssanova Fitting Smoothing Spline ANOVA Models with Non Gaussian Responses Fit smoothing spline ANOVA models to responses from selected exponential families with cubic spline, linear spline, or thin-plate spline marginals for numerical variables. Factors are also accepted. The symbolic model specification via formula follows the same rules as in lm and glm. gssanova(formula, family, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=null, method=null, varht=1, nu=null, prec=1e-7, maxiter=30, ext=.05, order=2) formula family type data weights subset Symbolic description of the model to be fit. of the error distribution. Supported are exponential families "binomial", "poisson", "Gamma", "inverse.gaussian", and "nbinomial". Also supported are accelerated life model families "weibull", "lognorm", and "loglogis". Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals. Optional data frame containing the variables in the model. Optional vector of weights to be used in the fitting process. Optional vector specifying a subset of observations to be used in the fitting process. offset Optional offset term with known parameter 1. na.action partial Function which indicates what should happen when the data contain NAs. Optional extra unpenalized terms in partial spline models. method Score used to drive the performance-oriented iteration. Supported are method="v" for GCV, method="m" for GML, and method="u" for Mallow s CL.

12 12 gssanova varht Dispersion parameter needed for method="u". Ignored when method="v" or method="m" are specified. nu Inverse scale parameter in accelerated life model families. Ignored for exponential families. prec maxiter ext order Details Value Note Precision requirement for the iterations. Maximum number of iterations allowed for performance-oriented iteration, and for inner-loop multiple smoothing parameter selection when applicable. For cubic spline and linear spline marginals, this option specifies how far to extend the domain beyond the minimum and the maximum as a percentage of the range. The default ext=.05 specifies marginal domains of lengths 110 percent of their respective ranges. Prediction outside of the domain will result in an error. Ignored if type="tp" is specified. For thin-plate spline marginals, this option specifies the order of the marginal penalties. Ignored if type="cubic" or type="linear" are specified. The models are fitted by penalized likelihood method through the performance-oriented iteration, as described in the reference cited below. Only one link is implemented for each family. It is the logit link for "binomial", and the log link for "poisson", "Gamma", and "inverse.gaussian". For "nbinomial", the working parameter is the logit of the probability p; see NegBinomial. For "weibull", "lognorm", and "loglogis", it is the location parameter for the log life time. For family="binomial", "poisson", "nbinomial", "weibull", "lognorm", and "loglogis", the score driving the performance-oriented iteration defaults to method="u" with varht=1. For family="gamma" and "inverse.gaussian", the default is method="v". See ssanova for details and notes concerning smoothing spline ANOVA models. gssanova returns a list object of class "gssanova" which inherits from the class "ssanova". The method summary is used to obtain summaries of the fits. The method predict can be used to evaluate the fits at arbitrary points, along with the standard errors to be used in Bayesian confidence intervals, both on the scale of the link. The methods residuals and fitted.values extract the respective traits from the fits. For family="binomial", the response can be specified either as two columns of counts or as a column of sample proportions plus a column of total counts entered through the argument weights, as in glm. For family="nbinomial", the response may be specified as two columns with the second being the known sizes, or simply as a single column with the common unknown size to be estimated through the maximum likelihood method. For family="weibull", "lognorm", or "loglogis", the response consists of three columns, with the first giving the follow-up time, the second the censoring status, and the third the left-truncation time. For data with no truncation, the third column can be omitted.

13 gssanova1 13 Author(s) Chong Gu, References Gu, C. (1992), Cross-validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, See Also Methods predict.ssanova, summary.gssanova, and fitted.gssanova. Examples ## Fit a cubic smoothing spline logistic regression model test <- function(x) {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2} x <- (0:100)/100 p <- 1-1/(1+exp(test(x))) y <- rbinom(x,3,p) logit.fit <- gssanova(cbind(y,3-y)~x,family="binomial") ## The same fit logit.fit1 <- gssanova(y/3~x,"binomial",weights=rep(3,101)) ## Obtain estimates and standard errors on a grid est <- predict(logit.fit,data.frame(x=x),se=true) ## Plot the fit and the Bayesian confidence intervals plot(x,y/3,ylab="p") lines(x,p,col=1) lines(x,1-1/(1+exp(est$fit)),col=2) lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3) lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3) ## Clean up ## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est) dev.off() ## End(Not run) gssanova1 Fitting Smoothing Spline ANOVA Models with Non Gaussian Responses Fit smoothing spline ANOVA models to responses from selected exponential families with cubic spline, linear spline, or thin-plate spline marginals for numerical variables. Factors are also accepted. The symbolic model specification via formula follows the same rules as in lm and glm. gssanova1(formula, family, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=null, alpha=null, nu=null, id.basis=null, nbasis=null, seed=null, random=null, ext=.05,order=2)

14 14 gssanova1 formula family type data weights subset Symbolic description of the model to be fit. of the error distribution. Supported are exponential families "binomial", "poisson", "Gamma", and "nbinomial". Also supported are accelerated life model families "weibull", "lognorm", and "loglogis". Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals. Optional data frame containing the variables in the model. Optional vector of weights to be used in the fitting process. Optional vector specifying a subset of observations to be used in the fitting process. offset Optional offset term with known parameter 1. na.action partial alpha nu id.basis nbasis seed random ext order Details Function which indicates what should happen when the data contain NAs. Optional extra unpenalized terms in partial spline models. Tuning parameter defining cross-validation; larger values yield smoother fits. Defaults are alpha=1 for family="binomial" and alpha=1.4 for family="poisson" and family="gamma". Input for future support of accelerated life model families. Index designating selected knots. Number of knots to be selected. Ignored when id.basis is supplied. Seed for reproducible random selection of knots. Ignored when id.basis is supplied. Input for parametric random effects in nonparametric mixed-effect models. See mkran for details. For cubic spline and linear spline marginals, this option specifies how far to extend the domain beyond the minimum and the maximum as a percentage of the range. The default ext=.05 specifies marginal domains of lengths 110 percent of their respective ranges. Prediction outside of the domain will result in an error. Ignored if type="tp" is specified. For thin-plate spline marginals, this option specifies the order of the marginal penalties. Ignored if type="cubic" or type="linear" are specified. gssanova1 implements an alternative approach to the fitting of gssanova models. It works in some q-dimensional function space determined by a set of knots typically through random selection; the default dimension (nbasis) is set to be q = 10n ( 2/9), adequate for cubic splines. The algorithms are of the order O(nq 2 ), scaling much better than the O(n 3 ) of ssanova; the memory requirements are O(nq) versus O(n 2 ). gssanova1 adopts direct cross-validation instead of the indirect cross-validation of gssanova. Currently, only three families are supported, "binomial", "poisson", and "Gamma". Other families of gssanova shall be added in the future. Only one link is implemented for each family. It is the logit link for "binomial", and the log link for "poisson", and "Gamma". See ssanova for details and notes concerning smoothing spline ANOVA models.

15 gssanova1 15 Value Note gssanova1 returns a list object of class "gssanova1" which inherits from the classes "ssanova1", "gssanova", and "ssanova". The method summary is used to obtain summaries of the fits. The method predict can be used to evaluate the fits at arbitrary points, along with the standard errors to be used in Bayesian confidence intervals, both on the scale of the link. The methods residuals and fitted.values extract the respective traits from the fits. For family="binomial", the response can be specified either as two columns of counts or as a column of sample proportions plus a column of total counts entered through the argument weights, as in glm. Author(s) Chong Gu, chong@stat.purdue.edu References See Also gssanova and methods predict.ssanova1, summary.gssanova1, and fitted.gssanova. Examples ## Fit a cubic smoothing spline logistic regression model test <- function(x) {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2} x <- (0:100)/100 p <- 1-1/(1+exp(test(x))) y <- rbinom(x,3,p) logit.fit <- gssanova1(cbind(y,3-y)~x,family="binomial") ## The same fit logit.fit1 <- gssanova1(y/3~x,"binomial",weights=rep(3,101), id.basis=logit.fit$id.basis) ## Obtain estimates and standard errors on a grid est <- predict(logit.fit,data.frame(x=x),se=true) ## Plot the fit and the Bayesian confidence intervals plot(x,y/3,ylab="p") lines(x,p,col=1) lines(x,1-1/(1+exp(est$fit)),col=2) lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3) lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3) ## Fit a mixed-effect logistic model data(bacteriuria) bact.fit <- gssanova1(infect~trt+time,family="binomial",data=bacteriuria, id.basis=(1:820)[bacteriuria$id%in%c(3,38)],random=~1 id) ## Predict fixed effects predict(bact.fit,data.frame(time=2:16,trt=as.factor(rep(1,15))),se=true) ## Estimated random effects bact.fit$b

16 16 hzdrate.sshzd ## Clean up ## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est,bacteriuria,bact.fit) dev.off() ## End(Not run) hzdrate.sshzd Evaluating Smoothing Spline Hazard Estimates Evaluate smoothing spline hazard estimates by sshzd. hzdrate.sshzd(object, x, se=false) hzdcurve.sshzd(object, time, covariates=null, se=false) survexp.sshzd(object, time, covariates=null, start=0) object x se time covariates start Object of class "sshzd". Data frame or vector of points on which hazard is to be evaluated. Flag indicating if standard errors are required. Vector of time points. Vector of covariate values. Optional starting times of the intervals. Value For se=false, hzdrate.sshzd returns a vector of hazard evaluations. For se=true, hzdrate.sshzd returns a list consisting of the following components. fit se.fit fit se.fit Vector of hazard evaluations. Vector of standard errors for log hazard. Vector or columns of hazard curve(s). Vector or columns of standard errors for log hazard curve(s). survexp.sshzd returns a vector of expected survivals, which are based on the cumulative hazards over the specified intervals. See Also sshzd.

17 mkfun.poly 17 mkfun.poly Crafting Building Blocks for Polynomial Splines Craft numerical functions to be used by mkterm.linear, mkterm.cubic to assemble model terms. mkrk.linear(range) mkrk.cubic(range) mkphi.cubic(range) range Numerical vector whose minimum and maximum specify the range on which the function to be crafted is defined. Details These are not to be called by the user. Value A list of two components. fun env Function definition. Portable local constants derived from the argument. Note mkrk.x create a bivariate function fun(x,y,env,outer=false), where x, y are real arguments and local constants can be passed in through env. mkphi.cubic creates a univariate function fun(x,nu=1,env). See Also mkfun.tp, mkrk.factor.

18 18 mkfun.tp mkfun.tp Crafting Building Blocks for Thin-Plate Splines Craft numerical functions to be used by mkterm.tp to assemble model terms. mkrk.tp(dm,order,mesh,weight) mkphi.tp(dm,order,mesh,weight) mkrk.tp.p(dm,order) mkphi.tp.p(dm,order) dm Dimension of the variable d. order Order of the differential operator m. mesh weight Details Value Normalizing mesh. Normalizing weights. These are not to be called by the user. Thin-plate splines are defined for 2m > d. mkrk.tp.p generates the pseudo kernel, and mkphi.tp.p generates the (m+d 1)!/d!/(m 1)! lower order polynomials with total order less than m. mkphi.tp generates normalized lower order polynomials orthonormal w.r.t. a norm specified by mesh and weight as described in the reference, and mkrk.tp conditions the pseudo kernel to generate the reproducing kernel orthogonal to the lower order polynomials w.r.t. the norm. A list of two components. fun env Function definition. Portable local constants derived from the arguments. Note mkrk.tp creates a bivariate function fun(x,y,env,outer=false), where x, y are real arguments and local constants can be passed in through env. mkphi.tp creates a collection of univariate functions fun(x,nu,env), where x is the argument and nu is the index. References Gu, C. and Wahba, G. (1993), Semiparametric analysis of variance with tensor product thin plate splines. Journal of the Royal Statistical Society Ser. B, 55,

19 mkran 19 See Also mkfun.poly, mkrk.factor. mkran Generating Random Effects in Mixed-Effect Models Generate entries representing random effects in mixed-effect models. mkran(formula, data) formula data Symbolic description of the random effects. Data frame containing the variables in the model. Details Value This function generates random effects terms from simple grouping variables, for use in nonparametric mixed-effect models as described in Gu and Ma (2003a, b). The syntax of the formula resembles that of similar utilities for linear and nonlinear mixed-effect models, as described in Pinheiro and Bates (2000). Currently, mkran takes only two kinds of formulas, ~1 grp2 or ~grp1 grp2. Both grp1 and grp2 should be factors, and for the second formula, the levels of grp2 should be nested under those of grp1. The Z matrix is determined by grp2. When observations are ordered according to the levels of grp2, the Z matrix is block diagonal of 1 vectors. The Sigma matrix is diagonal. For ~1 grp2, it has one tuning parameter. For ~grp1 grp2, the number of parameters equals the number of levels of grp1, with each parameter shared by the grp2 levels nested under the same grp1 level. A list of three components. z sigma init Z matrix. Sigma matrix to be evaluated through sigma$fun(para,sigma$env). Initial parameter values. Note One may pass a formula or a list to the argument random in calls to ssanova1 orgssanova1 to fit nonparametric mixed-effect models. A formula will be converted to a list using mkran. A list should be of the same form as the value of mkran. Author(s) Chong Gu, chong@stat.purdue.edu

20 20 mkrk.factor References Gu and Ma (2003a), Optimal Smoothing in Nonparametric Mixed-Effect Models. Available at Gu and Ma (2003b), Generalized Nonparametric Mixed-Effect Models: Computation and Smoothing Parameter Selection. Available at html. Pinheiro and Bates (2000), Mixed-Effects Models in S and S-PLUS. New York: Springer- Verlag. Examples ## Toy data test <- data.frame(grp=as.factor(rep(1:2,c(2,3)))) ## First formula ran.test <- mkran(~1 grp,test) ran.test$z ran.test$sigma$fun(2,ran.test$sigma$env) # diag(10^(-2),2) ## Second formula ran.test <- mkran(~grp grp,test) ran.test$z ran.test$sigma$fun(c(1,2),ran.test$sigma$env) # diag(10^(-1),10^(-2)) ## Clean up ## Not run: rm(test,ran.test) mkrk.factor Crafting Building Blocks for Discrete Splines Craft numerical functions to be used by mkterm.x to assemble model terms involving factors. mkrk.nominal(levels) mkrk.ordinal(levels) levels Levels of the factor. Details These are not to be called by the user. For a nominal factor with levels 1, 2,..., k, the level means f(i) will be shrunk towards each other through a penalty proportional to where f(.) = (f(1) f(k))/k. (f(1) f(.)) (f(k) f(.)) 2 For a ordinal factor with levels 1 < 2 <... < k, the level means f(i) will be shrunk towards each other through a penalty proportional to (f(1) f(2)) (f(k 1) f(k)) 2

21 mkterm.ssanova 21 Value A list of two components. fun env Function definition. Portable local constants derived from the arguments. Note mkrk.x create a bivariate function fun(x,y,env,outer=false), where x, y are real arguments and local constants can be passed in through env. See Also mkterm.ssanova, mkfun.poly, mkfun.tp. mkterm.ssanova Assembling Model Terms for Smoothing Spline ANOVA Models Assemble numerical functions for calculating model terms in a Smoothing Spline ANOVA Model. mkterm.linear(mf, ext) mkterm.linear1(mf, range) mkterm.cubic(mf, ext) mkterm.cubic1(mf, range) mkterm.tp(mf, order, mesh, weight) mf ext range order mesh weight Model frame of the model formula. Size of the buffer zone beyond the data range. Data frame specifying domain range. Order of the differential operator. Normalizing mesh. Normalizing weights. Details These are not to be called by the user. For polynomial splines, ext specifies how far to go beyond the data range percentage wise. For example, if the minimum and maximum values of a variable in mf is 0 and 1 and ext=.05, then the marginal domain on which the model is defined would be [.95, 1.05]. See mkfun.tp for order, mesh, weight.

22 22 nox Value A list object with a component labels containing the labels of all model terms. For each of the model terms, there is a component holding the numerical functions for calculating the unpenalized and penalized parts within the term. Note The numerical functions are assembled using building blocks crafted by mkfun.poly, mkfun.tp, mkrk.factor. nlm0 Minimizing Univariate Functions on Finite Intervals Minimize univariate functions on finite intervals using 3-point quadratic fit, with goldensection safe-guard. nlm0(fun,range,prec=1e-7) fun range prec Function to be minimized. Interval on which the function to be minimized. Desired precision of the solution. Value nlm0 returns a list object with the following components. estimate minimum evaluations Minimizer. Minimum. Number of function evaluations. nox NOx in Engine Exhaust Data from an experiment in which a single-cylinder engine was run with ethanol to see how the NOx concentration in the exhaust depended on the compression ratio and the equivalence ratio. data(nox) Format A data frame containing 88 observations on the following variables.

23 ozone 23 nox comp equi NOx concentration in exhaust. Compression ratio. Equivalence ratio. Source Brinkman, N. D. (1981), Ethanol fuel a single-cylinder engine study of efficiency and exhaust emissions. SAE Transactions, 90, References Cleveland, W. S. and Devlin, S. J. (1988), Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, Breiman, L. (1991), The pi method for estimating multivariate functions from noisy data. Technometrics, 33, ozone Ozone Concentration in Los Angeles Basin Daily measurements of ozone concentration and eight meteorological quantities in the Los Angeles basin for 330 days of data(ozone) Format A data frame containing 330 observations on the following variables. upo3 Upland ozone concentration, in ppm. vdht Vandenberg 500 millibar height, in meters. wdsp Wind speed, in miles per hour. hmdt Humidity. sbtp Sandburg Air Base temperature, in Celsius. ibht Inversion base height, in foot. dgpg Dagget pressure gradient, in mmhg. ibtp Inversion base temperature, in Fahrenheit. vsty Visibility, in miles. day Calendar day, between 1 and 366. Source Unknown. References Breiman, L. and Friedman, J. H. (1985), Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80,

24 24 predict.ssanova Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models. Chapman and Hall. predict.ssanova Predicting from Smoothing Spline ANOVA Fits Evaluate terms in a smoothing spline ANOVA fit at arbitrary points. Standard errors of the terms can be requested for use in constructing Bayesian confidence intervals. predict.ssanova(object, newdata, se.fit=false, include=object$terms$labels,...) predict.ssanova1(object, newdata, se.fit=false, include=object$terms$labels,...) Value object newdata se.fit include... Ignored. Object of class inheriting from "ssanova". Data frame or model frame in which to predict. Flag indicating if standard errors are required. List of model terms to be included in the prediction. The partial and offset terms, if present, are to be specified by "partial" and "offset", respectively. For se.fit=false, predict.ssanova returns a vector of the evaluated fit. For se.fit=true, predict.ssanova returns a list consisting of the following components. fit se.fit Vector of evaluated fit. Vector of standard errors. Note To supply the partial terms for partial spline models, add a component partial=i(...) in newdata; the as is function I(...) is necessary when partial has more than one column. For mixed-effect models through ssanova1 or gssanova1, the Z matrix is set to 0 if not supplied. To supply the Z matrix, add a component random=i(...) in newdata. Author(s) Chong Gu, chong@stat.purdue.edu References Gu, C. and Wahba, G. (1993), Smoothing spline ANOVA with component-wise Bayesian confidence intervals. Journal of Computational and Graphical Statistics, 2,

25 print.ssanova 25 See Also Fitting functions ssanova, ssanova and methods summary.ssanova, summary.gssanova, fitted.ssanova. Examples ## THE FOLLOWING EXAMPLE IS TIME-CONSUMING ## Not run: ## Fit a model with thin-plate marginals, where geog is 2-D data(lakeacidity) fit <- ssanova(ph~log(cal)*geog,"tp",lakeacidity) ## Obtain estimates and standard errors on a grid new <- data.frame(cal=1,geog=i(matrix(0,1,2))) new <- model.frame(~log(cal)+geog,new) predict(fit,new,se=true) ## Evaluate the geog main effect predict(fit,new,se=true,inc="geog") ## Evaluate the sum of the geog main effect and the interaction predict(fit,new,se=true,inc=c("geog","log(cal):geog")) ## Evaluate the geog main effect on a grid grid <- seq(-.04,.04,len=21) new <- model.frame(~geog,list(geog=cbind(rep(grid,21),rep(grid,rep(21,21))))) est <- predict(fit,new,se=true,inc="geog") ## Plot the fit and standard error par(pty="s") contour(grid,grid,matrix(est$fit,21,21),col=1) contour(grid,grid,matrix(est$se,21,21),add=true,col=2) ## Clean up rm(lakeacidity,fit,new,grid,est) dev.off() ## End(Not run) print.ssanova Print Functions for Smoothing Spline ANOVA Models Print functions for Smoothing Spline ANOVA models. print.gssanova1(x,...) print.ssanova(x,...) print.ssanova1(x,...) print.ssden(x,...) print.sshzd(x,...) print.summary.ssanova(x, digits=6,...) print.summary.gssanova(x, digits=6,...) print.summary.gssanova1(x, digits=6,...)

26 26 project x Object of class ssanova, summary.ssanova, summary.gssanova, or ssden. digits Number of significant digits to be printed in values.... Ignored. See Also ssanova, gssanova, summary.ssanova, summary.gssanova, ssden, sshzd. project Projecting Smoothing Spline ANOVA Fits for Model Diagnostics Calculate Kullback-Leibler projection of smoothing spline ANOVA fits for Model Diagnostics. project(object,...) project.ssanova1(object, include,...) project.gssanova1(object, include,...) project.ssden(object, include, mesh=false,...) project.sshzd(object, include, mesh=false,...) object Object of class "ssanova1", "gssanova1", "ssden", or "sshzd".... Additional arguments. Ignored in project.x. include List of model terms to be included in the reduced model space. The partial and offset terms, if present, are to be specified by "partial" and "offset", respectively. mesh Flag indicating whether to return evaluations of the projection. Details The entropy KL(fit0,null) can be decomposed as the sum of KL(fit0,fit1) and KL(fit1,null), where fit0 is the fit to be projected, fit1 is the projection in the reduced model space, and null is the constant fit. The ratio KL(fit0,fit1)/KL(fit0,null) serves as a diagnostic of the feasibility of the reduced model. For regression fits, smoothness safe-guard is used to prevent interpolation, and KL(fit0,fit1)+KL(fit1,null) may not match KL(fit0,null) perfectly. For mixed-effect models from ssanova1 and gssanova1, the estimated random effects are treated as offset.

27 rkpk 27 Value The functions return a list consisting of the following components. ratio kl check mesh KL(fit0,fit1)/KL(fit0,null); the smaller the value, the more feasible the reduced model is. KL(fit0,fit1). KL(fit0,fit1)/KL(fit0,null)+KL(fit1,null)/KL(fit0,null); a value closer to 1 is preferred. The evaluations of the projection. Author(s) Chong Gu, chong@stat.purdue.edu References Gu, C. (2003), Model Diagnostics for Smoothing Spline ANOVA Models. Available at See Also Fitting functions ssanova1, gssanova1, ssden, and sshzd. rkpk Interface to RKPACK Call RKPACK routines for numerical calculations for fitting and predicting from Smoothing Spline ANOVA models. sspreg(s, q, y, method="v", varht=1) mspreg(s, q, y, method="v", varht=1, prec=1e-7, maxiter=30) sspregpoi(family, s, q, y, wt, offset, method="u", varht=1, nu, prec=1e-7, maxiter=30) mspregpoi(family, s, q, y, wt, offset, method="u", varht=1, nu, prec=1e-7, maxiter=30) getcrdr(obj, r) getsms(obj) s q y method varht prec Design matrix of unpenalized terms. Penalty matrices of penalized terms. Model response. Method for smoothing parameter selection. Assumed dispersion parameter, needed only for method="u". Precision requirement for iterations.

28 28 rkpk1 maxiter family wt offset obj nu r Maximum number of iterations allowed. Error family. Model weights. Model offset. Object returned from a call to sspreg, mspreg, sspregpoi, or mspregpoi. Optional argument for nbinomial, weibull, lognorm, and loglogis families. Inputs for standard error calculation. Details sspreg is used by ssanova to fit Gaussian models with a single smoothing parameter. mspreg is used to fit Gaussian models with multiple single smoothing parameters. sspregpoi is used by gssanova to fit non Gaussian models with a single smoothing parameter. mspregpoi is used to fit non Gaussian models with multiple single smoothing parameters. getcrdr and getsms are used by predict.ssanova to calculate standard errors of the fitted terms. References Gu, C. (1989), RKPACK and its applications: Fitting smoothing spline models. In ASA Proceedings of Statistical Computing Section, pp rkpk1 Numerical Engine for ssanova1 and gssanova1 Calculate penalized least squares regression estimates via the normal equation and evaluate the GCV, GML, or Mallows CL scores, as implemented in the RATFOR routine reg.r, and minimize the cross-validation score using nlm. sspreg1(s,r,q,y,method,alpha,varht,random) mspreg1(s,r,q,y,method,alpha,varht,random) sspngreg1(family,s,r,q,y,wt,offset,alpha,nu,random) mspngreg1(family,s,r,q,y,wt,offset,alpha,nu,random) ngreg1(dc,family,sr,q,y,wt,offset,nu,alpha) ngreg.proj(dc,family,sr,q,y0,wt,offset,nu)

29 rkpk1 29 family s r q y wt offset method alpha nu varht random dc sr y0 of the error distribution. Supported are exponential families "binomial", "poisson", "Gamma", and "nbinomial". Also supported are accelerated life model families "weibull", "lognorm", and "loglogis". Unpenalized terms evaluated at data points. Basis of penalized terms evaluated at data points. Penalty matrix. Response vector. Model weights. Model offset. "v" for GCV, "m" for GML, or "u" for Mallows CL. Parameter modifying GCV or Mallows CL scores for smoothing parameter selection. Optional argument for future support of nbinomial, weibull, lognorm, and loglogis families. External variance estimate needed for method="u". Input for parametric random effects in nonparametric mixed-effect models. Coefficients of fits. cbind(s,r). Components of the fit to be projected. Details sspreg1 is used by ssanova1 to compute regression estimates with a single smoothing parameter. mspreg1 is used by ssanova1 to compute regression estimates with multiple smoothing parameters. ssngpreg1 is used by gssanova1 to compute non-gaussian regression estimates with a single smoothing parameter. mspngreg1 is used by gssanova1 to compute non-gaussian regression estimates with multiple smoothing parameters. ngreg1 is used by ssngpreg1 and mspngreg1 to perform Newton iteration with fixed smoothing parameters and to calculate cross-validation scores on return. ngreg.proj is used by project.gssanova1 to calculate Kullback-Leibler projection for non-gaussian regression. References Kim, Y.-J. and Gu, C. (2002) Penalized Least Squares Regression: Fast Computation via Efficient Approximation. Available at

30 30 ssanova smolyak Generating Smolyak Cubature Generate Smolyak cubatures using C routines modified from smolyak.c found in Knut Petras SMOLPACK. smolyak.quad(d,k) smolyak.size(d,k) d k Dimension of unit cube. Depth of algorithm. Details smolyak.quad and smolyak.size concern the delayed Smolyak cubature. Value smolyak.quad returns a list object with the following components. pt wt Quadrature nodes in rows of matrix. Quadrature weights. smolyak.size returns an integer. ssanova Fitting Smoothing Spline ANOVA Models Fit smoothing spline ANOVA models with cubic spline, linear spline, or thin-plate spline marginals for numerical variables. Factors are also accepted. The symbolic model specification via formula follows the same rules as in lm. ssanova(formula, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=null, method="v", varht=1, prec=1e-7, maxiter=30, ext=.05, order=2)

31 ssanova 31 formula type data weights subset Symbolic description of the model to be fit. Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals. Optional data frame containing the variables in the model. Optional vector of weights to be used in the fitting process. Optional vector specifying a subset of observations to be used in the fitting process. offset Optional offset term with known parameter 1. na.action partial method varht prec maxiter ext order Details Function which indicates what should happen when the data contain NAs. Optional extra unpenalized terms in partial spline models. Method for smoothing parameter selection. Supported are method="v" for GCV, method="m" for GML (REML), and method="u" for Mallow s CL. External variance estimate needed for method="u". Ignored when method="v" or method="m" are specified. Precision requirement in the iteration for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. Maximum number of iterations allowed for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. For cubic spline and linear spline marginals, this option specifies how far to extend the domain beyond the minimum and the maximum as a percentage of the range. The default ext=.05 specifies marginal domains of lengths 110 percent of their respective ranges. Prediction outside of the domain will result in an error. Ignored if type="tp" is specified. For thin-plate spline marginals, this option specifies the order of the marginal penalties. Ignored if type="cubic" or type="linear" are specified. ssanova and the affiliated methods provide a front end to RKPACK, a collection of RAT- FOR routines for structural multivariate nonparametric regression via the penalized least squares method. The algorithms implemented in RKPACK are of the orders O(n 3 ) in execution time and O(n 2 ) in memory requirement. The constants in front of the orders vary with the complexity of the model to be fit. The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form y = c + f 1 (x1) + f 2 (x2) + f 12 (x1, x2) + e with the terms denoted by "1", "x1", "x2", and "x1:x2". Through the specifications of the side conditions, these terms are uniquely defined. In the current implementation, f 1 and f 12 integrate to 0 on the x1 domain for cubic spline and linear spline marginals, and add to 0 over the x1 (marginal) sampling points for thin-plate spline marginals. The penalized least squares problem is equivalent to a certain empirical Bayes model or a mixed effect model, and the model terms themselves are generally sums of finer terms of two types, the unpenalized terms (fixed effects) and the penalized terms (random effects).

32 32 ssanova Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters. The method predict can be used to evaluate the sum of selected or all model terms at arbitrary points within the domain, along with standard errors derived from a certain Bayesian calculation. The method summary has a flag to request diagnostics for the practical identifiability and significance of the model terms. Value ssanova returns a list object of class "ssanova". The method summary is used to obtain summaries of the fits. The method predict can be used to evaluate the fits at arbitrary points, along with the standard errors to be used in Bayesian confidence intervals. The methods residuals and fitted.values extract the respective traits from the fits. Factors Factors are accepted as predictors. When a factor has 3 or more levels, all terms involving it are treated as penalized terms, with the level means being shrunk towards each other. The shrinking is done differently for nominal and ordinal factors; see mkrk.factor for details. Note The independent variables appearing in formula can be multivariate themselves. In particular, ssanova(y~x,"tp",order=order) can be used to fit ordinary thin-plate splines in any dimension, of any order permissible, and with standard errors available for Bayesian confidence intervals. Note that thin-plate splines reduce to polynomial splines in one dimension. For univariate marginals, the additive models using type="cubic" and type="tp" yield identical fit through different internal makes. For example, ssanova(y~x1+x2,"cubic") and ssanova(y~x1+x2,"tp") yield the same fit. The same is not true for models with interactions, however. Mathematically, the domain (through ext for type="cubic") or the order (through order for type="tp") could be specified individually for each of the variables. Such flexibility is not provided in our implementation, however, as it would be more a source for confusion than a practical utility. Author(s) Chong Gu, chong@stat.purdue.edu References Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag. Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM. See Also Methods predict.ssanova, summary.ssanova, and fitted.ssanova.

33 ssanova1 33 Examples ## Fit a cubic spline x <- runif(100); y < *sin(2*pi*x) + rnorm(x) cubic.fit <- ssanova(y~x,method="m") ## The same fit with different internal makes tp.fit <- ssanova(y~x,"tp",method="m") ## Obtain estimates and standard errors on a grid new <- data.frame(x=seq(min(x),max(x),len=50)) est <- predict(cubic.fit,new,se=true) ## Plot the fit and the Bayesian confidence intervals plot(x,y,col=1); lines(new$x,est$fit,col=2) lines(new$x,est$fit+1.96*est$se,col=3) lines(new$x,est$fit-1.96*est$se,col=3) ## Clean up ## Not run: rm(x,y,cubic.fit,tp.fit,new,est) dev.off() ## End(Not run) ## Fit a tensor product cubic spline data(nox) nox.fit <- ssanova(log10(nox)~comp*equi,data=nox) ## Fit a spline with cubic and nominal marginals nox$comp<-as.factor(nox$comp) nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox) ## Fit a spline with cubic and ordinal marginals nox$comp<-as.ordered(nox$comp) nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox) ## Clean up ## Not run: rm(nox,nox.fit,nox.fit.n,nox.fit.o) ssanova1 Fitting Smoothing Spline ANOVA Models Fit smoothing spline ANOVA models with cubic spline, linear spline, or thin-plate spline marginals for numerical variables. Factors are also accepted. The symbolic model specification via formula follows the same rules as in lm. ssanova1(formula, type="cubic", data=list(), weights, subset, offset, na.action=na.omit, partial=null, method="v", alpha=1.4, varht=1, id.basis=null, nbasis=null, seed=null, random=null, ext=.05, order=2) formula type Symbolic description of the model to be fit. Type of numerical marginals to be used. Supported are type="cubic" for cubic spline marginals, type="linear" for linear spline marginals, and type="tp" for thin-plate spline marginals.

Package gss. R topics documented: April 19, 2018

Package gss. R topics documented: April 19, 2018 Version 2.1-8 Date 2018-04-18 Title General Smoothing Splines Author Chong Gu Maintainer Chong Gu Depends R (>= 2.14.0), stats Package gss April 19, 2018 A comprehensive

More information

Package gsscopu. R topics documented: July 2, Version Date

Package gsscopu. R topics documented: July 2, Version Date Package gsscopu July 2, 2015 Version 0.9-3 Date 2014-08-24 Title Copula Density and 2-D Hazard Estimation using Smoothing Splines Author Chong Gu Maintainer Chong Gu

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

Generalized Additive Model

Generalized Additive Model Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

100 Myung Hwan Na log-hazard function. The discussion section of Abrahamowicz, et al.(1992) contains a good review of many of the papers on the use of

100 Myung Hwan Na log-hazard function. The discussion section of Abrahamowicz, et al.(1992) contains a good review of many of the papers on the use of J. KSIAM Vol.3, No.2, 99-106, 1999 SPLINE HAZARD RATE ESTIMATION USING CENSORED DATA Myung Hwan Na Abstract In this paper, the spline hazard rate model to the randomly censored data is introduced. The

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Additive Models GAMs are one approach to non-parametric regression in the multiple predictor setting.

More information

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Weijie Cai, SAS Institute Inc., Cary NC July 1, 2008 ABSTRACT Generalized additive models are useful in finding predictor-response

More information

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

Splines and penalized regression

Splines and penalized regression Splines and penalized regression November 23 Introduction We are discussing ways to estimate the regression function f, where E(y x) = f(x) One approach is of course to assume that f has a certain shape,

More information

Package gamm4. July 25, Index 10

Package gamm4. July 25, Index 10 Version 0.2-5 Author Simon Wood, Fabian Scheipl Package gamm4 July 25, 2017 Maintainer Simon Wood Title Generalized Additive Mixed Models using 'mgcv' and 'lme4' Description

More information

Package gplm. August 29, 2016

Package gplm. August 29, 2016 Type Package Title Generalized Partial Linear Models (GPLM) Version 0.7-4 Date 2016-08-28 Author Package gplm August 29, 2016 Maintainer Provides functions for estimating a generalized

More information

Moving Beyond Linearity

Moving Beyond Linearity Moving Beyond Linearity Basic non-linear models one input feature: polynomial regression step functions splines smoothing splines local regression. more features: generalized additive models. Polynomial

More information

Minitab 18 Feature List

Minitab 18 Feature List Minitab 18 Feature List * New or Improved Assistant Measurement systems analysis * Capability analysis Graphical analysis Hypothesis tests Regression DOE Control charts * Graphics Scatterplots, matrix

More information

Data-Splitting Models for O3 Data

Data-Splitting Models for O3 Data Data-Splitting Models for O3 Data Q. Yu, S. N. MacEachern and M. Peruggia Abstract Daily measurements of ozone concentration and eight covariates were recorded in 1976 in the Los Angeles basin (Breiman

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

davidr Cornell University

davidr Cornell University 1 NONPARAMETRIC RANDOM EFFECTS MODELS AND LIKELIHOOD RATIO TESTS Oct 11, 2002 David Ruppert Cornell University www.orie.cornell.edu/ davidr (These transparencies and preprints available link to Recent

More information

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric)

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric) Splines Patrick Breheny November 20 Patrick Breheny STA 621: Nonparametric Statistics 1/46 Introduction Introduction Problems with polynomial bases We are discussing ways to estimate the regression function

More information

Package survivalmpl. December 11, 2017

Package survivalmpl. December 11, 2017 Package survivalmpl December 11, 2017 Title Penalised Maximum Likelihood for Survival Analysis Models Version 0.2 Date 2017-10-13 Author Dominique-Laurent Couturier, Jun Ma, Stephane Heritier, Maurizio

More information

Nonparametric regression using kernel and spline methods

Nonparametric regression using kernel and spline methods Nonparametric regression using kernel and spline methods Jean D. Opsomer F. Jay Breidt March 3, 016 1 The statistical model When applying nonparametric regression methods, the researcher is interested

More information

JMP Book Descriptions

JMP Book Descriptions JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked

More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression John Fox Department of Sociology McMaster University 1280 Main Street West Hamilton, Ontario Canada L8S 4M4 jfox@mcmaster.ca February 2004 Abstract Nonparametric regression analysis

More information

The brlr Package. March 22, brlr... 1 lizards Index 5. Bias-reduced Logistic Regression

The brlr Package. March 22, brlr... 1 lizards Index 5. Bias-reduced Logistic Regression The brlr Package March 22, 2006 Version 0.8-8 Date 2006-03-22 Title Bias-reduced logistic regression Author David Firth URL http://www.warwick.ac.uk/go/dfirth Maintainer David Firth

More information

Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case Dong Xiang, SAS Institute Inc. Grace Wahba, University of Wisconsin at Mad

Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case Dong Xiang, SAS Institute Inc. Grace Wahba, University of Wisconsin at Mad DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 982 September 30, 1997 Approximate Smoothing Spline Methods for Large Data Sets in the Binary

More information

Statistical Modeling with Spline Functions Methodology and Theory

Statistical Modeling with Spline Functions Methodology and Theory This is page 1 Printer: Opaque this Statistical Modeling with Spline Functions Methodology and Theory Mark H. Hansen University of California at Los Angeles Jianhua Z. Huang University of Pennsylvania

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Package ICsurv. February 19, 2015

Package ICsurv. February 19, 2015 Package ICsurv February 19, 2015 Type Package Title A package for semiparametric regression analysis of interval-censored data Version 1.0 Date 2014-6-9 Author Christopher S. McMahan and Lianming Wang

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

Package bgeva. May 19, 2017

Package bgeva. May 19, 2017 Version 0.3-1 Package bgeva May 19, 2017 Author Giampiero Marra, Raffaella Calabrese and Silvia Angela Osmetti Maintainer Giampiero Marra Title Binary Generalized Extreme Value

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Package bigsplines. May 25, 2018

Package bigsplines. May 25, 2018 Type Package Title Smoothing Splines for Large Samples Version 1.1-1 Date 2018-05-25 Author Nathaniel E. Helwig Package bigsplines May 25, 2018 Maintainer Nathaniel E. Helwig

More information

CS 450 Numerical Analysis. Chapter 7: Interpolation

CS 450 Numerical Analysis. Chapter 7: Interpolation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

Stat 8053, Fall 2013: Additive Models

Stat 8053, Fall 2013: Additive Models Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An

More information

Technical Support Minitab Version Student Free technical support for eligible products

Technical Support Minitab Version Student Free technical support for eligible products Technical Support Free technical support for eligible products All registered users (including students) All registered users (including students) Registered instructors Not eligible Worksheet Size Number

More information

Moving Beyond Linearity

Moving Beyond Linearity Moving Beyond Linearity The truth is never linear! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! But often

More information

Semiparametric Tools: Generalized Additive Models

Semiparametric Tools: Generalized Additive Models Semiparametric Tools: Generalized Additive Models Jamie Monogan Washington University November 8, 2010 Jamie Monogan (WUStL) Generalized Additive Models November 8, 2010 1 / 33 Regression Splines Choose

More information

Non-Parametric and Semi-Parametric Methods for Longitudinal Data

Non-Parametric and Semi-Parametric Methods for Longitudinal Data PART III Non-Parametric and Semi-Parametric Methods for Longitudinal Data CHAPTER 8 Non-parametric and semi-parametric regression methods: Introduction and overview Xihong Lin and Raymond J. Carroll Contents

More information

Package simsurv. May 18, 2018

Package simsurv. May 18, 2018 Type Package Title Simulate Survival Data Version 0.2.2 Date 2018-05-18 Package simsurv May 18, 2018 Maintainer Sam Brilleman Description Simulate survival times from standard

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica Boston-Keio Workshop 2016. Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica... Mihoko Minami Keio University, Japan August 15, 2016 Joint work with

More information

Unified Methods for Censored Longitudinal Data and Causality

Unified Methods for Censored Longitudinal Data and Causality Mark J. van der Laan James M. Robins Unified Methods for Censored Longitudinal Data and Causality Springer Preface v Notation 1 1 Introduction 8 1.1 Motivation, Bibliographic History, and an Overview of

More information

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2) SOC6078 Advanced Statistics: 9. Generalized Additive Models Robert Andersen Department of Sociology University of Toronto Goals of the Lecture Introduce Additive Models Explain how they extend from simple

More information

Package slp. August 29, 2016

Package slp. August 29, 2016 Version 1.0-5 Package slp August 29, 2016 Author Wesley Burr, with contributions from Karim Rahim Copyright file COPYRIGHTS Maintainer Wesley Burr Title Discrete Prolate Spheroidal

More information

Stochastic Gradient Boosting with Squared Error Epsilon Insensitive Loss

Stochastic Gradient Boosting with Squared Error Epsilon Insensitive Loss Stochastic Gradient Boosting with Squared Error Epsilon Insensitive Loss Ken Lau Statistics, UBC ken.lau@stat.ubc.ca Abstract We often deal with outliers in air quality data. By fitting our to all the

More information

MINITAB Release Comparison Chart Release 14, Release 13, and Student Versions

MINITAB Release Comparison Chart Release 14, Release 13, and Student Versions Technical Support Free technical support Worksheet Size All registered users, including students Registered instructors Number of worksheets Limited only by system resources 5 5 Number of cells per worksheet

More information

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition book 2014/5/6 15:21 page v #3 Contents List of figures List of tables Preface to the second edition Preface to the first edition xvii xix xxi xxiii 1 Data input and output 1 1.1 Input........................................

More information

Package GWRM. R topics documented: July 31, Type Package

Package GWRM. R topics documented: July 31, Type Package Type Package Package GWRM July 31, 2017 Title Generalized Waring Regression Model for Count Data Version 2.1.0.3 Date 2017-07-18 Maintainer Antonio Jose Saez-Castillo Depends R (>= 3.0.0)

More information

Package dglm. August 24, 2016

Package dglm. August 24, 2016 Version 1.8.3 Date 2015-10-27 Title Double Generalized Linear Models Package dglm August 24, 2016 Author Peter K Dunn and Gordon K Smyth Maintainer Robert Corty

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software June 2014, Volume 58, Issue 5. http://www.jstatsoft.org/ Smoothing Spline ANOVA Models: R Package gss Chong Gu Purdue University Abstract This document provides a brief

More information

Minitab detailed

Minitab detailed Minitab 18.1 - detailed ------------------------------------- ADDITIVE contact sales: 06172-5905-30 or minitab@additive-net.de ADDITIVE contact Technik/ Support/ Installation: 06172-5905-20 or support@additive-net.de

More information

Package FWDselect. December 19, 2015

Package FWDselect. December 19, 2015 Title Selecting Variables in Regression Models Version 2.1.0 Date 2015-12-18 Author Marta Sestelo [aut, cre], Nora M. Villanueva [aut], Javier Roca-Pardinas [aut] Maintainer Marta Sestelo

More information

The glmmml Package. August 20, 2006

The glmmml Package. August 20, 2006 The glmmml Package August 20, 2006 Version 0.65-1 Date 2006/08/20 Title Generalized linear models with clustering A Maximum Likelihood and bootstrap approach to mixed models. License GPL version 2 or newer.

More information

Learn What s New. Statistical Software

Learn What s New. Statistical Software Statistical Software Learn What s New Upgrade now to access new and improved statistical features and other enhancements that make it even easier to analyze your data. The Assistant Data Customization

More information

Package glmmml. R topics documented: March 25, Encoding UTF-8 Version Date Title Generalized Linear Models with Clustering

Package glmmml. R topics documented: March 25, Encoding UTF-8 Version Date Title Generalized Linear Models with Clustering Encoding UTF-8 Version 1.0.3 Date 2018-03-25 Title Generalized Linear Models with Clustering Package glmmml March 25, 2018 Binomial and Poisson regression for clustered data, fixed and random effects with

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

CH9.Generalized Additive Model

CH9.Generalized Additive Model CH9.Generalized Additive Model Regression Model For a response variable and predictor variables can be modeled using a mean function as follows: would be a parametric / nonparametric regression or a smoothing

More information

QstatLab: software for statistical process control and robust engineering

QstatLab: software for statistical process control and robust engineering QstatLab: software for statistical process control and robust engineering I.N.Vuchkov Iniversity of Chemical Technology and Metallurgy 1756 Sofia, Bulgaria qstat@dir.bg Abstract A software for quality

More information

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2010 1 / 26 Additive predictors

More information

Package GAMBoost. February 19, 2015

Package GAMBoost. February 19, 2015 Version 1.2-3 Package GAMBoost February 19, 2015 Title Generalized linear and additive models by likelihood based boosting Author Harald Binder Maintainer Harald Binder

More information

Package acebayes. R topics documented: November 21, Type Package

Package acebayes. R topics documented: November 21, Type Package Type Package Package acebayes November 21, 2018 Title Optimal Bayesian Experimental Design using the ACE Algorithm Version 1.5.2 Date 2018-11-21 Author Antony M. Overstall, David C. Woods & Maria Adamou

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications

More information

Package intccr. September 12, 2017

Package intccr. September 12, 2017 Type Package Package intccr September 12, 2017 Title Semiparametric Competing Risks Regression under Interval Censoring Version 0.2.0 Author Giorgos Bakoyannis , Jun Park

More information

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov.

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. EE613 Machine Learning for Engineers LINEAR REGRESSION Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. 4, 2015 1 Outline Multivariate ordinary least squares Singular value

More information

The gbev Package. August 19, 2006

The gbev Package. August 19, 2006 The gbev Package August 19, 2006 Version 0.1 Date 2006-08-10 Title Gradient Boosted Regression Trees with Errors-in-Variables Author Joe Sexton Maintainer Joe Sexton

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Package citools. October 20, 2018

Package citools. October 20, 2018 Type Package Package citools October 20, 2018 Title Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models Version 0.5.0 Maintainer John Haman Functions

More information

Abstract R-splines are introduced as splines t with a polynomial null space plus the sum of radial basis functions. Thin plate splines are a special c

Abstract R-splines are introduced as splines t with a polynomial null space plus the sum of radial basis functions. Thin plate splines are a special c R-Splines for Response Surface Modeling July 12, 2000 Sarah W. Hardy North Carolina State University Douglas W. Nychka National Center for Atmospheric Research Raleigh, NC 27695-8203 Boulder, CO 80307

More information

TECHNICAL REPORT NO December 11, 2001

TECHNICAL REPORT NO December 11, 2001 DEPARTMENT OF STATISTICS University of Wisconsin 2 West Dayton St. Madison, WI 5376 TECHNICAL REPORT NO. 48 December, 2 Penalized Log Likelihood Density Estimation, via Smoothing-Spline ANOVA and rangacv

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

Package sprinter. February 20, 2015

Package sprinter. February 20, 2015 Type Package Package sprinter February 20, 2015 Title Framework for Screening Prognostic Interactions Version 1.1.0 Date 2014-04-11 Author Isabell Hoffmann Maintainer Isabell Hoffmann

More information

Last time... Bias-Variance decomposition. This week

Last time... Bias-Variance decomposition. This week Machine learning, pattern recognition and statistical data modelling Lecture 4. Going nonlinear: basis expansions and splines Last time... Coryn Bailer-Jones linear regression methods for high dimensional

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

Multivariable Regression Modelling

Multivariable Regression Modelling Multivariable Regression Modelling A review of available spline packages in R. Aris Perperoglou for TG2 ISCB 2015 Aris Perperoglou for TG2 Multivariable Regression Modelling ISCB 2015 1 / 41 TG2 Members

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Package addhaz. September 26, 2018

Package addhaz. September 26, 2018 Package addhaz September 26, 2018 Title Binomial and Multinomial Additive Hazard Models Version 0.5 Description Functions to fit the binomial and multinomial additive hazard models and to estimate the

More information

SAS High-Performance Analytics Products

SAS High-Performance Analytics Products Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products

More information

Package DTRreg. August 30, 2017

Package DTRreg. August 30, 2017 Package DTRreg August 30, 2017 Type Package Title DTR Estimation and Inference via G-Estimation, Dynamic WOLS, and Q-Learning Version 1.3 Date 2017-08-30 Author Michael Wallace, Erica E M Moodie, David

More information

Multiresponse Sparse Regression with Application to Multidimensional Scaling

Multiresponse Sparse Regression with Application to Multidimensional Scaling Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,

More information

GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs, GAMMs and other penalied GLMs using mgcv in R Simon Wood Mathematical Sciences, University of Bath, U.K. Simple eample Consider a very simple dataset relating the timber volume of cherry trees to

More information

Package bayesdp. July 10, 2018

Package bayesdp. July 10, 2018 Type Package Package bayesdp July 10, 2018 Title Tools for the Bayesian Discount Prior Function Version 1.3.2 Date 2018-07-10 Depends R (>= 3.2.3), ggplot2, survival, methods Functions for data augmentation

More information

Nonparametric Estimation of Distribution Function using Bezier Curve

Nonparametric Estimation of Distribution Function using Bezier Curve Communications for Statistical Applications and Methods 2014, Vol. 21, No. 1, 105 114 DOI: http://dx.doi.org/10.5351/csam.2014.21.1.105 ISSN 2287-7843 Nonparametric Estimation of Distribution Function

More information

Independent Components Analysis through Product Density Estimation

Independent Components Analysis through Product Density Estimation Independent Components Analysis through Product Density Estimation 'frevor Hastie and Rob Tibshirani Department of Statistics Stanford University Stanford, CA, 94305 { hastie, tibs } @stat.stanford. edu

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K.

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K. A toolbo of smooths Simon Wood Mathematical Sciences, University of Bath, U.K. Smooths for semi-parametric GLMs To build adequate semi-parametric GLMs requires that we use functions with appropriate properties.

More information

Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique

Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique Al Kadiri M. A. Abstract Benefits of using Ranked Set Sampling (RSS) rather than Simple Random Sampling (SRS) are indeed significant

More information

The grplasso Package

The grplasso Package The grplasso Package June 27, 2007 Type Package Title Fitting user specified models with Group Lasso penalty Version 0.2-1 Date 2007-06-27 Author Lukas Meier Maintainer Lukas Meier

More information

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this

More information

The GLMMGibbs Package

The GLMMGibbs Package The GLMMGibbs Package April 22, 2002 Version 0.5-1 Author Jonathan Myles and David Clayton Maintainer Jonathan Myles Depends R (>= 1.0) Date 2001/22/01 Title

More information

Using the DATAMINE Program

Using the DATAMINE Program 6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection

More information

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni Nonparametric Risk Attribution for Factor Models of Portfolios October 3, 2017 Kellie Ottoboni Outline The problem Page 3 Additive model of returns Page 7 Euler s formula for risk decomposition Page 11

More information

The picasso Package for High Dimensional Regularized Sparse Learning in R

The picasso Package for High Dimensional Regularized Sparse Learning in R The picasso Package for High Dimensional Regularized Sparse Learning in R X. Li, J. Ge, T. Zhang, M. Wang, H. Liu, and T. Zhao Abstract We introduce an R package named picasso, which implements a unified

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov.

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. EE613 Machine Learning for Engineers LINEAR REGRESSION Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov. 9, 2017 1 Outline Multivariate ordinary least squares Matlab code:

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Package SmoothHazard

Package SmoothHazard Package SmoothHazard September 19, 2014 Title Fitting illness-death model for interval-censored data Version 1.2.3 Author Celia Touraine, Pierre Joly, Thomas A. Gerds SmoothHazard is a package for fitting

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information