PART III

Non-Parametric and Semi-Parametric Methods for Longitudinal Data


CHAPTER 8

Non-parametric and semi-parametric regression methods: Introduction and overview

Xihong Lin and Raymond J. Carroll

Contents
8.1 Introduction and overview
8.2 Brief review of non-parametric and semi-parametric regression methods for independent data
    8.2.1 Local polynomial kernels
    8.2.2 Smoothing splines
    8.2.3 Regression splines and penalized splines (P-splines)
8.3 Overview of non-parametric and semi-parametric regression for longitudinal data
References

8.1 Introduction and overview

Parametric regression methods for longitudinal data have been well developed over the last 20 years. Such methods can be classified broadly as estimating-equation-based methods, such as generalized estimating equations (Liang and Zeger, 1986) and their extensions (Chapter 3), and mixed-effects models (Laird and Ware, 1982; Breslow and Clayton, 1993; see also Chapter 4). Diggle et al. (2002) provide an excellent overview of these parametric regression methods. For recent developments, see Chapter 3 through Chapter 6.

A major limitation of these methods is that the relationship of the mean of a longitudinal response to the covariates is assumed to be fully parametric. Although such parametric mean models enjoy simplicity, they suffer from inflexibility in modeling complicated relationships between the response and covariates in various longitudinal studies. Examples include hormone profiles over the menstrual cycle in reproductive health (Brumback and Rice, 1998; Zhang et al., 1998); longitudinal CD4 trajectories in AIDS research (Zeger and Diggle, 1994; Lin and Ying, 2001); age effects on childhood respiratory disease (Diggle et al., 2002; Lin and Zhang, 1999); time trajectories in speech research and growth curves (Brumback and Lindstrom, 2004; Gasser et al., 1984); time-varying treatment/exposure effects (Hogan, Lin, and Herman, 2004; Huang, Wu, and Zhou, 2002); and time-course analysis of microarray gene expression (Luan and Li, 2003; Storey et al., 2005). These practical applications have created a strong demand over the last 10 years for non-parametric and semi-parametric regression methods for longitudinal data, in which flexible functional forms can be estimated from the data to capture possibly complicated relationships between longitudinal outcomes and covariates.

Non-parametric and semi-parametric regression methods for independent data have been well developed in the last two decades. Non-parametric regression methods can be broadly classified into kernel methods (Wand and Jones, 1995), which are often based on local likelihoods (Fan and Gijbels, 1996), and splines, which include smoothing splines (Green and Silverman, 1994; Wahba, 1990), penalized splines (Eilers and Marx, 1996; Ruppert, Wand, and Carroll, 2003), and regression splines (Stone et al., 1997). Both smoothing splines and penalized splines are based on penalized likelihoods. Silverman (1984) demonstrated a close connection between kernel smoothing and spline smoothing, showing that kernel and smoothing spline estimators are asymptotically equivalent for independent data and that smoothing splines correspond to higher-order kernels.

Semi-parametric regression methods for independent data have been equally well developed (Härdle, Liang, and Gao, 1999; Green and Silverman, 1994, Chapter 4). Such models are sometimes referred to as (generalized) partial linear models, in which the mean, or the mean transformed by a parametric link function, of an outcome variable is modeled using parametric functions of a subset of the covariates and non-parametric functions of the other covariates. Profile-kernel and profile-spline methods have been proposed for estimation in such partial linear models (Heckman, 1984; Speckman, 1988; Carroll et al., 1997).

Non-parametric and semi-parametric regression methods for longitudinal data using kernel and spline methods have enjoyed substantial development in the last 10 years. Chapter 9 through Chapter 12 provide reviews of these methods. To help the reader understand these developments for longitudinal data, in the next section we provide an overview of non-parametric and semi-parametric regression methods using kernels and splines for independent data.

8.2 Brief review of non-parametric and semi-parametric regression methods for independent data

8.2.1 Local polynomial kernels

Traditional kernel regression estimates a non-parametric regression function at a target point using local weighted averages; the Nadaraya–Watson estimator is an example. The most popular kernel regression method is local polynomial regression (Wand and Jones, 1995; Fan and Gijbels, 1996). Consider the simplest non-parametric regression model,

    Y_i = θ(Z_i) + ε_i,    (8.1)

where Y_i is a scalar continuous outcome, Z_i is a scalar covariate, θ(z) is an unknown smooth function, and the ε_i ~ N(0, σ²) are independent and identically distributed (i = 1, ..., N). The idea of the local dth-order polynomial regression estimator of θ(z) is to approximate θ(Z_i) locally around an arbitrary point z by a dth-order polynomial,

    θ(Z_i) ≈ α_0 + α_1(Z_i − z) + ... + α_d(Z_i − z)^d = Z_i(z)'α,

where Z_i(z) = {1, (Z_i − z), ..., (Z_i − z)^d}' and α = (α_0, ..., α_d)', and to estimate α by maximizing the local log-likelihood, which, apart from a constant, is

    −(1/2σ²) Σ_{i=1}^N K_h(Z_i − z) {Y_i − Z_i(z)'α}²,

where K_h(s) = h^{-1}K(s/h), h is a bandwidth, and K(·) is a kernel function, often chosen to be a symmetric density function with mean 0. Commonly used kernel functions include the Gaussian, uniform, and Epanechnikov kernels, the latter being K(s) = (3/4)(1 − s²)_+, where a_+ = a if a > 0 and 0 otherwise. The resulting kernel estimating equation is

    Σ_{i=1}^N Z_i(z) K_h(Z_i − z) {Y_i − Z_i(z)'α} = 0.    (8.2)

The dth-order kernel estimator at the target point z is θ̂(z) = α̂_0. If d = 0, we have the traditional local average kernel estimator, which corresponds to the Nadaraya–Watson estimator,

    θ̂(z) = Σ_{i=1}^N K_h(Z_i − z) Y_i / Σ_{i=1}^N K_h(Z_i − z).    (8.3)

The local linear kernel estimator (d = 1) has been commonly used because of its better bias properties.
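As a concrete illustration (not part of the original chapter), the short Python sketch below solves the kernel-weighted least-squares problem underlying (8.2) at a target point z; setting d = 0 reproduces the Nadaraya–Watson estimator (8.3), while d = 1 gives the local linear estimator. The function names, the Epanechnikov kernel, the fixed bandwidth, and the simulated data are illustrative assumptions rather than choices made in the chapter.

```python
import numpy as np

def epanechnikov(s):
    """Epanechnikov kernel K(s) = (3/4)(1 - s^2)_+."""
    return 0.75 * np.clip(1.0 - s**2, 0.0, None)

def local_poly_fit(z0, Z, Y, h, d=1, kernel=epanechnikov):
    """Local dth-order polynomial estimate of theta at the target point z0.

    Solves the weighted least-squares problem behind the kernel
    estimating equation (8.2) and returns alpha_0 = theta_hat(z0).
    d = 0 is the Nadaraya-Watson estimator (8.3); d = 1 is local linear.
    """
    w = kernel((Z - z0) / h) / h                      # K_h(Z_i - z0)
    X = np.vander(Z - z0, N=d + 1, increasing=True)   # rows {1, (Z_i - z0), ..., (Z_i - z0)^d}
    XtW = X.T * w                                     # X' diag(w)
    alpha = np.linalg.lstsq(XtW @ X, XtW @ Y, rcond=None)[0]
    return alpha[0]

# Illustrative simulated data (an assumption, for demonstration only).
rng = np.random.default_rng(0)
Z = np.sort(rng.uniform(0.0, 1.0, 200))
Y = np.sin(2.0 * np.pi * Z) + rng.normal(scale=0.3, size=Z.size)

grid = np.linspace(0.05, 0.95, 10)
theta_nw = [local_poly_fit(z, Z, Y, h=0.1, d=0) for z in grid]   # Nadaraya-Watson (8.3)
theta_ll = [local_poly_fit(z, Z, Y, h=0.1, d=1) for z in grid]   # local linear
```

In practice the fixed bandwidth used in this sketch would be chosen data-adaptively, as discussed next.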

Bandwidth selection is important in kernel smoothing. The bandwidth h can be selected by cross-validation; other approaches include plug-in estimators (Wand and Jones, 1995; Fan and Gijbels, 1996) and empirical-bias bandwidth selection (Ruppert, 1997), among others.

A key feature of kernel smoothing for independent data is that it is local, in the sense that θ̂(z) places more weight on observations whose Z_i values lie in the neighborhood of z and downweights observations far from z. This can be seen from (8.3). Specifically, as the bandwidth h → 0 and the sample size N → ∞, only the observations in a shrinking neighborhood of z contribute to the estimation of θ(z). As we will see, this locality property no longer holds for superior kernel and spline smoothing methods for longitudinal data (Chapter 9; see also Lin et al., 2004; Welsh, Lin, and Carroll, 2002).

Fan (1993) showed that the local polynomial kernel estimator enjoys minimax efficiency among the class of all linear smoothers. The Epanechnikov kernel is optimal in the sense that it minimizes the mean squared error of the local polynomial kernel estimator. The local polynomial kernel estimator (8.2) can be extended easily to non-parametric regression for non-normal outcomes within the generalized linear model framework (Fan and Gijbels, 1996, Chapter 5).

8.2.2 Smoothing splines

A smoothing spline estimates the non-parametric regression function θ(z) using a piecewise polynomial function with all the observed covariate values {Z_i} used as knots, where smoothness constraints are imposed at the knots (Wahba, 1990; Green and Silverman, 1994). The most commonly used smoothing spline is the natural cubic smoothing spline, which assumes that θ(z) is a piecewise cubic function, is linear outside of min(Z_i) and max(Z_i), and is continuous and twice continuously differentiable, with a third derivative that is a step function with jumps at the knots {Z_i}. The natural cubic smoothing spline estimator can be obtained by maximizing a penalized log-likelihood as follows. Under the simple non-parametric model (8.1), the penalized log-likelihood can be written, apart from a constant, as

    −(1/2σ²) Σ_{i=1}^N {Y_i − θ(Z_i)}² − λ ∫ {θ^(2)(z)}² dz = −(1/2σ²) Σ_{i=1}^N {Y_i − θ(Z_i)}² − λ θ'Ψθ,

where λ is a smoothing parameter, Ψ is the cubic smoothing spline penalty matrix (Green and Silverman, 1994, Equation 2.3), θ = {θ(Z_1), ..., θ(Z_N)}', and θ^(2)(z) denotes the second derivative of θ(z). The smoothing parameter λ controls the trade-off between goodness of fit and smoothness of the curve: the smoothing spline estimator interpolates the data if λ = 0 and forces θ(z) to be linear as λ → ∞. The resulting cubic smoothing spline estimator takes the form of a ridge regression estimator,

    θ̂ = (I + λΨ)^{-1} Y,    (8.4)

where Y = (Y_1, ..., Y_N)'. Efficient algorithms, such as the Reinsch algorithm (Green and Silverman, 1994), can be used to calculate θ̂ in O(N) arithmetic operations. The smoothing parameter λ can be estimated using cross-validation, generalized cross-validation (Wahba, 1990), or generalized maximum likelihood (GML) (Wahba, 1985).

There is a close connection between a smoothing spline estimator and a linear mixed model. Specifically, the GML estimator of λ corresponds to the restricted maximum likelihood estimator in the corresponding mixed model. This connection, as well as the Bayesian formulation of the smoothing spline, is discussed in more detail in Chapters 9, 11, and 12. Silverman (1984) showed that the smoothing spline estimator is asymptotically equivalent to a local average kernel estimator. Using Silverman's (1984) results, Nychka (1995) established the asymptotic properties of the smoothing spline estimator (8.4) by deriving its asymptotic bias and variance. Smoothing spline estimation has been extended to generalized linear models (Green and Silverman, 1994) and generalized additive models (Hastie and Tibshirani, 1990). Bayesian spline estimation can be found in Hastie and Tibshirani (2000). Further discussion of the use of smoothing splines for longitudinal data can be found in Chapter 9 and Chapter 11.
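For illustration only (not from the chapter), the following Python sketch forms the natural cubic smoothing spline penalty matrix Ψ = QR^{-1}Q' from the banded matrices of Green and Silverman (1994, Section 2.1) and then applies the ridge form (8.4) with a dense solve. The function names and the value of λ are assumptions; a production implementation would instead use the Reinsch algorithm mentioned above to exploit the banded structure and obtain O(N) cost.

```python
import numpy as np

def cubic_spline_penalty(z):
    """Penalty matrix Psi with theta' Psi theta = integral {theta''(z)}^2 dz
    for the natural cubic spline interpolating (z_i, theta_i); Psi = Q R^{-1} Q'
    (Green and Silverman, 1994, Section 2.1). Assumes z is sorted and distinct."""
    z = np.asarray(z, dtype=float)
    n = z.size
    h = np.diff(z)                       # knot spacings h_i = z_{i+1} - z_i
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):            # interior knots z_2, ..., z_{N-1}
        Q[j - 1, j - 1] = 1.0 / h[j - 1]
        Q[j, j - 1] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, j - 1] = 1.0 / h[j]
        R[j - 1, j - 1] = (h[j - 1] + h[j]) / 3.0
        if j < n - 2:
            R[j - 1, j] = R[j, j - 1] = h[j] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def smoothing_spline_fit(Z, Y, lam):
    """Ridge-form smoothing spline estimator (8.4): theta_hat = (I + lam*Psi)^{-1} Y,
    evaluated at the observed design points Z_i."""
    Psi = cubic_spline_penalty(Z)
    return np.linalg.solve(np.eye(len(Y)) + lam * Psi, np.asarray(Y, dtype=float))

# Illustrative usage on simulated data (an assumption, as in the kernel sketch above).
rng = np.random.default_rng(1)
Z = np.sort(rng.uniform(0.0, 1.0, 100))
Y = np.sin(2.0 * np.pi * Z) + rng.normal(scale=0.3, size=Z.size)
theta_hat = smoothing_spline_fit(Z, Y, lam=1e-4)
```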

8.2.3 Regression splines and penalized splines (P-splines)

A key advantage of a smoothing spline is that all the observed design points are used as knots, so one does not need to choose knots. However, when the sample size is large, the computational burden increases substantially. Regression splines (Stone et al., 1997) are a basis-function-based non-parametric regression method that uses a small number of knots and proceeds with a parametric regression on the resulting bases. Denote by {s_1, ..., s_L} a set of L knots, where L is often small (e.g., 5 or 6), and by {B_1(z), ..., B_L(z)} a set of basis functions (e.g., a B-spline basis or a plus-function basis). For the simple non-parametric regression model (8.1), one approximates θ(z) by

    θ(z) ≈ Σ_{l=1}^L B_l(z) α_l.    (8.5)

One then estimates α = (α_1, ..., α_L)' by fitting the parametric model

    Y_i = Σ_{l=1}^L B_l(Z_i) α_l + ε_i    (8.6)

via standard least squares. The resulting non-parametric regression spline estimator of θ(z) is θ̂(z) = Σ_{l=1}^L B_l(z) α̂_l, where α̂ is the maximum likelihood estimate under (8.6). A key advantage of the regression spline is its computational simplicity, since one only needs to fit a parametric model. However, the choices of the number of knots and of their locations are critical, and the estimate of θ(z) can be sensitive to these choices. Adaptive knot allocation strategies have been recommended (Stone et al., 1997).

Penalized splines (P-splines) are a hybrid of regression splines and smoothing splines (Eilers and Marx, 1996; Ruppert, Wand, and Carroll, 2003). One approximates θ(z) using the basis expansion (8.5) with a large number of knots L, where L is often much smaller than the sample size N but much larger than the number of knots typically used in regression splines (e.g., L = 20 to 30). P-spline estimation proceeds by fitting (8.6) with a quadratic penalty on the {α_l}. For example, if the {B_l(z)} are plus basis functions and L is the number of interior knots, the dth-order P-spline model is

    θ(z; α) = α_0 + α_1 z + ... + α_d z^d + Σ_{l=1}^L α_{l+d} (z − s_l)^d_+.

One estimates θ(z; α) by maximizing the penalized log-likelihood, which, apart from a constant, is

    −(1/2σ²) Σ_{i=1}^N {Y_i − θ(Z_i; α)}² − λ Σ_{l=1}^L α²_{l+d}.

If the {B_l(z)} are B-spline basis functions, a second-order difference penalty on the α_l can be used instead (Fahrmeir, Kneib, and Lang, 2004; Lang and Brezger, 2004). A key advantage of P-splines is that they reduce the computational burden of smoothing splines when the sample size is large and are less sensitive to the allocation of the knots than regression splines. The smoothing parameter can be treated as a variance component using the connection between P-splines and mixed models, and can be estimated by restricted maximum likelihood. Several recent attempts have been made to understand the theoretical properties of P-splines in special situations (Hall and Opsomer, 2005). More details about the use of regression splines and P-splines for longitudinal data can be found in Chapter 12.
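To make the P-spline recipe concrete, here is a minimal Python sketch (not from the chapter) using the plus-function basis and the ridge penalty displayed above, with the factor 1/(2σ²) absorbed into λ. The quantile-based knot placement, the default degree, the value of λ, and all names are illustrative assumptions; as noted above, in practice λ is often estimated by restricted maximum likelihood through the mixed-model representation.

```python
import numpy as np

def pspline_fit(Z, Y, num_knots=20, degree=2, lam=1.0):
    """Penalized least-squares fit of the dth-order P-spline model
    theta(z; alpha) = alpha_0 + ... + alpha_d z^d + sum_l alpha_{l+d} (z - s_l)^d_+,
    penalizing only the plus-function coefficients: lam * sum_l alpha_{l+d}^2."""
    Z, Y = np.asarray(Z, float), np.asarray(Y, float)
    knots = np.quantile(Z, np.linspace(0.0, 1.0, num_knots + 2)[1:-1])  # interior knots s_1, ..., s_L
    X = np.hstack([np.vander(Z, N=degree + 1, increasing=True),         # 1, z, ..., z^d
                   np.clip(Z[:, None] - knots[None, :], 0.0, None) ** degree])  # (z - s_l)^d_+
    D = np.diag(np.r_[np.zeros(degree + 1), np.ones(num_knots)])        # penalize knot coefficients only
    alpha = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return alpha, knots

def pspline_predict(z, alpha, knots, degree=2):
    """Evaluate the fitted P-spline theta(z; alpha_hat) at new points z."""
    z = np.atleast_1d(np.asarray(z, float))
    X = np.hstack([np.vander(z, N=degree + 1, increasing=True),
                   np.clip(z[:, None] - knots[None, :], 0.0, None) ** degree])
    return X @ alpha

# Illustrative usage on simulated data (an assumption).
rng = np.random.default_rng(2)
Z = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2.0 * np.pi * Z) + rng.normal(scale=0.3, size=Z.size)
alpha_hat, knots = pspline_fit(Z, Y, num_knots=20, degree=2, lam=0.1)
theta_hat = pspline_predict(np.linspace(0.0, 1.0, 50), alpha_hat, knots, degree=2)
```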

8.3 Overview of non-parametric and semi-parametric regression for longitudinal data

Although non-parametric and semi-parametric regression methods are well developed for independent data, their development for longitudinal data has occurred only in recent years. A major difficulty in the analysis of longitudinal data is that the data are subject to within-subject correlation among the repeated measures over time. This correlation presents significant challenges for the development of kernel and spline smoothing methods for longitudinal data; in particular, it creates a need for non-conventional smoothing methods and for a better understanding of their properties. Specifically, traditional local-likelihood-based kernel methods are not able to account effectively for the within-subject correlation (Lin and Carroll, 2000), and a consistent and efficient non-parametric estimator for longitudinal data needs to be non-local (Welsh, Lin, and Carroll, 2002; Lin et al., 2004). Standard functional data analysis techniques are not directly applicable to longitudinal data either, as repeated measures are often obtained at irregular, sparse time points and are often noisier (Yao, Müller, and Wang, 2005).

Chapter 9 provides an overview of both estimating-equation-based and likelihood-based methods for non-parametric and semi-parametric regression using kernel and spline smoothing for longitudinal data. Chapter 10 surveys the use of functional data analysis methods for non-parametric regression with longitudinal data by treating the data as samples of random curves. Chapter 11 reviews smoothing spline methods for longitudinal data, while Chapter 12 reviews penalized spline methods. These chapters contain detailed discussions of the attractive connection between spline estimation and mixed models (Brumback and Rice, 1998; Wang, 1998; Zhang et al., 1998; Lin and Zhang, 1999; Verbyla et al., 1999).

References

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88.
Brumback, B. and Rice, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with discussion). Journal of the American Statistical Association 93.
Brumback, L. C. and Lindstrom, M. J. (2004). Self modeling with flexible, random time transformations. Biometrics 60.
Carroll, R. J., Fan, J., Gijbels, I., and Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association 92.
Diggle, P. J., Heagerty, P. J., Liang, K. Y., and Zeger, S. L. (2002). Analysis of Longitudinal Data. Oxford: Oxford University Press.
Eilers, P. H. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statistical Science 11.
Fahrmeir, L., Kneib, T., and Lang, S. (2004). Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica 14.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics 21.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. London: Chapman & Hall.
Gasser, T., Müller, H. G., Köhler, W., Molinari, L., and Prader, A. (1984). Nonparametric regression analysis of growth curves. Annals of Statistics 12.
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman & Hall.
Hall, P. and Opsomer, J. (2005). Theory for penalised spline regression. Biometrika 92.
Härdle, W., Liang, H., and Gao, J. (1999). Partially Linear Models. New York: Springer-Verlag.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. London: Chapman & Hall.
Hastie, T. and Tibshirani, R. (2000). Bayesian backfitting. Statistical Science 15.
Heckman, N. (1984). Spline smoothing in partial linear models. Journal of the Royal Statistical Society, Series B 48.
Hogan, J. W., Lin, X., and Herman, B. (2004). Mixtures of varying coefficient models for longitudinal data with discrete or continuous nonignorable dropout. Biometrics 60.
Huang, J., Wu, C., and Zhou, L. (2002). Varying-coefficient models and basis function approximation for the analysis of repeated measures. Biometrika 89.
Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38.
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics 13.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73.
Lin, D. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association 96.
Lin, X. and Carroll, R. J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association 95.
Lin, X. and Zhang, D. (1999). Inference in generalized additive mixed models using smoothing splines. Journal of the Royal Statistical Society, Series B 61.
Lin, X., Wang, N., Welsh, A., and Carroll, R. J. (2004). Equivalent kernels of smoothing splines in nonparametric regression for clustered data. Biometrika 91.
Luan, Y. and Li, H. (2003). Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 19.
Nychka, D. (1995). Splines as local smoothers. Annals of Statistics 23.
Ruppert, D. (1997). Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. Journal of the American Statistical Association 92.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge: Cambridge University Press.
Silverman, B. W. (1984). Spline smoothing: The equivalent variable kernel method. Annals of Statistics 12.
Speckman, P. (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B 50.
Stone, C. J., Hansen, M., Kooperberg, C., and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussion). Annals of Statistics 25.
Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., and Davis, R. W. (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences 102.

Verbyla, A. P., Cullis, B. R., Kenward, M. G., and Welham, S. J. (1999). The analysis of designed experiments and longitudinal data using smoothing splines. Applied Statistics 48.
Wahba, G. (1985). A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline problem. Annals of Statistics 13.
Wahba, G. (1990). Spline Models for Observational Data. Philadelphia: SIAM.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. London: Chapman & Hall.
Wang, Y. (1998). Mixed effects smoothing spline analysis of variance. Journal of the Royal Statistical Society, Series B 60.
Welsh, A. H., Lin, X., and Carroll, R. J. (2002). Marginal longitudinal nonparametric regression: Locality and efficiency of spline and kernel methods. Journal of the American Statistical Association 97.
Zeger, S. L. and Diggle, P. J. (1994). Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 50.
Zhang, D., Lin, X., Raz, J., and Sowers, M. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association 93.
