Constrained Penalized Splines


Mary C. Meyer, Colorado State University

October 15, 2010

Summary: The penalized spline is a popular method for function estimation when the assumption of smoothness is valid. In this paper, methods for estimation and inference are proposed using penalized splines under the additional constraints of shape, such as monotonicity or convexity. The constrained penalized spline estimator is shown to have the same convergence rates as the corresponding unconstrained penalized spline, although in practice the squared error loss is typically smaller for the constrained versions. The penalty parameter may be chosen with generalized cross-validation, which also provides a method for determining whether the shape restrictions hold. The method is not a formal hypothesis test, but is shown to have nice large-sample properties, and simulations show that it compares well with existing tests for monotonicity. Extensions to the partial linear model, the generalized regression model, and the varying coefficient model are given, and several examples demonstrate the utility of the methods.

Keywords: shape-restricted regression, quadratic programming, cone projection, test for monotonicity, isotonic, convex regression

1 Introduction and Background

Consider the problem of estimating a function $f$ from a scatterplot of data $\{(x_i, y_i)\}$, $i = 1, \ldots, n$, using the model
$$y_i = f(x_i) + \sigma\epsilon_i \qquad (1)$$
where $x_i \in [0, 1]$ and the $\epsilon_i$ are iid random variables with mean zero and unit variance. Smoothing methods include kernel methods and local polynomial regression, smoothing splines, and regression splines. The isotonic regression estimator (Brunk 1955) minimizes the sum of squared residuals subject to the monotonicity assumption; the estimator is a step function. Several methods are available if $f$ is assumed to be monotone increasing as well as smooth. Friedman and Tibshirani (1984) smoothed the isotonic regression estimator using a running average. Mammen (1991) proposed a two-step method in which a kernel smoother is applied to the isotonic regression estimator, or the kernel smoother is isotonized. Utreras (1985) provided the characterization and convergence rates for the monotone smoothing spline estimator that minimizes
$$\sum_{i=1}^n [y_i - f(x_i)]^2 + \lambda \int_0^1 [f^{(k)}(x)]^2\,dx$$
over the monotone non-decreasing functions $f$ with continuous $k$th derivatives. Tantiyaswasdikul and Woodroofe (1994) obtained the solution for $k = 1$. The estimators are difficult to obtain for $k \geq 2$, however, because imposing the constraints at the observations is not sufficient to guarantee that the constraints hold everywhere. Additional knots must be placed between observations, and a general algorithm for the exact placement is not known (Wang and Li 2008). Mammen and Thomas-Agnan (1999) showed that the shape-constrained smoothing spline estimator is equivalent to a two-step procedure and provided convergence rates. Mammen et al. (2001) generalized the idea of projecting a smoothed estimator onto a constrained subset of the estimation space. He and Shi (1998) proposed monotone B-spline smoothing with $L_1$ optimization, using linear programming to obtain the estimator.

In this paper the constrained penalized regression spline estimator is considered. For the method of polynomial regression splines, we assume that $f \in C^{p+1}[0, 1]$ for some integer $p \geq 0$, and specify a set of knots $0 = t_1 < \cdots < t_k = 1$. A set of $m = k + p - 1$ basis functions $\delta_1(x), \ldots, \delta_m(x)$ are defined with corresponding basis vectors $\delta_1, \ldots, \delta_m$, where $\delta_{ji} = \delta_j(x_i)$, so that the functions span the space of smooth piecewise degree-$p$ polynomial regression functions with the given knots, and the basis vectors span an $m$-dimensional subspace of $\mathbb{R}^n$. The coefficients of the linear combination that minimizes the residual sum of squares are found using ordinary least-squares regression techniques, so that if $B$ is the $n \times m$ matrix whose columns are the $\delta_j$ vectors, the coefficients of the least-squares estimator are $\hat b = (B'B)^{-1}B'y$. Then $\hat f(x) = \sum_{j=1}^m \hat b_j \delta_j(x)$ is the regression spline estimate of $f$. There are many possibilities for the set of basis functions; the B-splines are a standard choice, defined so that each basis vector is orthogonal to all but a few of the others. These are shown for $k = 5$ in Figure 1(a) for $p = 2$ and (b) for $p = 3$. The curves represent the basis functions and the dots mark the values of the basis vectors for $n = 50$ equally spaced $x_i$. See de Boor (2001) for a thorough treatment, including formulas for the spline basis functions. Other choices such as the truncated polynomial basis (see Ruppert, Wand, and Carroll 2003, section 3.7) span the same space, but the resulting design matrix might be highly collinear.

The regression splines are simple, flexible, and parsimonious function estimators. Some large-sample theory was derived by Zhou, Shen, and Wolfe (1998) under mild assumptions. The knots are required to be of bounded mesh ratio; that is, the ratio of the maximum gap between the knots to the minimum is bounded, and the maximum difference in gaps is $o(k^{-1})$. The design points must be distributed according to some continuous density that is positive on $(0, 1)$. They derived the asymptotic bias and variance of the regression splines and the asymptotic normality of the estimates, and showed that if the number of knots grows as $n^{1/(2p+3)}$, the optimal point-wise convergence rate $\hat f(x) - f(x) = O_p(n^{-(p+1)/(2p+3)})$ is attained. In practice, the splines are sensitive to knot number and placement.
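To make the regression spline construction above concrete, the following R sketch (using the base splines package) builds a quadratic B-spline basis on five equally spaced knots and computes the least-squares regression spline. The simulated data, the seed, and the object names are illustrative only and are not taken from the paper.

    library(splines)

    set.seed(1)
    n <- 50
    x <- seq(0, 1, length.out = n)
    y <- 1.5 * (2 * x - 1)^3 + rnorm(n)     # illustrative regression function plus noise

    p <- 2                                  # quadratic splines
    knots <- seq(0, 1, length.out = 5)      # k = 5 knots, including the boundaries
    # full knot sequence: boundary knots repeated ord = p + 1 times
    allknots <- c(rep(0, p + 1), knots[2:4], rep(1, p + 1))
    B <- splineDesign(allknots, x, ord = p + 1)   # n x m basis matrix, m = k + p - 1 = 6

    b    <- solve(crossprod(B), crossprod(B, y))  # least-squares coefficients (B'B)^{-1} B'y
    fhat <- B %*% b                               # regression spline fit at the x_i

Any basis spanning the same space (for example, the truncated polynomial basis) would give the same fitted values, but the B-spline design matrix is better conditioned.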

Figure 1: B-spline basis functions with $k = 5$ knots equally spaced in $(0, 1)$. (a) Piecewise quadratic; (b) piecewise cubic.

Penalized splines are more robust, and the user choices are simplified to a single penalty parameter, which may be determined through cross-validation methods. The P-splines of Eilers and Marx (1996) use many knots equally spaced in $(0, 1)$, and penalize the $q$th-order differences in the sizes of the coefficients of adjacent B-splines. The penalized sum of squares is
$$\sum_{i=1}^n \left( y_i - \sum_{j=1}^m b_j \delta_j(x_i) \right)^2 + \lambda \sum_{j=q+1}^m (\Delta^q b_j)^2 \qquad (2)$$
where $\Delta^1 b_j = b_j - b_{j-1}$ and $\Delta^q b_j = \Delta^{q-1} b_j - \Delta^{q-1} b_{j-1}$ for $q > 1$. The expression to minimize may be written in vector form:
$$\psi(b; y) = b'(B'B + \lambda D'D)b - 2y'Bb, \qquad (3)$$
where $D$ is the $q$th-order difference matrix, and the coefficient estimate is found via a weighted projection onto a linear subspace: $\tilde b = (B'B + \lambda D'D)^{-1}B'y$. The estimate of $\mu$, where $\mu_i = f(x_i)$, is $\tilde\mu = B\tilde b$, and the effective degrees of freedom (edf) of the model (see Hastie and Tibshirani 1990, chapter 5) is the trace of $B(B'B + \lambda D'D)^{-1}B'$. For $\lambda = 0$ this is $m$, the number of columns of $B$, and as $\lambda$ gets larger, the edf approaches $\min(p, q-1)$.
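Continuing the sketch above, the P-spline estimate in (3), its effective degrees of freedom, and the GCV criterion (4) defined below can be computed directly; the difference order and the value of $\lambda$ are arbitrary illustrations.

    m <- ncol(B)
    q <- 3                                   # order of the difference penalty
    D <- diff(diag(m), differences = q)      # (m - q) x m difference matrix
    lambda <- 10                             # illustrative penalty parameter

    M    <- crossprod(B) + lambda * crossprod(D)   # B'B + lambda D'D
    btil <- solve(M, crossprod(B, y))              # unconstrained penalized coefficients
    mu   <- B %*% btil                             # fitted values

    edf <- sum(diag(B %*% solve(M, t(B))))         # trace of B (B'B + lambda D'D)^{-1} B'
    gcv <- sum((y - mu)^2) / (1 - edf / length(y))^2   # GCV criterion (4), defined below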

Claeskens, Krivobokova, and Opsomer (2009) investigated the asymptotic properties of the penalized splines. They defined the estimator in more generality as the minimizer of
$$\sum_{i=1}^n \left( y_i - \sum_{j=1}^m b_j \delta_j(x_i) \right)^2 + \lambda \int_0^1 \left( \sum_{j=1}^m b_j \delta_j^{(q)}(x) \right)^2 dx,$$
and noted that the simplifications in (2) do not influence the asymptotic properties of the estimator. For degree of spline $p$ and degree of penalty $q$, they demonstrate that the asymptotic behavior of the penalized spline can be similar to that of the regression spline if the number of knots grows slowly, and similar to that of the smoothing spline for a faster-growing number of knots. In particular, the regression spline behavior is seen when the number of knots grows at the rate $n^{1/(2p+3)}$ and $\lambda = O(n^{(p+2-q)/(2p+3)})$. For smoothing spline behavior, the number of knots grows at the rate $n^{\nu}$, where $1/(2q+1) \leq \nu < 2q/(2q+1)$, and $\lambda = O(n^{1/(2q+1)})$; then the rate $\hat f(x) - f(x) = O_p(n^{-q/(2q+1)})$ is attained.

The penalty parameter may be chosen through generalized cross-validation (GCV) methods (see Ruppert, Wand, and Carroll 2003, chapter 5), where the GCV for the fit $\tilde\mu$ with penalty parameter $\lambda$ is defined as
$$GCV(\lambda) = \frac{\sum_{i=1}^n (y_i - \tilde\mu_i)^2}{(1 - \mathrm{edf}/n)^2}. \qquad (4)$$

Suppose we can assume, in addition to smoothness, that $f$ is monotone or convex. The (unpenalized) monotone spline was introduced by Ramsay (1988) and extended to convex and other shape restrictions by Meyer (2008). The quadratic I-spline basis functions, together with the constant function, span the space of piecewise quadratic splines with the given knots, and have the additional property that the linear combination is monotone if and only if the coefficients of the basis functions are non-negative. The cubic C-splines can be constructed so that the basis functions, together with the constant function and the identity function, span the space of piecewise cubic functions with the given knots, and the linear combination is convex if and only if the coefficients on the basis functions are non-negative.

In the next section, the method for estimating the constrained penalized splines is presented, and it is shown that they have the same large-sample properties as the unconstrained penalized splines. In section 3 the GCV is used to check for monotonicity or convexity, extensions to the partial linear model, the generalized regression model, and the varying coefficient model are considered in section 4, and several examples of applications are given in section 5, demonstrating the broad utility of the methods.

2 Penalized Shape-Constrained Splines

The B-spline bases are used for penalized monotone or convex regression. For the monotone case, we use the quadratic splines and define the $k \times m$ matrix $S$ of slopes at the knots by $S_{ij} = \delta_j'(t_i)$. Then the linear combination $\sum_{j=1}^m b_j \delta_j(x)$ is non-decreasing if and only if the coefficient vector is in the set
$$\mathcal{C} = \{b : Sb \geq 0\} \subseteq \mathbb{R}^m. \qquad (5)$$
(Note that this result cannot hold for cubic or higher-degree splines, where the spline function may be increasing at the knots but decreasing between, and there is no set of linear constraints on $b$ that guarantees monotonicity.) For the convex case, we use cubic splines and $S_{ij} = \delta_j''(t_i)$; then the linear combination is convex if and only if $Sb \geq 0$. (For degree four or higher, there is no set of linear constraints on $b$ that guarantees convexity.) Note that $S$ is $k \times (k+1)$ for the monotone case and $k \times (k+2)$ for the convex case, and both are of full row rank. The extension to monotone-and-convex restrictions requires a $(k+1) \times (k+2)$ matrix $S$, containing the $k$ rows from the convex constraints and an additional row to guarantee a positive slope at the left-hand side of the interval. Clearly, variations such as decreasing and concave, or sigmoidal with known point of inflection, require simple changes to the matrix $S$. The set $\mathcal{C}$ is a convex polyhedral cone, and the problem of minimizing (3) for $b \in \mathcal{C}$ is a quadratic programming problem.
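For the monotone quadratic case, the slope matrix $S$ can be filled in by evaluating the first derivatives of the same B-spline basis at the knots. The sketch below, which reuses the objects from the earlier sketches, is one illustrative way to do this and is not necessarily the construction used in the paper's own code.

    # S[i, j] = delta_j'(t_i): first derivatives of the basis functions at the knots
    S <- splineDesign(allknots, knots, ord = p + 1, derivs = 1)

    # the fit B %*% b is non-decreasing if and only if all(S %*% b >= 0);
    # check whether the unconstrained penalized fit already satisfies the constraints
    all(S %*% btil >= 0)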

A routine such as the R function qpsolve might be used to find the coefficients of the basis functions. The problem may be transformed into a cone projection. Let $LL'$ be the Cholesky decomposition of $B'B + \lambda D'D$, then define $\phi = L'b$ and $z = L^{-1}B'y$, so that (3) can be written as $\|\phi - z\|^2$ and the constraints are $A\phi \geq 0$, where $A = S(L')^{-1}$. The minimizer $\hat\phi$ is the projection of $z$ onto the cone $\mathcal{C} = \{\phi : A\phi \geq 0\} \subseteq \mathbb{R}^m$, and subsequently $\hat b = (L')^{-1}\hat\phi$. Some necessary background for cone projection is given next; more details and proofs of the following claims may be found in Meyer (2003) or Silvapulle and Sen (2005, chapter 3).

To characterize the constrained solution and estimate the effective degrees of freedom of the model, the dimension of the face of the cone containing the solution is determined as follows, for the case where $A$ is $k \times m$ and of full row rank. (All cases discussed in this paper involve full row-rank constraint matrices. However, if $A$ is not of full row rank but is irreducible, the following characterizations may still be given using the methods of Meyer (1999) for constrained estimation with more constraints than dimensions.) First, the edges or generators of $\mathcal{C}$ must be found. Let $e_1, \ldots, e_{m-k}$ span the null space of $A$, and let $\tilde A$ be the square, nonsingular matrix whose first $k$ rows are the rows of $A$ and whose last rows are the $e$ vectors. Then the first $k$ columns of $\tilde A^{-1}$ are the edges $\gamma_1, \ldots, \gamma_k$ of the cone, and the cone can be written as
$$\mathcal{C} = \left\{ \phi : \phi = \sum_{j=1}^{m-k} a_j e_j + \sum_{j=1}^{k} c_j \gamma_j, \text{ for } c_j \geq 0,\ j = 1, \ldots, k \right\}. \qquad (6)$$
The projection of $z$ onto $\mathcal{C}$ lands on a face of the cone, where the $2^k$ faces are indexed by the collection of sets $J \subseteq \{1, \ldots, k\}$, and are defined by
$$F_J = \left\{ \phi : \phi = \sum_{j=1}^{m-k} a_j e_j + \sum_{j \in J} c_j \gamma_j, \text{ for } c_j > 0,\ j \in J \right\}.$$
Note that the interior of the cone $\mathcal{C}$ is a face with $J = \{1, \ldots, k\}$, and the linear space spanned by the $e_j$ vectors is a face with $J = \emptyset$.
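The change of variables and the resulting cone projection can be carried out with an off-the-shelf quadratic programming routine. The sketch below uses the quadprog package rather than the routine named above, and continues the objects from the earlier sketches; it is an illustration of the transformation, not the paper's implementation.

    library(quadprog)

    U <- chol(M)                                   # M = B'B + lambda D'D = LL', with L = t(U)
    z <- forwardsolve(t(U), crossprod(B, y))       # z = L^{-1} B'y
    A <- S %*% solve(U)                            # A = S (L')^{-1}, so constraints are A phi >= 0

    # project z onto the cone {phi : A phi >= 0}: minimize ||z - phi||^2 subject to A phi >= 0
    qp     <- solve.QP(Dmat = diag(ncol(A)), dvec = as.vector(z),
                       Amat = t(A), bvec = rep(0, nrow(A)))
    phihat <- qp$solution
    bhat   <- backsolve(U, phihat)                 # constrained coefficients, (L')^{-1} phihat
    muhat  <- B %*% bhat                           # constrained penalized spline fit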

Further, the faces partition $\mathcal{C}$. The cone projection algorithm determines the face $F_J$ on which the projection falls; then the solution coincides with the ordinary least-squares projection onto the linear space spanned by the $e_j$ vectors and the edges $\gamma_j$ for $j \in J$. To compute the edf associated with the constrained penalized fit, let $U_J$ be the matrix whose columns are the $e$ vectors and the $\gamma_j$ vectors for $j \in J$. Then the constrained estimate of $\mu$ is
$$\hat\mu = B(L')^{-1} U_J (U_J'U_J)^{-1} U_J' L^{-1} B' y =: P_J y,$$
and hence the edf is the trace of $P_J$. Note that if $J = \{1, \ldots, k\}$, i.e., all edges are used, then $U_J(U_J'U_J)^{-1}U_J'$ is the identity matrix, and the constrained edf is identical to the unconstrained edf. This happens when the unconstrained spline satisfies the constraints; otherwise, the edf for the constrained version is smaller. The edf for the constrained version is a random quantity with $k+1$ possible values, the largest of which is that of the unconstrained version.

A cross-validation scheme for the constrained version considers
$$CV(\lambda) = \sum_{i=1}^n [y_i - \hat f_{(-i),\lambda}(x_i)]^2,$$
where $\hat f_{(-i),\lambda}$ is the estimated regression function using the edge vectors with the $i$th observation removed (the knots stay the same). If the index set $J$ of edges used by this projection is the same as that using all the data, then
$$y_i - \hat f_{(-i),\lambda}(x_i) = \frac{y_i - \hat f_\lambda(x_i)}{1 - h_i},$$
where $h_i$ is the $i$th diagonal element of the projection matrix for the face indexed by $J$ and $\hat f_\lambda$ is the estimated regression function using all the data. Replacing $h_i$ with the average over $i$ gives the GCV in the form (4). Although the set $J$ may vary when an observation is ignored, particularly for points of high leverage, the approximation is useful for the selection of $\lambda$ and the check of the shape assumptions in the next section.
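A small helper for the GCV in the form (4) is used in the sketches that follow; the edf is passed in as an argument because, for the constrained fit, it depends on the dimension of the face of the cone on which the projection lands.

    # GCV in the form (4): residual sum of squares scaled by (1 - edf/n)^2
    gcv_score <- function(y, fitted, edf) {
      n <- length(y)
      sum((y - fitted)^2) / (1 - edf / n)^2
    }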

The penalized regression spline fit to (1) where $\mathrm{cov}(\epsilon) = A$ for known positive-definite $A$ is a simple extension of either the constrained or unconstrained model. The expression to be minimized is
$$\psi(b; y) = b'(B'A^{-1}B + \lambda D'D)b - 2y'A^{-1}Bb,$$
which can be accomplished as above with the Cholesky decomposition $LL' = B'A^{-1}B + \lambda D'D$ and $z = L^{-1}B'A^{-1}y$. This extension is important for correlated-errors models, for heteroskedastic data, and for iteratively re-weighted least-squares methods such as those used for generalized regression.

The estimators are more robust to knot choices when shape restrictions are employed, especially if convex or concave constraints are appropriate. The restriction to the cone allows for all the flexibility within the constraints, without allowing the wiggling associated with over-fitting in unconstrained non-parametric function estimation. To illustrate, a data set of size $n = 50$ is used to compare quadratic regression spline fits in Figure 2. In plot (a) the unconstrained, unpenalized spline fits with 3, 4, 5, and 6 equally spaced knots (having edf = 4, 5, 6, and 7) are the most variable. The penalized spline fits, each with 10 knots placed evenly in the span of x-values, are shown in plot (b), with choices of penalty parameter that coincide with 4, 5, 6, and 7 edf. Next, suppose that the relationship must be monotone increasing. In plot (c) we see the unpenalized, constrained splines, again with 3, 4, 5, and 6 knots, and finally, in plot (d) the penalized constrained splines (with the same penalty parameters as in (b)) are seen to vary the least with edf, i.e., are most robust to user choices.

Next we show that the constrained penalized splines have the same convergence rates as the unconstrained penalized splines, if $f'(x) \geq 0$ over $[0, 1]$, under the following assumptions:

A1. The regression function $f$ is in $C^{p+1}[0, 1]$.

A2. The errors $\epsilon_i$ are uncorrelated with zero mean and common variance $\sigma^2$.

A3. There is a constant $M$ such that for $\delta_{\min} = \min_{i=1,\ldots,k-1}(t_{i+1} - t_i)$ and $\delta_{\max} = \max_{i=1,\ldots,k-1}(t_{i+1} - t_i)$, we have $\delta_{\max}/\delta_{\min} \leq M$ for all $n$, and $\delta_{\max} = o(k^{-1})$.

A4. There is a design distribution function $Q$ with positive density so that $\sup_{x \in [0,1]} |Q_n(x) - Q(x)| = o(1/k)$, where $Q_n$ is the empirical distribution function for $x_1, \ldots, x_n$.

A5. The number of knots $k = o(n)$.

Figure 2: Spline fits to simulated data. Each plot shows curves using 4 (black), 5 (red), 6 (green), and 7 (blue) equivalent degrees of freedom: (a) unconstrained, unpenalized; (b) unconstrained, penalized; (c) constrained, unpenalized; and (d) constrained, penalized.

The Kuhn-Tucker conditions (see Silvapulle and Sen 2005, Appendix 1) for the unweighted projection $\hat\phi$ minimizing $\|z - \phi\|^2$ over $\phi \in \mathcal{C}$ are $\langle z - \hat\phi, \hat\phi \rangle = 0$ and $\langle z - \hat\phi, \phi \rangle \leq 0$ for all $\phi \in \mathcal{C}$. These are easily seen to be equivalent to
$$\langle y - \hat\mu, \hat\mu \rangle = \lambda \hat b' D'D \hat b \quad \text{and} \quad \langle y - \hat\mu, \mu \rangle \leq \lambda \hat b' D'D b, \quad \text{for all } \mu = Bb,\ b \in \mathcal{C}.$$
For the unconstrained estimator $\tilde b$, we have $\langle y - \tilde\mu, \mu \rangle = \lambda \tilde b' D'D b$ for all $\mu = Bb$, $b \in \mathcal{C}$. Let $b_0$ be the minimizer of $\psi(b; \mu)$ for $\psi$ defined in (3), so that $\mu_0 = Bb_0$ is the penalized spline fit to the true function values, and suppose $Sb_0 \geq 0$, so that the shape restrictions hold.

Then
$$\begin{aligned} \|\tilde\mu - \mu_0\|^2 &= \|\tilde\mu - \hat\mu\|^2 + \|\hat\mu - \mu_0\|^2 + 2\langle \tilde\mu - \hat\mu, \hat\mu - \mu_0 \rangle \\ &= \|\tilde\mu - \hat\mu\|^2 + \|\hat\mu - \mu_0\|^2 - 2\langle y - \tilde\mu, \hat\mu - \mu_0 \rangle + 2\langle y - \hat\mu, \hat\mu - \mu_0 \rangle \\ &\geq \|\tilde\mu - \hat\mu\|^2 + \|\hat\mu - \mu_0\|^2 - 2\lambda(\tilde b - \hat b)'D'D(\hat b - b_0). \end{aligned}$$
From the properties of B-splines, we have $\|\tilde\mu - \hat\mu\|^2 = \|\tilde b - \hat b\|^2 O_p(n/k)$, so that
$$\|\tilde\mu - \mu_0\|^2 \geq \|\hat\mu - \mu_0\|^2 + \|\tilde b - \hat b\|^2 O_p(n/k) - 2\lambda(\tilde b - \hat b)'D'D(\hat b - b_0).$$
Using the banded nature of $D$ and $\hat b_j = O_p(1)$ for all $j$, and letting $\lambda$ and $k$ grow at the optimal rates given in section 1 (for either regression spline rates or smoothing spline rates), we have that the convergence rate for the constrained penalized splines is at least that for the unconstrained versions.

To compare the constrained and unconstrained fits for small samples, simulations were conducted for two choices of regression function, edf, and sample size. The x-values are equally spaced in $(0, 1)$ and unit model variance is used. The function $f(x) = 1.5(2x-1)^3$ is flat in the middle and steep on either end, while $f(x) = \log(2x + .1)$ is increasing rapidly at the left end of the interval and flattens out at the right end. For the latter, monotone-and-concave constraints were also considered. The penalty parameters for the $n = 50$ case are chosen so that the edf values for the unconstrained estimator are 4 and 7, while the values for $n = 100$ are chosen so that the edf is 5 and 8. In addition, we compute the constrained and unconstrained fits using the minimum-GCV value of the penalty parameter, where the range for $n = 50$ is such that the unconstrained edf varies from 3.5 to 8 by half-steps, and for $n = 100$, the range is. The minimum-GCV value of the penalty parameter is typically smaller for the constrained case, because the constrained estimator allows for a high degree of flexibility without introducing the wiggling that is typically associated with over-fitting in non-parametric regression.

In particular, with the constrained estimator, one can decrease the penalty parameter without (necessarily) increasing the effective degrees of freedom of the fit, because the face of the cone on which the projection lands may be of the same small dimension as for the fit with larger penalty. The computations here and in the rest of the paper use $q = 3$, so that the fit for larger penalty parameter tends toward a cubic polynomial.

The square root of the average squared error loss (SASEL) for the $r$th data set was computed as
$$\mathrm{SASEL}_r = \left[ \frac{1}{n} \sum_{i=1}^n (\hat\mu_{ri} - \mu_i)^2 \right]^{1/2},$$
and the average over 10,000 data sets is reported in Table 1. In each case, the more stringent constraints provide smaller SASEL, with bigger improvements shown for larger edf.

Table 1: SASEL for constrained and unconstrained penalized splines, for $f(x) = 1.5(2x-1)^3$ (monotone and unconstrained fits) and $f(x) = \log(2x + .1)$ (monotone-and-concave, monotone, and unconstrained fits), by $n$, $k$, and edf. The model variance is unity, the $x$ values are equally spaced in $(0, 1)$, and $N = 10{,}000$ datasets per cell.

Comparisons of the penalized spline and the monotone B-spline of He and Shi (1998) give the expected results that the former has smaller MSE for normal errors and the latter has smaller MSE for double-exponential errors. Simulation results for SASEL are given in Table 2 for two regression functions and two sample sizes. The method of He and Shi was coded using the R function lpcdd for the linear programming; there are $m + 2n$ variables with $m - 1 + 3n$ linear equality and inequality constraints, and for larger sample sizes the routine did not always converge.
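A direct transcription of the loss used in the simulations, together with a toy simulation loop for the unconstrained penalized fit, is sketched below using the earlier objects. The number of replications and the regression function are illustrative, and the paper's full design (both fits, several edf values, 10,000 data sets per cell) is not reproduced.

    # square root of the average squared error loss for one simulated data set
    sasel <- function(muhat, mu) sqrt(mean((muhat - mu)^2))

    mu_true <- 1.5 * (2 * x - 1)^3
    losses  <- replicate(100, {
      ysim <- mu_true + rnorm(length(x))
      sasel(B %*% solve(M, crossprod(B, ysim)), mu_true)
    })
    mean(losses)                            # average SASEL over the replications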

Table 2: SASEL for constrained penalized splines (CPS) compared with the monotone B-splines of He and Shi (HS), for $f(x) = 1.5(2x-1)^3$ and $f(x) = \log(2x + .1)$, by $n$ and edf. The model variance is unity, the $x$ values are equally spaced in $(0, 1)$, and $N = 10{,}000$ datasets per cell.

3 Check for Validity of Shape Assumptions

The shape assumptions often fall into the category of a priori knowledge, but occasionally the research question might concern the shape. The classic formal hypothesis test of monotone versus non-monotone is presented in Robertson, Wright, and Dykstra (1988), which tests $H_0$: $f$ is monotone versus $H_a$: $f$ is not monotone, using unsmoothed monotone regression. Their test statistic has the distribution of a mixture of beta random variables if the true function is constant. For $f$ not constant, the test can be badly biased. Bowman, Jones, and Gijbels (1998) formulate a test using kernel regression, where they define the critical bandwidth to be the smallest for which the fit is monotone. Using the errors from this fit, a bootstrap method is used for the p-value. The test proposed by Ghosal, Sen, and van der Vaart (2000) uses a locally-weighted Kendall's tau statistic to determine whether there are any decreasing portions of the regression function. Hall and Heckman (2000) compute a running derivative by fitting lines over portions of the data, and their test statistic is the minimum slope. Juditsky and Nemirovski (2002) formulate a general problem of testing whether the signal in a Gaussian random process is contained in a convex cone.

A simple method to check the validity of the shape assumptions is proposed here, based on the generalized cross-validation (GCV) choices of the penalty parameters.

We choose a set of possible penalty parameters, and compute the constrained and unconstrained penalized spline fits and the GCV for each parameter value. If the minimum GCV value for the constrained splines is smaller than that for the unconstrained splines, that is evidence that the shape restrictions hold. Conversely, if the smallest GCV value is for an unconstrained spline, then we conclude that there is evidence against the shape restriction. In the case of a tie, we decide in favor of the shape restrictions, because a tie often indicates that the unconstrained fit satisfies the constraints.

The procedure is illustrated in Figure 3, using a data set with $n = 50$ generated from the regression function shown as the dashed curve in plot (b). Ten penalty parameters were used, chosen so that the edf values for the unconstrained fit are 3.5, 4.0, ..., 8.0, with 10 knots equally spaced over the range of the x-values. The GCV values are shown in plot (a), where the values for the constrained fit are shown as circles and those for the unconstrained fit are triangles. The winner is the unconstrained spline with penalty parameter chosen to coincide with edf = 7.5. The minimum-GCV constrained and unconstrained fits are shown in plot (b).

Figure 3: (a) The GCV values for the data set of Figure 2, where the black circles are for the constrained version and the red triangles for the unconstrained splines. The method chooses the unconstrained version with edf = 7.5, shown in plot (b) along with the best constrained fit. The dashed curve is the true regression function.
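The check can be organized as a small loop over a grid of penalty parameters, computing both fits and their GCV values at each $\lambda$. The sketch below reuses the earlier objects, uses an illustrative $\lambda$ grid rather than one calibrated to target edf values, and, as a simplification, plugs the unconstrained edf into the GCV for the constrained fit; this is a conservative shortcut assumed for the sketch, since the constrained edf is never larger than the unconstrained edf.

    fit_pair <- function(lambda, B, D, S, y) {
      M <- crossprod(B) + lambda * crossprod(D)
      U <- chol(M)
      z <- forwardsolve(t(U), crossprod(B, y))
      A <- S %*% solve(U)
      # unconstrained penalized fit and its effective degrees of freedom
      bu   <- solve(M, crossprod(B, y))
      edfu <- sum(diag(B %*% solve(M, t(B))))
      # constrained fit via the cone projection
      qp <- quadprog::solve.QP(diag(ncol(A)), as.vector(z), t(A), rep(0, nrow(A)))
      bc <- backsolve(U, qp$solution)
      # the constrained edf should use the dimension of the face containing the
      # projection; the unconstrained edf is used here as a conservative stand-in
      c(unconstrained = gcv_score(y, B %*% bu, edfu),
        constrained   = gcv_score(y, B %*% bc, edfu))
    }

    grid   <- 10^seq(-2, 3, length.out = 10)          # illustrative penalty grid
    scores <- sapply(grid, fit_pair, B = B, D = D, S = S, y = y)
    # decide in favor of monotonicity when the smallest GCV comes from a constrained fit
    min(scores["constrained", ]) <= min(scores["unconstrained", ])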

Although this method is not a formal hypothesis test, we can show that it has nice large-sample properties. First, suppose that the true regression function is strictly increasing, so that $f'(x) \geq \xi > 0$ for $x \in [0, 1]$, and suppose $\lambda$ and $k$ grow with $n$ at the rates necessary to obtain the optimal convergence rates. Then the probability that $\tilde f'(t_j) > 0$ for all $j = 1, \ldots, k$ is
$$P(S\tilde b \geq 0) \geq P(S(\tilde b - b_0) \geq -\xi \mathbf{1}),$$
which goes to one as $n$ increases, by the consistency of the unconstrained estimator $\tilde b$. Therefore, the constraints are asymptotically unbinding, and the probability that the GCVs are identical (and hence the constrained version is chosen) goes to one. Next, suppose that the true regression function is decreasing over an interval, so that $f'(x) \leq \xi < 0$ for $x \in [a, b]$ where $0 \leq a < b \leq 1$. Then SSE$/n$ for the unconstrained version goes to $\sigma^2$, but SSE$/n$ for the constrained version goes to $\sigma^2 + c$, where $c \geq (b-a)^2\xi^2/4$. The limit of the numerator of the GCV, divided by $n$, is larger by $c$ for the constrained version, while the denominators both tend to one, so that the probability that the GCV for the unconstrained fit is smaller (and the unconstrained version is chosen) goes to one.

The small-sample performance of the procedure is illustrated through simulations. The percents of times the correct choice is made are provided in Table 3, for four choices of regression function and three sample sizes, where 10,000 data sets were generated for each combination, and the model variance is $\sigma^2 = 1$. For the smallest sample size $n = 50$, the 10 penalty parameters are chosen so that the edf for the unconstrained fit ranges from 3.5 to 8. For $n = 100$, the range is 4 to 8.5, and for $n = 200$, the range is 4.5 to 9. The regression functions are chosen from the Bowman et al. (1998) paper and the Ghosal et al. (2000) paper. For the former, $f_a(x) = 1 + x - a\exp\{-50(x - .5)^2\}$ for $a = .15$, $.25$, and $.45$. Note that $f_{.15}$ is increasing over the range $[0, 1]$, $f_{.25}$ has a mild dip, and $f_{.45}$ has a more pronounced dip. That paper uses $\sigma = .1$ for their simulations; we use $\sigma = 1$ and $10 f_a$. Their rates of correct choice for $a = .15$ are 97.3, 98.2, and 99.2 percent, for $n = 50$, $n = 100$, and $n = 200$, respectively. For $a = .25$, their rates are 8.8, 10.0, and 17.4 percent correct choices, and for $a = .45$, their rates are 37.4, 54.4, and 87.4 percent.

Ghosal et al. (2000) defined
$$\mu_4(x) = \begin{cases} 10(x - .5)^3 \exp\{-1000(x - .25)^2\}, & x \in [0, .5), \\ 0.1(x - 0.5)\exp\{-100(x - 0.25)^2\}, & x \in [.5, 1], \end{cases}$$
which has a very sharp dip followed by a flat section. In that paper they use $\sigma = .1$; here we use $\sigma = 1$ and $3\mu_4$. The signal-to-noise ratio for our simulations is over three times smaller, but their rates of correct choices are lower: 2.6, 8.0, and 75.7 percent, for $n = 50$, 100, and 200. The method proposed here consistently has higher power than the two competitors, for examples chosen for those methods. The advantage is that the new method does not require the choice of a single penalty parameter or bandwidth, but allows the flexibility of the fit to be chosen by the data. Therefore, the sharp dip of $\mu_4$ and the more regular variation of $f_a$ are fit equally well.

Table 3: Simulations for the GCV method for determination of monotonicity, for the functions $f_{.15}$, $f_{.25}$, $f_{.45}$, and $\mu_4$ with $n = 50$, 100, and 200. The model variance is unity, with $N = 10{,}000$ datasets per cell. Entries are percents of correct choices.

4 Extensions

The partial linear model is a simple extension. An arbitrary number of covariates to be modeled parametrically can be included in the model, so that
$$y_i = z_i'a + f(x_i) + \sigma\epsilon_i \qquad (7)$$
for $z_i \in \mathbb{R}^r$, where again the $\epsilon_i$ are iid random variables with mean zero and unit variance.
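The design and constraint matrices for (7) can be assembled by prepending the covariate columns to the spline basis and padding the shape constraints with zeros, as the next paragraph describes. In this sketch Z is a hypothetical single binary covariate, and leaving the covariate coefficients unpenalized is an assumption of the sketch, not something stated in the paper.

    # hypothetical covariate matrix Z (here r = 1 binary covariate) for model (7)
    Z  <- cbind(rbinom(n, 1, 0.5))
    r  <- ncol(Z)
    W  <- cbind(Z, B)                          # first r columns: covariates; last m: basis
    S2 <- cbind(matrix(0, nrow(S), r), S)      # constraint matrix: zeros for covariate columns
    D2 <- cbind(matrix(0, nrow(D), r), D)      # assumption: covariate coefficients unpenalized
    # the penalized cone projection then proceeds as before, with W, S2, D2 in place of B, S, D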

Let the $n \times r$ matrix $Z$ be the design matrix for the covariates, and suppose that its columns and the $x$ vector form a linearly independent set, and further suppose that the $\mathbf{1}$ vector is not in the column space of $Z$. Estimates of $E(y)$ will be of the form $Za + Bb = Wc$, where the first $r$ columns of $W$ are the covariate vectors and the last $m$ are the B-spline basis vectors. The constraint $Sc \geq 0$ is again imposed, where now the first $r$ columns of the $k \times (r + m)$ matrix $S$ contain zeros. The cone projection proceeds as described in section 2, except that now the null space of $A$ is of dimension $m + r - k$, so that there are more unconstrained dimensions in the constraint set $\mathcal{C}$. The ability to add covariates to a model is important if there is a possible confounding variable or to account for another source of variation.

For the generalized regression problem, the responses $y_i$ are independent observations from a distribution written in the form of an exponential family:
$$f(y_i) = \exp\{[y_i\theta_i - b(\theta_i)]/\tau^2 + c(y_i, \tau)\},$$
where the specifications of $b$ and $c$ determine the sub-family of models. Common examples are $b(\theta) = \log(1 + e^{\theta})$ for the Bernoulli and $b(\theta) = \exp(\theta)$ for the Poisson model. The vector $\mu$ is defined as $E(y)$, and $\mu_i = b'(\theta_i)$. The variance of $y_i$ is $b''(\theta_i)\tau^2$, and the variance function is written in terms of the mean as $\mathrm{var}(y_i) = V(\mu_i)$. The mean vector is related to a design matrix of predictor variables through a link function $g(\mu_i) = \eta_i$; for the generalized linear model, $\eta_i = x_i'\beta$, where $x_i$ is the $i$th row of the design matrix and $\beta$ is the parameter vector to be estimated. Here, we let $\eta_i = f(x_i) + z_i'\alpha$, where $f$ has known shape and degree of smoothness, and $z_i$ is an $r$-vector of covariates. It is often natural to impose monotonicity restrictions on the mean function; for example, the probability of cancer in a lab rat might be increasing with the amount of carcinogen administered, or the expected count of water bird nests at a lake might be known to be decreasing with the amount of boat traffic. If the link function is one-to-one, this is equivalent to constraining the function component $f$ to be monotone.

The algorithm involves iteratively re-weighted quadratic programming problems, following the same ideas for the generalized linear model as found in McCullagh & Nelder (1989). Starting with $\eta^0 \in \mathcal{C}$, the estimate $\eta^{k+1}$ is obtained from $\eta^k$ by constructing a data vector $u^k$, where
$$u_i^k = \eta_i^k + (y_i - \mu_i^k)\left(\frac{d\eta}{d\mu}\right)_{ki},$$
$\mu_i^k = g^{-1}(\eta_i^k)$, and the derivative of the link function is evaluated at $\mu_i^k$. Weights $w_i^k$ are defined by $1/w_i^k = (d\eta/d\mu)^2_{ki} V(\mu_i^k)$. The estimate $\eta^{k+1}$ is obtained as the weighted penalized constrained regression spline with data $u^k$; i.e., the weighted projection of $u^k$ onto the cone defined by the study design. The algorithm is guaranteed to converge to the penalized maximum likelihood estimator; the proof is similar to the convergence proof in Meyer and Woodroofe (2004). The GCV may be used as a check for the validity of shape assumptions, where minus two times the log likelihood is used in place of the sum of squared residuals, and the edf for the constrained version is computed according to the penalty parameter and the dimension of the cone face containing the solution.

The varying coefficient linear regression model was introduced by Hastie and Tibshirani (1993). A simple version is
$$y_i = \beta_0(u_i) + \beta_1(u_i)x_i + \epsilon_i,$$
where the functions $\beta_0$ and $\beta_1$ are assumed to be smooth. Here we impose a shape such as monotonicity or convexity on either or both coefficient functions. We define basis functions $\delta_j(u)$, $j = 1, \ldots, m$, over the range of $u$ values, and create the matrix $B$ with columns $\delta_1, \ldots, \delta_m, \delta_1 x, \ldots, \delta_m x$. The $2m \times 2m$ penalty matrix $D$ is block diagonal with identical $m \times m$ $q$th-order difference matrices. The constraint matrix $S$ is constructed according to the desired shapes for $\beta_0$ and $\beta_1$. Then the estimation and inference methods of the previous sections can be implemented.
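The varying-coefficient design just described can be sketched as follows. The modifier u, the knot choice, and the use of rectangular difference blocks (playing the role of the block-diagonal penalty above) are illustrative assumptions of the sketch.

    # basis in the modifier u; quadratic B-splines with knots over (0, 1) (illustrative)
    u   <- runif(n)
    ku  <- seq(0, 1, length.out = 6)
    aku <- c(rep(0, 3), ku[2:5], rep(1, 3))
    Bu  <- splineDesign(aku, u, ord = 3)             # n x m_u basis matrix in u

    Bvc <- cbind(Bu, Bu * x)                         # columns: delta_j(u), then delta_j(u) * x_i
    m_u <- ncol(Bu)
    Dq  <- diff(diag(m_u), differences = 3)          # q = 3 difference matrix for one block
    Dvc <- rbind(cbind(Dq, matrix(0, nrow(Dq), m_u)),
                 cbind(matrix(0, nrow(Dq), m_u), Dq))   # block-diagonal penalty
    # a constraint matrix for, say, a decreasing beta_1 would act only on the last m_u columns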

5 Examples and Discussion

The British coal mine accident data are used to illustrate the method of validation of the monotonicity assumption for count data. These data were used in the Eilers and Marx (1996) paper on penalized splines. Counts of serious mine accidents per year from 1850 to 1961 are shown in Figure 4(a). Suppose a historian believes that the risk of accident is decreasing over the time span, and that the spike in the 1930s can be explained by random chance. Is there evidence in the data to contradict this belief? Constrained and unconstrained penalized quadratic spline fits were computed for penalty parameters chosen so that the effective degrees of freedom for the unconstrained spline increases from 3.5 to 7.5 by half-steps. The GCV values are shown in plot (b); the minimum value is attained by the constrained version, so we find no evidence that the historian is incorrect.

Figure 4: Analysis of British coal mine accident data. (a) Counts of accidents per year, along with estimated mean functions. The dashed curve is the penalized quadratic spline with third-order penalties, with penalty parameter chosen so that the effective degrees of freedom of the model is 6.5. The solid curve is the constrained version with the same penalty parameter. (b) Generalized cross-validation values for nine values of the penalty parameter, chosen so that the unconstrained spline fits have edf ranging from 3.5 to 7.5.
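Yearly counts resembling this series can be tabulated, at least approximately, from the coal-mining disaster dates in the boot package; this is only a stand-in for the series used by Eilers and Marx (1996) and may not match it exactly, and the constrained Poisson fit itself would use the iteratively re-weighted algorithm of section 4.

    # tabulate serious accidents per year from the boot package's disaster dates
    dates  <- boot::coal$date
    years  <- floor(dates)
    counts <- table(factor(years, levels = min(years):max(years)))
    plot(as.integer(names(counts)), as.integer(counts),
         xlab = "year", ylab = "accidents per year")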

The ability to include covariates when modeling the relationship between the response and the main predictor of interest is important when there may be a confounding effect, or simply to account for another source of variation. For example, we use the leuk data set from the MASS library in R, containing survival times and white blood cell counts of 33 leukemia patients. A two-level covariate records the presence or absence of a morphologic characteristic of the blood cells. These data are depicted in Figure 5(b), where the log of the survival time is plotted against the log of the white blood cell count, and the presence or absence of the characteristic is indicated by the plot character. Plot (a) contains the GCV values for fits constrained to be decreasing in the continuous predictor, and for the unconstrained fits. For the smallest effective degrees of freedom, the unconstrained and constrained fits coincide, but the minimum GCV occurs for the constrained fit at the fourth-largest penalty parameter. The best fit is shown as the solid curves in plot (b). The dashed curves are the smallest-GCV unconstrained fit. These are decreasing, but oversmoothed compared to the best constrained fit.

Figure 5: (a) GCV values for constrained and unconstrained fits to the leukemia data. (b) The solid curves are the constrained fit with penalty parameter chosen so that the unconstrained fit has edf = 6. The dashed curves are the fit with edf = 4.
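The leukemia data can be loaded and plotted as below; the subsequent decreasing, partial-linear fit would combine this with the covariate construction sketched in section 4. The plotting choices are illustrative.

    # survival times, white blood cell counts, and the morphology indicator (MASS::leuk)
    data(leuk, package = "MASS")
    logtime <- log(leuk$time)
    logwbc  <- log(leuk$wbc)
    present <- leuk$ag == "present"                 # two-level morphologic covariate
    plot(logwbc, logtime, pch = ifelse(present, 16, 1),
         xlab = "log(white blood cell count)", ylab = "log(survival time)")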

For an example of the varying coefficient model, consider data from a lawsuit from the early 1990s. The United Auto Workers sought pay equity adjustments for civil service jobs in the state of Michigan (Killingsworth 2002), claiming that women were systematically underpaid by the state based on comparable worth arguments. Both parties agreed to the appointment of a committee to assign points to civil service job categories, meant to represent worth. The data set then consists of three variables for each of 176 categories in Michigan civil service jobs: the points assigned by the committee, the percent of women in the job category, and the maximum wages (in dollars per hour) for the category. The data are shown in Figure 6(a) and (b), with wages plotted against each of the two predictors. The wages appear to be a decreasing function of the percent of women employees, but perhaps the job categories with more women have, on average, lower points. As seen in plot (b), wages rise sharply with increasing points, with a curvature that could be accounted for if the slope and intercept of the linear relationship depend on the percent women.

Figure 6: (a) Wages versus percent women for the UAW data; (b) wages versus points; (c) fit to the varying coefficient model for 0, 30, 50, 70, and 100% women.

The model is $y_i = \beta_0(u_i) + \beta_1(u_i)x_i + \epsilon_i$, where $y_i$ is wages for the $i$th job category, $x_i$ is the points, and $u_i$ is percent women. Interest is in whether the slope function is decreasing with the percent of women in the job category. The function $\beta_0$ is unconstrained. Penalty parameters that result in between 6 and 12.5 effective degrees of freedom for the unconstrained fit are chosen, and the GCV scores for the constrained and unconstrained fits are shown in Figure 7(a). The dotted line near the top indicates the GCV for the constant-slope model with varying intercept, indicating that the varying slope function provides a better fit. The constrained and unconstrained fits coincide for the seven largest penalty parameters, and the GCV for the unconstrained fit is increasing past where they diverge.

Figure 7: (a) The GCV values for the constrained and unconstrained slope functions are identical for the first seven values, and the smallest GCV occurs for an edf of 6.5. (b) The estimated intercept function for edf = 6.5, and (c) the estimated slope function.

The minimum-GCV fits for the intercept and slope functions are shown in plots (b) and (c). Finally, example fits to the scatterplot are shown in Figure 6(c), where the lines correspond to 0, 30, 50, 70, and 100% women, from the top to the bottom on the right.

In conclusion, we have found that regression spline fits are more robust to user-defined choices when shape constraints are imposed. The constrained, penalized splines are computed via a weighted projection onto a polyhedral convex cone, which is a subset of the model space for the unconstrained penalized splines. All the flexibility of the penalized splines is retained, within the bounds of the shape restrictions. The convergence rates for the constrained versions are the same as for the unconstrained versions, but in practice the constrained fits tend to have smaller squared error loss than the unconstrained fits, as well as more robustness to penalty parameter choice. The GCV method of choosing the penalty parameter provides a simple way to determine whether the shape assumptions are valid; this is not a formal hypothesis test but has been shown to have good large-sample properties and compares well to two competing methods. A straightforward extension to the additive models has been given, where the fit is obtained through a single cone projection; no back-fitting or two-step procedures are necessary.

The ability to account for covariates in data analysis is important, to check for possible confounding effects and to model another source of variation. The generalized regression model and the varying-coefficient model are also simple extensions, and covariates are easily included in the model. User-friendly R code for all methods presented in this paper can be found at meyer/penspl.htm.

References

[1] Bowman, Jones, and Gijbels (1998) Testing monotonicity of regression. Journal of Computational and Graphical Statistics 7(4).

[2] de Boor, C. (2001) A Practical Guide to Splines, revised edition. Springer, New York.

[3] Claeskens, G., Krivobokova, T., and Opsomer, J.D. (2009) Asymptotic properties of penalized spline estimators. Biometrika 96(3).

[4] Eilers, P.H.C. and Marx, B.D. (1996) Flexible smoothing with B-splines and penalties. Statistical Science 11(2).

[5] Friedman, J. and Tibshirani, R. (1984) The monotone smoothing of scatterplots. Technometrics 26(3).

[6] Ghosal, Sen, and van der Vaart (2000) Testing monotonicity of regression. Annals of Statistics 28(4).

[7] Hall, P. and Heckman, N.E. (2000) Testing for monotonicity of a regression mean by calibrating for linear functions. Annals of Statistics 28(1).

[8] Hastie, T.J. and Tibshirani, R.J. (1990) Generalized Additive Models. Chapman & Hall, London.

[9] Juditsky, A. and Nemirovski, A. (2002) On nonparametric tests of positivity/monotonicity/convexity. Annals of Statistics 30(2).

[10] Killingsworth, M.R. (2002) Comparable worth and pay equity: recent developments in the United States. Canadian Public Policy 28.

[11] Mammen, E. (1991) Estimating a smooth monotone regression function. Annals of Statistics 19(2).

[12] Mammen, E. and Thomas-Agnan (1999) Smoothing splines and shape restrictions. Scandinavian Journal of Statistics.

[13] McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models, Second Edition. Chapman & Hall, New York.

[14] Meyer, M.C. (1999) An extension of the mixed primal-dual bases algorithm to the case of more constraints than dimensions. Journal of Statistical Planning and Inference 81.

[15] Meyer, M.C. (2003) A test for linear versus convex regression function using shape-restricted regression. Biometrika 90(1).

[16] Meyer, M.C. and Woodroofe, M. (2004) Estimation of a unimodal density using shape restrictions. Canadian Journal of Statistics 32(1).

[17] Meyer, M.C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3).

[18] Ramsay, J.O. (1988) Monotone regression splines in action. Statistical Science 3(4).

[19] Robertson, T., Wright, F.T., and Dykstra, R.L. (1988) Order Restricted Statistical Inference. John Wiley & Sons, New York.

[20] Ruppert, D., Wand, M.P., and Carroll, R.J. (2003) Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.

[21] Silvapulle, M.J. and Sen, P.K. (2005) Constrained Statistical Inference. John Wiley & Sons, New York.

[22] Tantiyaswasdikul, C. and Woodroofe, M. (1994) Isotonic smoothing splines under sequential designs. Journal of Statistical Planning and Inference.

[23] Utreras, F.I. (1985) Smoothing noisy data under monotonicity constraints: existence, characterization, and convergence rates. Numerische Mathematik.

[24] Wang, X. and Li, F. (2008) Isotonic smoothing splines regression. Journal of Computational and Graphical Statistics 17(1).

[25] Zhou, S., Shen, X., and Wolfe, D.A. (1998) Local asymptotics for regression splines and confidence regions. Annals of Statistics 26(5).


More information

Nonparametric Frontier Estimation: The Smooth Local Maximum Estimator and The Smooth FDH Estimator

Nonparametric Frontier Estimation: The Smooth Local Maximum Estimator and The Smooth FDH Estimator Nonparametric Frontier Estimation: The Smooth Local Maximum Estimator and The Smooth FDH Estimator Hudson da S. Torrent 1 July, 2013 Abstract. In this paper we propose two estimators for deterministic

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

Lecture 17: Smoothing splines, Local Regression, and GAMs

Lecture 17: Smoothing splines, Local Regression, and GAMs Lecture 17: Smoothing splines, Local Regression, and GAMs Reading: Sections 7.5-7 STATS 202: Data mining and analysis November 6, 2017 1 / 24 Cubic splines Define a set of knots ξ 1 < ξ 2 < < ξ K. We want

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

A General Greedy Approximation Algorithm with Applications

A General Greedy Approximation Algorithm with Applications A General Greedy Approximation Algorithm with Applications Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, NY 10598 tzhang@watson.ibm.com Abstract Greedy approximation algorithms have been

More information

Model selection and validation 1: Cross-validation

Model selection and validation 1: Cross-validation Model selection and validation 1: Cross-validation Ryan Tibshirani Data Mining: 36-462/36-662 March 26 2013 Optional reading: ISL 2.2, 5.1, ESL 7.4, 7.10 1 Reminder: modern regression techniques Over the

More information

Topics in Machine Learning-EE 5359 Model Assessment and Selection

Topics in Machine Learning-EE 5359 Model Assessment and Selection Topics in Machine Learning-EE 5359 Model Assessment and Selection Ioannis D. Schizas Electrical Engineering Department University of Texas at Arlington 1 Training and Generalization Training stage: Utilizing

More information

Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling

Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling Moritz Baecher May 15, 29 1 Introduction Edge-preserving smoothing and super-resolution are classic and important

More information

Supplementary Material for A Locally Linear Regression Model for Boundary Preserving Regularization in Stereo Matching

Supplementary Material for A Locally Linear Regression Model for Boundary Preserving Regularization in Stereo Matching Supplementary Material for A Locally Linear Regression Model for Boundary Preserving Regularization in Stereo Matching Shengqi Zhu 1, Li Zhang 1, and Hailin Jin 2 1 University of Wisconsin - Madison 2

More information

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Siobhan Everson-Stewart, F. Jay Breidt, Jean D. Opsomer January 20, 2004 Key Words: auxiliary information, environmental surveys,

More information

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica Boston-Keio Workshop 2016. Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica... Mihoko Minami Keio University, Japan August 15, 2016 Joint work with

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Weijie Cai, SAS Institute Inc., Cary NC July 1, 2008 ABSTRACT Generalized additive models are useful in finding predictor-response

More information

Generalized additive models I

Generalized additive models I I Patrick Breheny October 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/18 Introduction Thus far, we have discussed nonparametric regression involving a single covariate In practice, we often

More information

Linear penalized spline model estimation using ranked set sampling technique

Linear penalized spline model estimation using ranked set sampling technique Hacettepe Journal of Mathematics and Statistics Volume 46 (4) (2017), 669 683 Linear penalized spline model estimation using ranked set sampling technique Al Kadiri M A Abstract Benets of using Ranked

More information

Multiresponse Sparse Regression with Application to Multidimensional Scaling

Multiresponse Sparse Regression with Application to Multidimensional Scaling Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,

More information

Improved smoothing spline regression by combining estimates of dierent smoothness

Improved smoothing spline regression by combining estimates of dierent smoothness Available online at www.sciencedirect.com Statistics & Probability Letters 67 (2004) 133 140 Improved smoothing spline regression by combining estimates of dierent smoothness Thomas C.M. Lee Department

More information

Section 4 Matching Estimator

Section 4 Matching Estimator Section 4 Matching Estimator Matching Estimators Key Idea: The matching method compares the outcomes of program participants with those of matched nonparticipants, where matches are chosen on the basis

More information

Mathematical Programming and Research Methods (Part II)

Mathematical Programming and Research Methods (Part II) Mathematical Programming and Research Methods (Part II) 4. Convexity and Optimization Massimiliano Pontil (based on previous lecture by Andreas Argyriou) 1 Today s Plan Convex sets and functions Types

More information

60 2 Convex sets. {x a T x b} {x ã T x b}

60 2 Convex sets. {x a T x b} {x ã T x b} 60 2 Convex sets Exercises Definition of convexity 21 Let C R n be a convex set, with x 1,, x k C, and let θ 1,, θ k R satisfy θ i 0, θ 1 + + θ k = 1 Show that θ 1x 1 + + θ k x k C (The definition of convexity

More information

A C 2 Four-Point Subdivision Scheme with Fourth Order Accuracy and its Extensions

A C 2 Four-Point Subdivision Scheme with Fourth Order Accuracy and its Extensions A C 2 Four-Point Subdivision Scheme with Fourth Order Accuracy and its Extensions Nira Dyn School of Mathematical Sciences Tel Aviv University Michael S. Floater Department of Informatics University of

More information

Automatic Fatigue Detection System

Automatic Fatigue Detection System Automatic Fatigue Detection System T. Tinoco De Rubira, Stanford University December 11, 2009 1 Introduction Fatigue is the cause of a large number of car accidents in the United States. Studies done by

More information

Lecture 7: Splines and Generalized Additive Models

Lecture 7: Splines and Generalized Additive Models Lecture 7: and Generalized Additive Models Computational Statistics Thierry Denœux April, 2016 Introduction Overview Introduction Simple approaches Polynomials Step functions Regression splines Natural

More information

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time Today Lecture 4: We examine clustering in a little more detail; we went over it a somewhat quickly last time The CAD data will return and give us an opportunity to work with curves (!) We then examine

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14

More information

A Locally Adaptive Penalty for Estimation of. Functions with Varying Roughness.

A Locally Adaptive Penalty for Estimation of. Functions with Varying Roughness. A Locally Adaptive Penalty for Estimation of Functions with Varying Roughness. Curtis B. Storlie, Howard D. Bondell, and Brian J. Reich Date: June 5, 2008 Abstract We propose a new regularization method

More information

Regression on a Graph

Regression on a Graph Regression on a Graph arxiv:0911.1928v1 [stat.me] 10 Nov 2009 Arne Kovac and Andrew D.A.C. Smith Abstract The Signal plus Noise model for nonparametric regression can be extended to the case of observations

More information

A Trimmed Translation-Invariant Denoising Estimator

A Trimmed Translation-Invariant Denoising Estimator A Trimmed Translation-Invariant Denoising Estimator Eric Chicken and Jordan Cuevas Florida State University, Tallahassee, FL 32306 Abstract A popular wavelet method for estimating jumps in functions is

More information

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material

More information

Modeling Criminal Careers as Departures From a Unimodal Population Age-Crime Curve: The Case of Marijuana Use

Modeling Criminal Careers as Departures From a Unimodal Population Age-Crime Curve: The Case of Marijuana Use Modeling Criminal Careers as Departures From a Unimodal Population Curve: The Case of Marijuana Use Donatello Telesca, Elena A. Erosheva, Derek A. Kreader, & Ross Matsueda April 15, 2014 extends Telesca

More information

One-mode Additive Clustering of Multiway Data

One-mode Additive Clustering of Multiway Data One-mode Additive Clustering of Multiway Data Dirk Depril and Iven Van Mechelen KULeuven Tiensestraat 103 3000 Leuven, Belgium (e-mail: dirk.depril@psy.kuleuven.ac.be iven.vanmechelen@psy.kuleuven.ac.be)

More information

Lecture 19: Convex Non-Smooth Optimization. April 2, 2007

Lecture 19: Convex Non-Smooth Optimization. April 2, 2007 : Convex Non-Smooth Optimization April 2, 2007 Outline Lecture 19 Convex non-smooth problems Examples Subgradients and subdifferentials Subgradient properties Operations with subgradients and subdifferentials

More information

Lecture 13: Model selection and regularization

Lecture 13: Model selection and regularization Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always

More information

Text Modeling with the Trace Norm

Text Modeling with the Trace Norm Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs, GAMMs and other penalied GLMs using mgcv in R Simon Wood Mathematical Sciences, University of Bath, U.K. Simple eample Consider a very simple dataset relating the timber volume of cherry trees to

More information

A Polygonal Line Algorithm for Constructing Principal Curves

A Polygonal Line Algorithm for Constructing Principal Curves A Polygonal Line Algorithm for Constructing s Balázs Kégl, Adam Krzyżak Dept. of Computer Science Concordia University 45 de Maisonneuve Blvd. W. Montreal, Canada H3G M8 keglcs.concordia.ca krzyzakcs.concordia.ca

More information

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects Patralekha Bhattacharya Thinkalytics The PDLREG procedure in SAS is used to fit a finite distributed lagged model to time series data

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni Nonparametric Risk Attribution for Factor Models of Portfolios October 3, 2017 Kellie Ottoboni Outline The problem Page 3 Additive model of returns Page 7 Euler s formula for risk decomposition Page 11

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software October 2015, Volume 67, Issue 3. doi: 10.18637/jss.v067.i03 Threshold Value Estimation Using Adaptive Two-Stage Plans in R Shawn Mankad Cornell University George Michailidis

More information

BART STAT8810, Fall 2017

BART STAT8810, Fall 2017 BART STAT8810, Fall 2017 M.T. Pratola November 1, 2017 Today BART: Bayesian Additive Regression Trees BART: Bayesian Additive Regression Trees Additive model generalizes the single-tree regression model:

More information

CH9.Generalized Additive Model

CH9.Generalized Additive Model CH9.Generalized Additive Model Regression Model For a response variable and predictor variables can be modeled using a mean function as follows: would be a parametric / nonparametric regression or a smoothing

More information

Lab 2: Support vector machines

Lab 2: Support vector machines Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

2014 Stat-Ease, Inc. All Rights Reserved.

2014 Stat-Ease, Inc. All Rights Reserved. What s New in Design-Expert version 9 Factorial split plots (Two-Level, Multilevel, Optimal) Definitive Screening and Single Factor designs Journal Feature Design layout Graph Columns Design Evaluation

More information