R-Splines for Response Surface Modeling

July 12, 2000

Sarah W. Hardy, North Carolina State University, Raleigh, NC
Douglas W. Nychka, National Center for Atmospheric Research, Boulder, CO
Abstract

R-splines are introduced as splines fit with a polynomial null space plus a sum of radial basis functions. Thin plate splines are a special case of R-splines. Under this broader definition, however, the class of R-splines also includes splines for which $2m - d \le 0$, where the traditional roughness penalty is not guaranteed to be non-negative definite. This paper discusses a modification of the roughness penalty that allows for the fitting of reduced polynomial null spaces. A series of examples is used to demonstrate the behavior of this modified roughness penalty.

Keywords: Thin plate spline, nonparametric, response surfaces, roughness penalty, Demmler-Reinsch basis functions.

1 Introduction

Thin plate splines are a nonparametric method for fitting response surfaces. A thin plate spline is a smooth surface constructed by adding a base, or null space, polynomial component and a sum of radial basis functions. Thin plate smoothing splines are the solution to a minimization problem and can be interpreted as a natural generalization of the traditional polynomial model estimated by minimizing the residual sum of squares. Unlike the traditional formulation, however, the thin plate spline minimization problem also involves a roughness penalty. This roughness penalty is a function of m and d, where $m - 1$ is the degree of the polynomial null space and d is the number of explanatory variables. To guarantee that the roughness penalty is positive and the function is a true thin plate spline, $2m - d > 0$ is required. Thus, the size of the polynomial null space grows very quickly as a function of the number of explanatory variables. Many typical industrial experiments do not have enough observations to fit the number of parameters needed for the required null space. Specifically, this becomes problematic when there is a large set of explanatory variables and it is desirable to use an all-subsets approach to variable selection for thin
plate spline modeling. R-splines arose in the context of attempting to modify the thin plate spline to allow for a greater number of explanatory variables in the model. R-splines are splines fit with a polynomial null space plus a sum of radial basis functions. Clearly, thin plate splines fall into this category. Under this broader definition of an R-spline, however, the class also includes other splines; specifically, it includes splines for which $2m - d \le 0$. In other words, the polynomial null space is reduced, or of lower order than required by the thin plate spline constraint. When $2m - d \le 0$, the seamless way in which the polynomial function and roughness penalty fit together (i.e., the space spanned by the polynomial terms is the null space of the roughness penalty) no longer holds, or the relationship is "broken." The term broken spline will be used to indicate this kind of spline, where the penalty matrix is chosen to be similar to the thin plate spline. The term R-spline refers to the broader class of splines including both broken and thin plate splines.

This paper begins by discussing the thin plate spline roughness penalty. It then presents the broken spline modification to the roughness penalty. The Demmler-Reinsch basis function representation of splines is used to show the role the eigenvalues of the roughness penalty matrix have in the spline solution. A series of four two- and three-dimensional examples from data sets available in S-PLUS and FUNFITS (Nychka et al., 1996) is used to compare the roughness penalties for the thin plate and broken spline. Another five-dimensional example, using data collected at Becton Dickinson, is also presented.

2 Thin plate spline roughness penalty

The roughness functional below, $J_m(f)$, increases in magnitude as a function departs from an $(m-1)$-order polynomial. This roughness functional is:
$$ J_m(f) = \int_{\mathbb{R}^d} \sum_{\alpha} \frac{m!}{\alpha_1! \cdots \alpha_d!} \left( \frac{\partial^m f}{\partial u_1^{\alpha_1} \cdots \partial u_d^{\alpha_d}} \right)^2 du. $$

The sum in the integrand is taken over all non-negative integer vectors $\alpha$ such that $\sum_{i=1}^{d} \alpha_i = m$. Clearly, for $(m-1)$-order polynomials $J_m(f) = 0$, because all $m$th derivatives are 0.

The thin plate spline estimator of f (Wahba, 1990) is the minimizer of the sum of the mean-squared error and the roughness penalty, which is weighted by a smoothing parameter. Details on the form of the thin plate spline can be found in the appendix. In matrix form, the thin plate spline estimate is

$$ f(x_i) = T\beta + M\delta, \quad \text{where } T^T \delta = 0 \text{ and } 2m - d > 0, $$

where T is the design matrix of a polynomial model of order $m - 1$ and $M_{k,i} = E(\| x_i - x_k \|; m, d)$. For linear combinations of basis functions that satisfy the constraints, $J_m(f) = \delta^T M \delta > 0$; in other words, M is guaranteed to be positive definite. Defining the matrix $W_{n \times n}$, a diagonal matrix proportional to the reciprocal variances of the errors, allows the following matrix representation of the penalized sum of squares:

$$ S_\lambda = \frac{1}{n} (Y - T\beta - M\delta)^T W (Y - T\beta - M\delta) + \lambda\, \delta^T M \delta. \tag{1} $$

After taking partial derivatives of (1) with respect to $\beta$ and $\delta$, a QR decomposition of T,

$$ F^T T = \begin{pmatrix} R \\ 0 \end{pmatrix}, $$

is used to enforce the constraint $T^T \delta = 0$. $F_{n \times n}$ is an orthogonal matrix that can be partitioned $F = [F_1 \mid F_2]$, where $F_1$ has columns that span the column space of T (i.e., $T = F_1 R$) and $F_2$ is orthogonal to the column space of T. Reparameterizing by letting $\delta = F_2 \omega_2$, $\beta = \omega_1$, and $\omega^T = (\omega_1, \omega_2)^T$ allows the problem to be posed in ridge
regression form:

$$ X^T W X\, \omega + n\lambda H \omega = X^T W Y, $$

where

$$ X = [\, T \mid M F_2 \,] \quad \text{and} \quad H = \begin{pmatrix} 0 & 0 \\ 0 & F_2^T M F_2 \end{pmatrix}. $$

Note that in the ridge regression formulation the roughness penalty is represented by the matrix H.

3 Broken spline roughness penalty

The degree of the polynomial component implies a specific roughness penalty in the thin plate spline; thus, if T, the design matrix for the polynomial component, does not span $P_{m-1}$, the roughness penalty minimized in the ridge regression formulation will not be the same roughness penalty as the one used to derive the thin plate spline. We will use the term broken spline to describe splines where T does not span $P_{m-1}$. For purposes of this comparative discussion, the T matrix for a thin plate spline will be denoted $T_P$ and the T matrix for a broken spline will be denoted $T_R$. Note that the column space of $T_R$ is a subset of that of $T_P$. Partition F as

$$ F = [\, F_a \mid F_b \mid F_c \,], $$

where $F_a$ spans the space of $T_R$, $F_b$ spans the columns of $T_P$ that are not in $T_R$, and $F_c$ spans the space orthogonal to $T_P$. Thus, in the QR decomposition of $T_P$, $F_2 = F_c$, and in the QR decomposition of $T_R$, $F_2 = [F_b \mid F_c]$.
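The machinery above (the QR decomposition of T giving $F_2$, the block penalty matrix H, and the ridge-regression solve) can be sketched numerically. This is a minimal illustration on synthetic data, not the authors' FUNFITS implementation; the kernel constant $a_{md}$ is dropped and the smoothing parameter `lam` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(size=(n, 2))                      # d = 2 design points
y = np.sin(3 * x[:, 0]) + x[:, 1] + 0.1 * rng.standard_normal(n)

# Null-space design T (constant + linear terms, m = 2) and radial matrix M,
# using the d-even kernel E(r) = r^(2m-d) log(r) = r^2 log(r), up to a_md.
T = np.column_stack([np.ones(n), x])
r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
with np.errstate(divide="ignore", invalid="ignore"):
    M = np.where(r > 0, r**2 * np.log(r), 0.0)

# QR decomposition of T enforces the constraint T^T delta = 0:
# F = [F1 | F2], with F2 orthogonal to the column space of T.
F, _ = np.linalg.qr(T, mode="complete")
F2 = F[:, T.shape[1]:]

X = np.hstack([T, M @ F2])                        # n x n design in omega
W = np.eye(n)                                     # equal error variances assumed
t = T.shape[1]
H = np.zeros((n, n))
H[t:, t:] = F2.T @ M @ F2                         # penalty acts only on omega_2

lam = 1e-3
omega = np.linalg.solve(X.T @ W @ X + n * lam * H, X.T @ W @ y)
fhat = X @ omega                                  # fitted surface at the data
```

Taking `lam` toward 0 moves the fit toward interpolation; a large `lam` shrinks it toward the null-space polynomial.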
For the broken spline, the roughness penalty matrix H becomes

$$ H_R = \begin{pmatrix} 0_{r \times r} & 0 & 0 \\ 0 & F_b^T M F_b & F_b^T M F_c \\ 0 & F_c^T M F_b & F_c^T M F_c \end{pmatrix}, $$

as compared to the H for the thin plate spline, which is

$$ H_t = \begin{pmatrix} 0_{r_t \times r_t} & 0 \\ 0 & F_c^T M F_c \end{pmatrix}. $$

Hence, $H_R$ for the broken spline is the roughness penalty for the thin plate spline, plus other terms:

$$ H_R = H_t + \begin{pmatrix} 0 & 0 & 0 \\ 0 & F_b^T M F_b & F_b^T M F_c \\ 0 & F_c^T M F_b & 0 \end{pmatrix}. $$

With thin plate splines, $H_t$ is guaranteed to be non-negative definite. Recall that $\delta^T M \delta > 0$ where $T^T \delta = 0$ and $2m - d > 0$, which guarantees that M is positive definite. $F_2^T M F_2$ is a quadratic form of a positive definite matrix and as such is also positive definite. $H_t$, having $F_c^T M F_c$ in the lower right block and zeros elsewhere, is non-negative definite. With broken splines, however, $H_R$ is no longer a non-negative definite matrix.

In the computational solution to the thin plate spline, $U D U^T$ is the singular value decomposition of $B H B$, so $H_t = B^{-1} U D U^T B^{-1}$. Now $B H B$ is no longer positive definite, and thus its singular value decomposition is $U D V^T$, where $U \ne V$, and $H_R = B^{-1} U D V^T B^{-1}$. Let $\tilde{H}_R = B^{-1} U D U^T B^{-1}$, which is a non-negative definite matrix because D is a diagonal matrix of singular values (which are non-negative by definition) and $\tilde{H}_R$ is a quadratic form of D. If $H_R$ is replaced by $\tilde{H}_R$, a non-negative definite roughness penalty is minimized. Because $H_R$ is a symmetric matrix, U and V will be the same except for sign changes in
columns corresponding to the negative eigenvalues. Hence the modified penalty $\tilde{H}_R$ is the matrix that results from forcing the negative eigenvalues of $H_R$ to be positive. How "close" the two matrices are depends on the magnitude of these eigenvalues. The negative eigenvalues that occur correspond to the "missing" polynomial terms in the null space. Thus, the roughness penalty is forced into penalizing roughness in the surface that might otherwise be modeled by these missing polynomial terms, fitting them instead with the flexible radial basis functions.

3.1 Demmler-Reinsch basis functions

Definition

Constructing a Demmler-Reinsch basis for the smoothing problem aids in understanding the roles that the eigenvalues and eigenvectors of H have in the spline solution. Before defining the Demmler-Reinsch basis, two inner products are defined:

$$ \langle h_\nu, h_\mu \rangle_1 = \sum_k h_\nu(x_k)\, W_k\, h_\mu(x_k) $$

and

$$ \langle h_\nu, h_\mu \rangle_2 = \int_{\mathbb{R}^d} \sum_{\alpha} \frac{m!}{\alpha_1! \cdots \alpha_d!} \, \frac{\partial^m h_\nu}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \, \frac{\partial^m h_\mu}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \, dx. $$

The Demmler-Reinsch basis is denoted by $\{g_\nu\}$ and is defined by the following three properties:

1. $\{g_\nu\}$ for $1 \le \nu \le N$ spans the same subspace as the $\{\phi_j\}$ and the radial basis functions.
2. $\langle g_\nu, g_\mu \rangle_1 = 1$ for $\nu = \mu$ and 0 for $\nu \ne \mu$.
3. $\langle g_\nu, g_\mu \rangle_2 = D_\nu$ for $\nu = \mu$ and 0 for $\nu \ne \mu$.

In other words, this basis consists of linear combinations of the polynomial and radial basis functions that are orthogonal with respect to the inner products by which the weighted
8 residual sums of squares and the roughness penalty can be expressed. By convention D are in ascending order Role in spline solution Because this basis spans the same space as the components of f(x) we can now express both f(x) and y as a linear combinations of these basis functions: NX NX f(x k ) = g (x k ) and y k = u g (x k ): (2) =1 =1 Using these expression it can be easily shown that u = P n k=1 g (x k )W k Y k : Moreover, under the assumption that the random errors are independent and distributed N(0; 2 I), the u 's are also independent and are distributed N(; 2 I): With the Demmler-Reinsch basis functions the residual sum of squares and the roughness penalty can be expressed: nx nx (Y k? f(x k )) 2 w k = (u? ) 2 k=1 =1 and nx J m (f) = D 2 : =1 Thus, it is apparent that minimizing S (f) is equivalent to nding The minimizing expression is: min 2< n nx nx (u? ) 2 + D 2 : (3) =1 =1 and so, c = u 1 + D ; ^f(x) = NX =1 c g (x) = 8 NX =1 u 1 + D g (x): (4)
In fact, the $G = \{g_\nu\}_{\nu=1,\ldots,N}$ we are seeking is (cleverly) the same G we found in the ridge regression formulation, and property (2) is easily verified by noting that $G^T (X^T W X) G = I$. To determine whether property (3) holds, observe that $G^T H G = D$. As an important point of clarification, recall that the upper $r \times r$ block of H is 0, where r is the dimension of the null space. This implies $D_\nu = 0$ for $\nu = 1, \ldots, r$, where the $D_\nu$ are the eigenvalues of $G^T H G$. More intuitively, by virtue of the way in which the basis functions are defined, the roughness of the basis functions is $J_m(g_\nu) = D_\nu$, and r of these basis functions will be polynomials whose roughness is 0.

With this formulation, it is apparent that if the $D_\nu$'s, which are the eigenvalues of H, and the $g_\nu$'s, which are the Demmler-Reinsch basis functions, are correlated for the thin plate and broken spline, then the two roughness penalties are similar and measure the same features of the data.

4 Examples

4.1 Example datasets

Four sample datasets, the stack and ethanol datasets available in S-PLUS (StatSci, 1993) and the mini-triathalon and BD2 datasets available in FUNFITS, were used to compare these roughness penalties for fits to the data with a thin plate spline and a broken spline. The stack data is from operation of a plant for the oxidation of ammonia to nitric acid, measured on 21 consecutive days. The explanatory variables are air flow to the plant, cooling water inlet temperature, and acid concentration, and the response variable is percent of ammonia lost. The ethanol data frame records 88 measurements from an experiment in which ethanol was burned in a single cylinder automobile test engine. The explanatory variables are the compression ratio of the engine and the equivalence ratio at which the
engine was run (a measure of the richness of the air/ethanol mix). The response variable is the concentration of nitric oxide and nitrogen dioxide in engine exhaust, normalized by the work done by the engine. The mini-triathalon data contains swimming, biking, and running times from 110 entrants in a Cary, NC event. In the example we are using swimming and biking times to predict running times. The BD2 dataset comes from results of a sequence of RSM designs in a DNA amplification experiment performed at Becton Dickinson Technologies. The explanatory variables are potassium phosphate (a buffer), magnesium acetate (a salt), and dimethyl sulfoxide (a solvent). The response variable is yield.

In summary, the ethanol and mini-triathalon datasets each have 2 explanatory variables, and the stack and BD2 datasets have 3. Thin plate splines were fit with m = 2, or a linear null space, and the broken splines were fit with m = 1, or a constant null space. Figure 1 shows the plots of the eigenvalues of H for the two types of splines. These plots demonstrate that the eigenvalues are nearly equivalent for all but a few points. Notice that the number of eigenvalues that differ significantly equals the number of explanatory variables, which in turn equals the number of linear terms missing from the polynomial null space. These are the specific eigenvalues that correspond to polynomial terms in the null space of the thin plate spline, and hence are 0 for the thin plate spline and positive for the broken spline.
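The Figure 1 comparison can be sketched on synthetic data (the paper's datasets are not reproduced here): build the penalty matrix H over the same radial basis matrix M for an m = 2 (linear) null space and an m = 1 (constant) null space, and compare their eigenvalues. The data and kernel scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(size=(n, 2))                     # d = 2 design points
r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
with np.errstate(divide="ignore", invalid="ignore"):
    M = np.where(r > 0, r**2 * np.log(r), 0.0)   # m = 2, d = 2 kernel, up to a_md

def penalty_eigenvalues(T):
    """Sorted eigenvalues of H = blockdiag(0, F2^T M F2) for null-space design T."""
    F, _ = np.linalg.qr(T, mode="complete")
    F2 = F[:, T.shape[1]:]
    t = T.shape[1]
    H = np.zeros((n, n))
    H[t:, t:] = F2.T @ M @ F2
    return np.sort(np.linalg.eigvalsh(H))

ev_tps = penalty_eigenvalues(np.column_stack([np.ones(n), x]))  # m = 2 null space
ev_brk = penalty_eigenvalues(np.ones((n, 1)))                   # m = 1 null space

# As in Figure 1, most eigenvalues agree closely; the broken-spline H may also
# carry negative eigenvalues in the directions of the missing linear terms.
print(ev_tps[:4], ev_brk[:4])
```

A scatterplot of `ev_brk` against `ev_tps` reproduces the kind of comparison shown in Figure 1.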
[Figure 1: Comparison of H eigenvalues for m=1 (broken splines) and m=2 (thin plate splines); panels: minitri, stack.loss, ethanol, and BD2.]
Figure 2 shows image plots of the correlations of the thin plate spline and broken spline Demmler-Reinsch basis functions for these four data sets. Lines on the graphs indicate the division between the basis functions associated with the null space and those associated with the roughness penalty. The regions beyond the lines on the graph correspond to basis functions associated with the null space. The area enclosed within the lines corresponds to the basis functions associated with the roughness penalty, which these plots show are highly correlated. As expected, the area outside the lines is not as highly correlated, as the polynomials in the null space are not the same for the thin plate and broken splines. There is, however, some correlation between the basis functions representing the null space in the thin plate spline and those representing the roughness penalty in the broken spline. This indicates that the polynomial terms missing from the broken spline are being accounted for in some way by the roughness penalty. There is some difference between the two sets of basis functions, but overall, when ordered in ascending order by eigenvalue, the two sets of basis functions match closely beyond the first few functions related to the null space. Clearly, more theoretical work is needed to explicitly quantify the relationship given these promising empirical results.

Figure 3 demonstrates the results of fitting the mini-triathalon dataset with broken and thin plate splines. The fits are approximately the same, and nothing essential has been sacrificed by fitting the reduced null space; this is again evidence that the roughness penalties are similar.
[Figure 2: Correlation of Demmler-Reinsch basis functions for m=1 (broken splines) and m=2 (thin plate splines); panels: minitri, stack.loss, ethanol, and BD2.]
[Figure 3: Diagnostic plots (observed vs. predicted values, residuals, and GCV curves) for the mini-triathalon dataset fit with a broken spline (m=1: R² = 69.56%, Q² = 64.88%, eff. df = 8, res. df = 102) and a thin plate spline (m=2: R² = 70.23%, Q² = 64.66%, eff. df = 9.6).]
4.2 A larger example dataset

[Figure 4: Comparison of H eigenvalues for the example dataset fit with a broken (m=2) and thin plate (m=3) spline.]

The next example arose from research being conducted at Becton Dickinson Technologies; because of the proprietary nature of the data, the details cannot be given. This example has 5 explanatory variables, and so m for a thin plate spline must be 3. In other words, for a thin plate spline a full quadratic null space with 21 terms must be fit. The broken spline was fit with m = 2, or a linear null space with 6 terms. Figure 4 is a scatterplot of the 291 eigenvalues of H for the broken versus the thin plate spline. This plot shows that the majority of eigenvalues are very closely related, except for 15 of the eigenvalues related to the terms in the null space of the thin plate spline that are not in the null space of the broken spline. Another approximately 15 eigenvalues are smaller for
the broken spline than for the thin plate spline.

[Figure 5: Correlation of Demmler-Reinsch basis functions for the example dataset fit with a broken and thin plate spline.]

Figure 5 shows that, again, the basis functions are highly correlated for the thin plate and broken spline. They are less correlated at the higher end than in the previous examples because of the higher dimensionality. The diagnostic plots, Figure 6, indicate that in this
case the broken spline is doing a slightly better job of fitting the data.

[Figure 6: Diagnostic plots for the example dataset fit with a broken spline (m=2: R² = 45.12%, Q² = 10.18%, eff. df = 46.3) and a thin plate spline (m=3: R² = 37.5%, Q² = 6.3%, eff. df = 46.3).]

5 Conclusions

The evidence indicates that broken splines are a feasible model fitting tool. Like the thin plate spline, the broken spline is the sum of polynomial terms and radial basis functions, and as such it is a smooth function with considerably more flexibility than the traditional polynomial model. Also like the thin plate spline, it is the solution to a minimization problem in which the mean-squared error and a roughness component are minimized.
It can be shown that the thin plate spline solution is the unique and optimal solution to the posed minimization problem. The broken spline is also a unique solution to the minimization problem given a fixed roughness penalty matrix as described. In the thin plate spline, the roughness penalty is explicitly defined and makes intuitive sense from a physical perspective as the bending energy of a thin plate. In fact, positive definite matrices arise in many applications involving energy, and clearly only make sense for a roughness penalty that is to be minimized. With the broken spline, although the effective roughness penalty cannot be explicitly defined, it is closely related to the "bending energy" roughness penalty and empirically gives good results. The new insight that it is possible to achieve good results using the sum of polynomial terms and radial basis functions to estimate smooth functions, without the restrictions imposed for thin plate splines, opens the door for experimentation with this modeling technique on many more types of datasets.

A Definition of a thin plate spline

The thin plate spline estimator of f (Wahba, 1990) is the minimizer of the following penalized sum of squares for a d-dimensional explanatory variable x:

$$ S_\lambda(f) = \frac{1}{n} \sum_{i=1}^{n} w_i (y_i - f(x_i))^2 + \lambda J_m(f) \quad \text{for } \lambda > 0. \tag{5} $$

Thus, an $\hat f$ minimizing (5) will have a level of smoothness dictated by $\lambda$. The function that minimizes this expression has the form

$$ f(x_i) = \sum_{j=1}^{t} \phi_j(x_i)\, \beta_j + \sum_{k=1}^{N} E(\| x_i - x_k \|; m, d)\, \delta_k \tag{6} $$

where

$$ \sum_{i=1}^{n} \phi_j(x_i)\, \delta_i = 0 \text{ for } 1 \le j \le t, \quad \text{and} \quad 2m - d > 0. \tag{7} $$
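The form (6)-(7) rests on the radial basis function E, whose explicit piecewise definition follows below. A minimal sketch, with the constant $a_{md}$ set to 1 for illustration (its true value depends on m and d):

```python
import numpy as np

def E(r, m, d, a_md=1.0):
    """Thin plate radial basis: a_md * r^(2m-d) * log(r) for d even,
    a_md * r^(2m-d) for d odd; taken as 0 at r = 0."""
    r = np.asarray(r, dtype=float)
    p = 2 * m - d
    if d % 2 == 0:
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(r > 0, a_md * r**p * np.log(r), 0.0)
    return a_md * r**p

# A thin plate spline surface is the null-space polynomial plus a weighted
# sum of these kernels centered at the data points, as in equation (6):
#   f(x) = sum_j phi_j(x) beta_j + sum_k E(||x - x_k||; m, d) delta_k
print(E(1.0, m=2, d=2))   # log(1) = 0, so the d-even kernel vanishes at r = 1
```

For d = 2 and m = 2 this is the familiar $r^2 \log r$ thin plate kernel used in the examples above.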
In this formulation, the $\phi_j(x_i)$ are a set of t polynomial functions (of order $m - 1$) and the $E(\| x_i - x_k \|; m, d)$ are a set of N radial basis functions. It is assumed that $\beta_j$ is estimable. The radial basis functions are explicitly defined as

$$ E(r; m, d) = \begin{cases} a_{md}\, \| r \|^{2m-d} \log(\| r \|) & d \text{ even} \\ a_{md}\, \| r \|^{2m-d} & d \text{ odd} \end{cases} $$

where $a_{md}$ depends only on m and d. One standard way of determining $\lambda$ is by generalized cross-validation (GCV) (Bates et al., 1987).

B References

Bates, D.M., Lindstrom, M.J., Wahba, G. and Yandell, B.S. (1987). GCVPACK - routines for generalized cross validation. Communications in Statistics - Simulation and Computation 16.

Nychka, D., Bailey, B., Ellner, S., Haaland, P. and O'Connell, M. (1996). FUNFITS: data analysis and statistical tools for estimating functions. Software and paper available from StatLib.

StatSci (1993). S-PLUS Reference Manual, Vol. 2, Version 3.2. MathSoft, Inc., Seattle, WA.

Wahba, G. (1990). Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia, PA.
More informationIn a two-way contingency table, the null hypothesis of quasi-independence. (QI) usually arises for two main reasons: 1) some cells involve structural
Simulate and Reject Monte Carlo Exact Conditional Tests for Quasi-independence Peter W. F. Smith and John W. McDonald Department of Social Statistics, University of Southampton, Southampton, SO17 1BJ,
More informationA popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines
A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training
More informationNONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR
NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR J. D. Maca July 1, 1997 Abstract The purpose of this manual is to demonstrate the usage of software for
More informationFeature Selection Using Modified-MCA Based Scoring Metric for Classification
2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationModel Answers to The Next Pixel Prediction Task
Model Answers to The Next Pixel Prediction Task December 2, 25. (Data preprocessing and visualization, 8 marks) (a) Solution. In Algorithm we are told that the data was discretized to 64 grey scale values,...,
More informationEquations and Functions, Variables and Expressions
Equations and Functions, Variables and Expressions Equations and functions are ubiquitous components of mathematical language. Success in mathematics beyond basic arithmetic depends on having a solid working
More informationNonparametric Classification Methods
Nonparametric Classification Methods We now examine some modern, computationally intensive methods for regression and classification. Recall that the LDA approach constructs a line (or plane or hyperplane)
More informationJ. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report. February 5, 1998
Density Estimation using Support Vector Machines J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report CSD-TR-97-3 February 5, 998!()+, -./ 3456 Department of Computer Science
More informationdavidr Cornell University
1 NONPARAMETRIC RANDOM EFFECTS MODELS AND LIKELIHOOD RATIO TESTS Oct 11, 2002 David Ruppert Cornell University www.orie.cornell.edu/ davidr (These transparencies and preprints available link to Recent
More informationUSING PRINCIPAL COMPONENTS ANALYSIS FOR AGGREGATING JUDGMENTS IN THE ANALYTIC HIERARCHY PROCESS
Analytic Hierarchy To Be Submitted to the the Analytic Hierarchy 2014, Washington D.C., U.S.A. USING PRINCIPAL COMPONENTS ANALYSIS FOR AGGREGATING JUDGMENTS IN THE ANALYTIC HIERARCHY PROCESS Natalie M.
More informationGeometric Transformations and Image Warping
Geometric Transformations and Image Warping Ross Whitaker SCI Institute, School of Computing University of Utah Univ of Utah, CS6640 2009 1 Geometric Transformations Greyscale transformations -> operate
More informationMinitab 17 commands Prepared by Jeffrey S. Simonoff
Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save
More informationA review of spline function selection procedures in R
Matthias Schmid Department of Medical Biometry, Informatics and Epidemiology University of Bonn joint work with Aris Perperoglou on behalf of TG2 of the STRATOS Initiative September 1, 2016 Introduction
More informationMinoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University
Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University
More informationNonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni
Nonparametric Risk Attribution for Factor Models of Portfolios October 3, 2017 Kellie Ottoboni Outline The problem Page 3 Additive model of returns Page 7 Euler s formula for risk decomposition Page 11
More informationOperator-Based Backward Motion Estimation. Aria Nosratinia and Michael T. Orchard. Beckman Institute for Advanced Science and Technology.
Operator-Based Backward Motion Estimation Aria Nosratinia and Michael T. Orchard Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign 405 N. Mathews Ave., Urbana,
More informationLocally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling
Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling Moritz Baecher May 15, 29 1 Introduction Edge-preserving smoothing and super-resolution are classic and important
More informationDivide and Conquer Kernel Ridge Regression
Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem
More informationRobust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson
Robust Regression Robust Data Mining Techniques By Boonyakorn Jantaranuson Outline Introduction OLS and important terminology Least Median of Squares (LMedS) M-estimator Penalized least squares What is
More informationStat 8053, Fall 2013: Additive Models
Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An
More informationHands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling
Session #176, Beatini Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling Hands-On Activities + Technology = Mathematical Understanding Through Authentic Modeling NCTM
More informationIncorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines. Robert J. Hill and Michael Scholz
Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines Robert J. Hill and Michael Scholz Department of Economics University of Graz, Austria OeNB Workshop Vienna,
More informationNonparametric regression using kernel and spline methods
Nonparametric regression using kernel and spline methods Jean D. Opsomer F. Jay Breidt March 3, 016 1 The statistical model When applying nonparametric regression methods, the researcher is interested
More informationFast Orthogonal Search For Training Radial Basis. Function Neural Networks. By Wahid Ahmed. Thesis Advisor: Donald M. Hummels, Ph.D.
Fast Orthogonal Search For Training Radial Basis Function Neural Networks By Wahid Ahmed Thesis Advisor: Donald M. Hummels, Ph.D. An Abstract of the Thesis Presented in Partial Fulllment of the Requirements
More informationAll 0-1 Polytopes are. Abstract. We study the facial structure of two important permutation polytopes
All 0-1 Polytopes are Traveling Salesman Polytopes L.J. Billera and A. Sarangarajan y Abstract We study the facial structure of two important permutation polytopes in R n2, the Birkho or assignment polytope
More informationAn Overview MARS is an adaptive procedure for regression, and is well suited for high dimensional problems.
Data mining and Machine Learning, Mar 24, 2008 MARS: Multivariate Adaptive Regression Splines (Friedman 1991, Friedman and Silverman 1989) An Overview MARS is an adaptive procedure for regression, and
More informationNorth Carolina Standard Course of Study, 2003, grade 8 [NC] PH Course 3 Lesson
[NC] North Carolina Standard Course of Study, 2003, grade 8 PH Course 3 Lesson COMPETENCY GOAL 1: The learner will understand and compute with real 1.01 Develop number sense for the real numbers. 1-2,
More informationAn Estimated General Cross Validation Function for Periodic Control Theoretic. smoothing splines
An Estimated General Cross Validation Function for Periodic Control Theoretic Smoothing Splines M. Karasalo X. Hu and C. F. Martin Abstract In this paper, a method is developed for estimating the optimal
More informationand therefore the system throughput in a distributed database system [, 1]. Vertical fragmentation further enhances the performance of database transa
Vertical Fragmentation and Allocation in Distributed Deductive Database Systems Seung-Jin Lim Yiu-Kai Ng Department of Computer Science Brigham Young University Provo, Utah 80, U.S.A. Email: fsjlim,ngg@cs.byu.edu
More informationY7 Learning Stage 1. Y7 Learning Stage 2. Y7 Learning Stage 3
Y7 Learning Stage 1 Y7 Learning Stage 2 Y7 Learning Stage 3 Understand simple algebraic notation. Collect like terms to simplify algebraic expressions. Use coordinates in the first quadrant. Make a comparative
More informationSGN (4 cr) Chapter 11
SGN-41006 (4 cr) Chapter 11 Clustering Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology February 25, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006 (4 cr) Chapter
More informationAdvanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs
Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material
More information(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX
Towards an Adaptive Distributed Shared Memory (Preliminary Version ) Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3 E-mail: fjhkim,vaidyag@cs.tamu.edu
More information1. In this problem we use Monte Carlo integration to approximate the area of a circle of radius, R = 1.
1. In this problem we use Monte Carlo integration to approximate the area of a circle of radius, R = 1. A. For method 1, the idea is to conduct a binomial experiment using the random number generator:
More informationLinear penalized spline model estimation using ranked set sampling technique
Hacettepe Journal of Mathematics and Statistics Volume 46 (4) (2017), 669 683 Linear penalized spline model estimation using ranked set sampling technique Al Kadiri M A Abstract Benets of using Ranked
More informationThe theory of the linear model 41. Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that
The theory of the linear model 41 Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that E(Y X) =X 0 b 0 0 the F-test statistic follows an F-distribution with (p p 0, n p) degrees
More informationGraph drawing in spectral layout
Graph drawing in spectral layout Maureen Gallagher Colleen Tygh John Urschel Ludmil Zikatanov Beginning: July 8, 203; Today is: October 2, 203 Introduction Our research focuses on the use of spectral graph
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming
More informationEvaluation of Loop Subdivision Surfaces
Evaluation of Loop Subdivision Surfaces Jos Stam Alias wavefront, Inc. 8 Third Ave, 8th Floor, Seattle, WA 980, U.S.A. jstam@aw.sgi.com Abstract This paper describes a technique to evaluate Loop subdivision
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationPrincipal Component Analysis
Copyright 2004, Casa Software Ltd. All Rights Reserved. 1 of 16 Principal Component Analysis Introduction XPS is a technique that provides chemical information about a sample that sets it apart from other
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More informationInstitute for Advanced Computer Studies. Department of Computer Science. Direction of Arrival and The Rank-Revealing. E. C. Boman y. M. F.
University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{91{166 TR{2813 Direction of Arrival and The Rank-Revealing URV Decomposition E. C. Boman y
More informationOrthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet. Y. C. Pati R. Rezaiifar and P. S.
/ To appear in Proc. of the 27 th Annual Asilomar Conference on Signals Systems and Computers, Nov. {3, 993 / Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet
More informationAn Efficient Clustering for Crime Analysis
An Efficient Clustering for Crime Analysis Malarvizhi S 1, Siddique Ibrahim 2 1 UG Scholar, Department of Computer Science and Engineering, Kumaraguru College Of Technology, Coimbatore, Tamilnadu, India
More informationA Robust Optimum Response Surface Methodology based On MM-estimator
A Robust Optimum Response Surface Methodology based On MM-estimator, 2 HABSHAH MIDI,, 2 MOHD SHAFIE MUSTAFA,, 2 ANWAR FITRIANTO Department of Mathematics, Faculty Science, University Putra Malaysia, 434,
More informationDetecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference
Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Minh Dao 1, Xiang Xiang 1, Bulent Ayhan 2, Chiman Kwan 2, Trac D. Tran 1 Johns Hopkins Univeristy, 3400
More informationNew user profile learning for extremely sparse data sets
New user profile learning for extremely sparse data sets Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe Institute of Control and Information Engineering, Poznan University of Technology, pl.
More informationGoals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)
SOC6078 Advanced Statistics: 9. Generalized Additive Models Robert Andersen Department of Sociology University of Toronto Goals of the Lecture Introduce Additive Models Explain how they extend from simple
More informationEstimating Noise and Dimensionality in BCI Data Sets: Towards Illiteracy Comprehension
Estimating Noise and Dimensionality in BCI Data Sets: Towards Illiteracy Comprehension Claudia Sannelli, Mikio Braun, Michael Tangermann, Klaus-Robert Müller, Machine Learning Laboratory, Dept. Computer
More information3. Data Analysis and Statistics
3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual
More informationthen present results of directly applying the SFFS feature selection algorithms for mammographic mass detection, discussing how noise and the sie of t
The role of feature selection in building pattern recogniers for computer-aided diagnosis Clay Spence and Paul Sajda National Information Display Laboratory Sarno Corporation Princeton, NJ 08543-5300,
More informationGraphical Analysis of Data using Microsoft Excel [2016 Version]
Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.
More information8. MINITAB COMMANDS WEEK-BY-WEEK
8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are
More information