Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines

Hope Kennedy
6 years ago

1 Spline Models

Introduction to CS and NCS
Regression splines
Smoothing splines

2 Cubic Splines

Knots: $a < \xi_1 < \xi_2 < \cdots < \xi_m < b$. (From now on, $x \in \mathbb{R}$ is one-dimensional.)

A function $g$ defined on $[a, b]$ is a cubic spline w.r.t. knots $\{\xi_i\}_{i=1}^m$ if:

1) $g$ is a cubic polynomial in each of the $m + 1$ intervals,
$$g(x) = d_i x^3 + c_i x^2 + b_i x + a_i, \quad x \in [\xi_i, \xi_{i+1}],$$
where $i = 0:m$, $\xi_0 = a$ and $\xi_{m+1} = b$;

2) $g$ is continuous up to the 2nd derivative: since $g$ is continuous up to the 2nd derivative at any point inside an interval, it suffices to check
$$g^{(k)}(\xi_i^+) = g^{(k)}(\xi_i^-), \quad k = 0, 1, 2, \quad i = 1:m.$$

3 How many free parameters do we need to represent $g$? $m + 4$.

We need 4 parameters $(d_i, c_i, b_i, a_i)$ for each of the $(m + 1)$ intervals, but we also have 3 constraints at each of the $m$ knots, so
$$4(m + 1) - 3m = m + 4.$$

4 Suppose the knots $\{\xi_i\}_{i=1}^m$ are given. If $g_1(x)$ and $g_2(x)$ are two cubic splines, so is $a_1 g_1(x) + a_2 g_2(x)$, where $a_1$ and $a_2$ are two constants.

That is, for a set of given knots, the corresponding cubic splines form a linear space (of functions) with dim $(m + 4)$.

5 A set of basis functions for cubic splines (w.r.t. knots $\{\xi_i\}_{i=1}^m$) is given by
$$h_0(x) = 1; \quad h_1(x) = x; \quad h_2(x) = x^2; \quad h_3(x) = x^3; \quad h_{i+3}(x) = (x - \xi_i)_+^3, \quad i = 1, 2, \ldots, m.$$

That is, any cubic spline $f(x)$ can be uniquely expressed as
$$f(x) = \beta_0 + \sum_{j=1}^{m+3} \beta_j h_j(x).$$

Of course, there are many other choices of the basis functions. For example, R uses the B-spline basis functions.
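As a rough check that the truncated power basis and the B-spline basis span the same function space, the sketch below fits the same data with both and compares fitted values. The simulated data, the three knot locations, and the variable names (`H`, `B`, `fit_tp`, `fit_bs`) are illustrative assumptions, not part of the slides.

```r
library(splines)

set.seed(1)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
knots <- c(0.25, 0.5, 0.75)  # m = 3 interior knots (arbitrary choice)

# Truncated power basis: 1, x, x^2, x^3, (x - xi_i)^3_+ for each knot
H <- cbind(1, x, x^2, x^3, sapply(knots, function(k) pmax(x - k, 0)^3))
fit_tp <- lm(y ~ H - 1)

# B-spline basis for the same knots, intercept column included
B <- bs(x, knots = knots, intercept = TRUE)
fit_bs <- lm(y ~ B - 1)

# Both bases have m + 4 = 7 columns and should give the same fitted curve
c(ncol(H), ncol(B))
max(abs(fitted(fit_tp) - fitted(fit_bs)))
```

The two coefficient vectors differ, since the bases differ, but the fitted values should agree up to numerical error.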

6 Natural Cubic Splines (NCS)

A cubic spline on $[a, b]$ is a NCS if its second and third derivatives are zero at $a$ and $b$.

That is, a NCS is linear in the two extreme intervals $[a, \xi_1]$ and $[\xi_m, b]$. Note that the linear functions in the two extreme intervals are totally determined by their neighboring intervals.

The degree of freedom of NCS's with $m$ knots is $m$: a cubic spline has $m + 4$ free parameters, and the two conditions at each end remove 4 of them.

For a curve estimation problem with data $(x_i, y_i)_{i=1}^n$, if we put $n$ knots at the $n$ data points (assumed to be unique), then we obtain a smooth curve (using NCS) passing through all $y$'s.
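The interpolation property can be illustrated with base R's `splinefun`, which builds a natural cubic spline through given points when `method = "natural"`. This is only a sketch on made-up data; the slides themselves do not mention `splinefun`.

```r
set.seed(2)
x <- sort(runif(10))
y <- rnorm(10)

# Natural cubic spline with knots at all n data points
g <- splinefun(x, y, method = "natural")

# The curve passes through every (x_i, y_i)
max(abs(g(x) - y))

# Second derivative at the two boundary points (zero for a NCS)
g(range(x), deriv = 2)
```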

7 Regression Splines

A basis expansion approach:
$$g(x) = \beta_1 h_1(x) + \beta_2 h_2(x) + \cdots + \beta_p h_p(x),$$
where $p = m + 4$ for regression with cubic splines and $p = m$ for NCS.

Represent the model on the observed $n$ data points using matrix notation,
$$\hat{\beta} = \arg\min_{\beta} \| y - F\beta \|^2,$$

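8 where
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{n \times 1}
=
\begin{pmatrix}
h_1(x_1) & h_2(x_1) & \cdots & h_p(x_1) \\
h_1(x_2) & h_2(x_2) & \cdots & h_p(x_2) \\
\vdots & \vdots & & \vdots \\
h_1(x_n) & h_2(x_n) & \cdots & h_p(x_n)
\end{pmatrix}_{n \times p}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}_{p \times 1}
$$

We can obtain the design matrix $F$ by commands bs or ns in R, and then call the regression function lm.

Use K-fold CV to select the number of knots.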
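A minimal sketch of the bs + lm + K-fold CV workflow follows. The simulated data, the choice K = 5, the candidate range `dfs`, and the use of `Boundary.knots = rng` (so that held-out points never fall outside the boundary knots) are all assumptions made for illustration.

```r
library(splines)

set.seed(1)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

K <- 5
fold <- sample(rep(1:K, length.out = n))  # random fold labels
rng <- range(x)                           # fixed boundary knots for all folds
dfs <- 3:10                               # candidate values of bs()'s df

# Average held-out squared error for each candidate df
cv_err <- sapply(dfs, function(d) {
  mean(sapply(1:K, function(k) {
    train <- dat[fold != k, ]
    test <- dat[fold == k, ]
    fit <- lm(y ~ bs(x, df = d, Boundary.knots = rng), data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  }))
})

best_df <- dfs[which.min(cv_err)]
best_df
```

Because bs is called inside the formula, predict reuses the knots computed on the training fold rather than recomputing them on the test fold.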

9 Understand how R counts the degrees of freedom.

To generate a cubic spline basis for a given set of $x_i$'s, you can use the command bs.

You can tell R the location of knots.

Or you can tell R the df. Recall that a cubic spline with $m$ knots has $m + 4$ df, so we need $m = \text{df} - 4$ knots. By default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

10 How R counts the df is a little confusing. The df in command bs actually means the number of columns of the design matrix returned by bs. So if the intercept is not included in the design matrix (which is the default), then the df in command bs is equal to the real df minus 1.

So the following three design matrices (the first two are $n \times 5$ and the last one is $n \times 6$) correspond to the same regression model with cubic splines of df 6.

> bs(x, knots=quantile(x, c(1/3, 2/3)))
> bs(x, df=5)
> bs(x, df=6, intercept=TRUE)
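The claim that the three bs calls describe the same model can be checked numerically. The sketch below uses simulated data (an assumption) and compares column counts and fitted values; `F1`, `F2`, `F3` are illustrative names.

```r
library(splines)

set.seed(3)
x <- runif(150)
y <- cos(3 * x) + rnorm(150, sd = 0.1)

F1 <- bs(x, knots = quantile(x, c(1/3, 2/3)))
F2 <- bs(x, df = 5)
F3 <- bs(x, df = 6, intercept = TRUE)

# Column counts: 5, 5, 6
c(ncol(F1), ncol(F2), ncol(F3))

# lm adds its own intercept for F1 and F2; F3 already contains one
fit1 <- lm(y ~ F1)
fit2 <- lm(y ~ F2)
fit3 <- lm(y ~ F3 - 1)

all.equal(fitted(fit1), fitted(fit2))
all.equal(fitted(fit1), fitted(fit3))
```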

11 To generate a NCS basis for a given set of $x_i$'s, use the command ns.

Recall that the linear functions in the two extreme intervals are totally determined by the other cubic splines. So data points in the two extreme intervals (i.e., outside the two boundary knots) are wasted since they do not affect the fitting. Therefore, by default, R puts the two boundary knots at the min and max of the $x_i$'s.

You can tell R the location of knots, which are the interior knots. Recall that a NCS with $m$ knots has $m$ df. So the df is equal to the number of (interior) knots plus 2, where 2 means the two boundary knots.

12 Or you can tell R the df. If intercept = TRUE, then we need $m = \text{df} - 2$ knots, otherwise we need $m = \text{df} - 1$ knots. Again, by default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

The following three design matrices (the first two are $n \times 3$ and the last one is $n \times 4$) correspond to the same regression model with NCS of df 4.

> ns(x, knots=quantile(x, c(1/3, 2/3)))
> ns(x, df=3)
> ns(x, df=4, intercept=TRUE)
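The same kind of numerical check works for ns. Again the data are simulated for illustration only, and `N1`, `N2`, `N3` are hypothetical names.

```r
library(splines)

set.seed(3)
x <- runif(150)
y <- cos(3 * x) + rnorm(150, sd = 0.1)

N1 <- ns(x, knots = quantile(x, c(1/3, 2/3)))
N2 <- ns(x, df = 3)
N3 <- ns(x, df = 4, intercept = TRUE)

# Column counts: 3, 3, 4
c(ncol(N1), ncol(N2), ncol(N3))

# lm supplies the intercept for N1 and N2; N3 carries its own
fit1 <- lm(y ~ N1)
fit2 <- lm(y ~ N2)
fit3 <- lm(y ~ N3 - 1)

all.equal(fitted(fit1), fitted(fit2))
all.equal(fitted(fit1), fitted(fit3))
```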

13 Choice of Knots

Location of knots: to simplify this problem, we ignore the selection of locations; by default, the knots are located at the quantiles of the $x_i$'s.

Number of knots: can be formulated as a variable selection problem (an easier version, since there are just $p$ models, not $2^p$). Possible criteria:

AIC / BIC / adjusted $R^2$
m-fold CV (cross-validation)
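One way to sketch the AIC/BIC route is to fit one model per candidate df and tabulate the criteria. The data and the candidate range 3:12 are assumptions; only the standard `AIC` and `BIC` functions for lm fits are used.

```r
library(splines)

set.seed(6)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

dfs <- 3:12  # one candidate model per value of df

# One row per candidate: df, AIC, BIC
crit <- t(sapply(dfs, function(d) {
  fit <- lm(y ~ bs(x, df = d))
  c(df = d, AIC = AIC(fit), BIC = BIC(fit))
}))

crit
df_aic <- crit[which.min(crit[, "AIC"]), "df"]
df_bic <- crit[which.min(crit[, "BIC"]), "df"]
c(df_aic, df_bic)
```

BIC penalizes extra columns more heavily than AIC for this sample size, so it tends to pick a df no larger than the one AIC picks.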

14 Summary: Regression Splines

Use LS to fit a spline model: specify the DF $p$ (not the polynomial degree, but the DF of the spline, related to the number of knots), and then fit a regression model with a design matrix of $p$ columns (including the intercept).

How to do it in R?

How to select the number/location of knots?

15 Smoothing Splines

In Regression Splines (let's use NCS), we need to choose the number and the location of knots.

What's a Smoothing Spline? Start with an easy but horrible solution: put knots at all the observed data points $(x_1, \ldots, x_n)$:
$$y_{n \times 1} = F_{n \times n}\, \beta_{n \times 1}.$$

Instead of selecting knots, let's do ridge-type shrinkage ($\Omega$ will be defined later):
$$\min_{\beta} \left[ \| y - F\beta \|^2 + \lambda\, \beta^t \Omega \beta \right],$$
where the tuning parameter $\lambda$ is often chosen by CV or GCV.

Next we'll see how smoothing splines are derived from a different aspect.

16 Roughness Penalty Approach

Let $S[a, b]$ be the space of all smooth functions defined on $[a, b]$. Among all the functions in $S[a, b]$, look for the minimizer of the following penalized residual sum of squares
$$\mathrm{RSS}(g, \lambda) = \sum_{i=1}^{n} [y_i - g(x_i)]^2 + \lambda \int_a^b [g''(x)]^2 \, dx, \qquad (1)$$
where $\lambda$ is a smoothing parameter.

Theorem. $\hat{g} = \arg\min_g \mathrm{RSS}(g, \lambda)$ is a NCS with knots at the $n$ data points $x_1, \ldots, x_n$ ($x_i \ne x_j$).

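17 (WLOG, assume $n \ge 2$.) Let $\tilde{g}$ be a function on $[a, b]$ and $g$ be a NCS with $g(x_i) = \tilde{g}(x_i)$, $i = 1:n$. (Does such a $g$ exist?) Then
$$\int [\tilde{g}'']^2 \ge \int [g'']^2 \qquad (*)$$
with equality only if $\tilde{g} \equiv g$.

PROOF: Let $h(x) = \tilde{g}(x) - g(x)$. So $h(x_i) = 0$ for $i = 1, \ldots, n$. Then $(*)$ holds true because
$$\int [\tilde{g}'']^2 = \int [g'']^2 + 2 \underbrace{\int g'' h''}_{=0} + \int [h'']^2 \ge \int [g'']^2.$$
Equality requires $h'' \equiv 0$, i.e., $h$ is linear; since $h$ vanishes at $n \ge 2$ distinct points, $h \equiv 0$.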
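The step that the cross term vanishes is the standard integration-by-parts argument; a sketch is given here, with $g$ denoting the NCS and $h$ the difference function that vanishes at every $x_i$. The constants $c_j$ (the value of $g'''$ on the $j$-th interior interval) are notation introduced only for this sketch, and the data points are taken in increasing order $x_1 < \cdots < x_n$.

```latex
\begin{aligned}
\int_a^b g''(x)\, h''(x)\, dx
  &= \Big[ g''(x)\, h'(x) \Big]_a^b - \int_a^b g'''(x)\, h'(x)\, dx \\
  &= 0 - \sum_{j=1}^{n-1} c_j \int_{x_j}^{x_{j+1}} h'(x)\, dx
     && \text{($g''(a) = g''(b) = 0$; $g''' = 0$ outside $[x_1, x_n]$)} \\
  &= - \sum_{j=1}^{n-1} c_j \,\big[ h(x_{j+1}) - h(x_j) \big] \\
  &= 0 && \text{(since $h(x_i) = 0$ for all $i$).}
\end{aligned}
```

The argument uses both NCS properties: the boundary term drops because the second derivative is zero at $a$ and $b$, and the remaining integral reduces to a finite sum because the third derivative of a cubic spline is piecewise constant.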

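18 Smoothing Splines

Write $g(x) = \sum_{i=1}^{n} \beta_i h_i(x)$ where the $h_i$'s are basis functions for NCS with knots at $x_1, \ldots, x_n$.

$$\sum_{i=1}^{n} [y_i - g(x_i)]^2 = (y - F\beta)^t (y - F\beta),$$
where $F_{n \times n}$ with $F_{ij} = h_j(x_i)$.

$$\int_a^b [g''(x)]^2 \, dx = \int \Big[ \sum_i \beta_i h_i''(x) \Big]^2 dx = \sum_{i,j} \beta_i \beta_j \int h_i''(x) h_j''(x) \, dx = \beta^t \Omega \beta,$$
where $\Omega_{n \times n}$ with $\Omega_{ij} = \int_a^b h_i''(x) h_j''(x) \, dx$.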

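19 So
$$\mathrm{RSS}(\beta, \lambda) = (y - F\beta)^t (y - F\beta) + \lambda\, \beta^t \Omega \beta,$$
and the solution is
$$\hat{\beta} = \arg\min_{\beta} \mathrm{RSS}(\beta, \lambda) = (F^t F + \lambda \Omega)^{-1} F^t y.$$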
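The closed form follows from setting the gradient of the penalized criterion to zero; a short worked version, using only the symbols $F$, $\Omega$, $\lambda$, $\beta$ and $y$ defined on these slides and the symmetry of $\Omega$:

```latex
\begin{aligned}
\frac{\partial}{\partial \beta}
  \Big[ (y - F\beta)^t (y - F\beta) + \lambda\, \beta^t \Omega \beta \Big]
  &= -2 F^t (y - F\beta) + 2 \lambda\, \Omega \beta = 0 \\
\Longrightarrow \quad (F^t F + \lambda \Omega)\, \beta &= F^t y \\
\Longrightarrow \quad \hat{\beta} &= (F^t F + \lambda \Omega)^{-1} F^t y .
\end{aligned}
```

This is the ridge regression solution with the identity penalty matrix replaced by $\Omega$.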

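20 Demmler & Reinsch (1975): a basis with double orthogonality property, i.e.
$$F^t F = I, \qquad \Omega = \mathrm{diag}(d_i),$$
where $d_1 = d_2 = 0$ (Why?).

Using this basis, we have
$$\hat{\beta} = (F^t F + \lambda \Omega)^{-1} F^t y = (I + \lambda\, \mathrm{diag}(d_i))^{-1} F^t y,$$
i.e.,
$$\hat{\beta}_i = \frac{1}{1 + \lambda d_i}\, \hat{\beta}_i^{(LS)}.$$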

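21 Smoother matrix $S_\lambda$

$$\hat{y} = F\hat{\beta} = F (F^t F + \lambda \Omega)^{-1} F^t y = S_\lambda y.$$

Using the D&R basis,
$$S_\lambda = F\, \mathrm{diag}\!\left( \frac{1}{1 + \lambda d_i} \right) F^t.$$
So the columns of $F$ are the eigenvectors of $S_\lambda$, and they do not depend on $\lambda$.

Effective df of a smoothing spline:
$$\mathrm{df}(\lambda) = \mathrm{tr}\, S_\lambda = \sum_{i=1}^{n} \frac{1}{1 + \lambda d_i}.$$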
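To see the effective df shrink as the penalty grows, the sketch below fits `smooth.spline` over a grid of `lambda` values and reads off the reported `df`. The data and the grid are assumptions. Note also that `smooth.spline` defines its `lambda` on an internally rescaled version of the problem, so its numeric values need not match the $\lambda$ of the slides; only the monotone relationship is being illustrated.

```r
set.seed(4)
n <- 40
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

lambdas <- 10^seq(-6, 0, by = 2)

# Effective df (trace of the smoother matrix) for each lambda,
# with knots at all unique x values
dfs <- sapply(lambdas, function(l) {
  smooth.spline(x, y, lambda = l, all.knots = TRUE)$df
})

cbind(lambda = lambdas, df = dfs)
```

As the penalty grows the df should approach 2, matching the two unpenalized linear basis functions ($d_1 = d_2 = 0$).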

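22 Choice of $\lambda$

Leave-one-out CV
$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left[ y_i - \hat{g}^{[-i]}(x_i) \right]^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{y_i - \hat{g}(x_i)}{1 - S_\lambda(i, i)} \right]^2.$$

Generalized CV
$$\mathrm{GCV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{y_i - \hat{g}(x_i)}{1 - \frac{1}{n} \mathrm{tr}\, S_\lambda} \right]^2.$$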
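In R, `smooth.spline` can pick the smoothing parameter by either criterion: `cv = FALSE` (the default) uses GCV and `cv = TRUE` uses leave-one-out CV. The sketch below runs both on simulated data (an assumption); `all.knots = TRUE` is set so that knots sit at all unique x values, as on these slides.

```r
set.seed(5)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Smoothing parameter chosen by GCV (default) and by leave-one-out CV
fit_gcv <- smooth.spline(x, y, all.knots = TRUE)
fit_loo <- smooth.spline(x, y, cv = TRUE, all.knots = TRUE)

# Selected effective df and the minimized criterion for each fit
c(gcv_df = fit_gcv$df, loo_df = fit_loo$df)
c(gcv_crit = fit_gcv$cv.crit, loo_crit = fit_loo$cv.crit)
```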

23 Summary: Smoothing Splines

Start with a model with the maximum complexity: NCS with knots at the $n$ (unique) $x$ points.

Fit a ridge regression model on the data. If we parameterize the NCS function space by the DR basis, then the design matrix is orthogonal and the corresponding coefficient is penalized differently for each basis function: no penalty for the two linear basis functions, higher penalty for wigglier basis functions.

How to do it in R?

How to select the tuning parameter $\lambda$ or, equivalently, the df?

What if we have collected two obs at the same location $x$?

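24 Weighted Smoothing Splines

Suppose the first two obs have the same $x$ value, i.e., $(x_1, y_1)$, $(x_2, y_2)$, where $x_1 = x_2$. Then
$$
\begin{aligned}
[y_1 - g(x_1)]^2 + [y_2 - g(x_1)]^2
&= \sum_{i=1}^{2} \left[ y_i - \frac{y_1 + y_2}{2} + \frac{y_1 + y_2}{2} - g(x_1) \right]^2 \\
&= \left[ y_1 - \frac{y_1 + y_2}{2} \right]^2 + \left[ y_2 - \frac{y_1 + y_2}{2} \right]^2 + 2 \left[ \frac{y_1 + y_2}{2} - g(x_1) \right]^2 .
\end{aligned}
$$

The first two terms do not involve $g$. So we can replace the first two obs by one, $\left(x_1, \frac{y_1 + y_2}{2}\right)$, and its weight is 2 while the weights for other obs are 1.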
