Splines. Patrick Breheny. November 20.
Outline: Introduction; Regression splines (parametric); Smoothing splines (nonparametric)

Splines
Patrick Breheny, November 20
Patrick Breheny, STA 621: Nonparametric Statistics (1/46)

Introduction

We are discussing ways to estimate the regression function f, where E(y|x) = f(x). One approach, of course, is to assume that f has a certain shape, such as linear or quadratic, that can be estimated parametrically. We have also discussed locally weighted linear/polynomial models as a way of allowing f to be more flexible. An alternative approach is to introduce local basis functions.

Basis functions

A common approach for extending the linear model is to augment the linear component of x with additional, known functions of x:

    f(x) = \sum_{m=1}^{M} \beta_m h_m(x),

where the \{h_m\} are known functions called basis functions. Because the basis functions \{h_m\} are prespecified and the model is linear in these new variables, ordinary least squares approaches for model fitting and inference can be employed. This idea is not new to you, as you have encountered transformations and the inclusion of polynomial terms in models in earlier courses.
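
A concrete numerical sketch of such a fit (in Python for illustration; the lecture's own examples use R). The quadratic basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2 and the simulated data are illustrative choices, not from the lecture:

```python
import numpy as np

# Illustrative quadratic basis expansion: h_1(x)=1, h_2(x)=x, h_3(x)=x^2.
# Data are simulated here purely for demonstration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, 200)

# Design matrix whose columns are the basis functions evaluated at x
H = np.column_stack([np.ones_like(x), x, x**2])

# Because the model is linear in the basis, ordinary least squares applies
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fitted = H @ beta
```

Any fixed collection of basis functions works the same way: build the design matrix, then fit by least squares.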

Problems with polynomial regression

However, polynomial terms introduce undesirable side effects: each observation affects the entire curve, even for x values far from the observation. Not only does this introduce bias, it also results in extremely high variance near the edges of the range of x. As Hastie et al. (2009) put it, tweaking the coefficients to achieve a functional form in one region can cause the function to "flap about madly in remote regions."

Problems with polynomial regression (cont'd)

To illustrate this, consider the following simulated example (gray lines are models fit to 100 observations arising from the true f, colored red):

[Figure: three panels of polynomial fits, "Up to x^2", "Up to x^4", and "Up to x^6"; f(x) plotted against x on [0, 1]]

Global versus local bases

We can do better than this. Let us instead consider local basis functions, thereby ensuring that a given observation affects only the nearby fit, not the fit of the entire line. In this lecture, we will discuss splines: piecewise polynomials joined together to make a single smooth curve.

The piecewise constant model

To understand splines, we will gradually build up a piecewise model, starting with the simplest one: the piecewise constant model. First, we partition the range of x into K + 1 intervals by choosing K points \{\xi_k\}_{k=1}^{K} called knots. For our example involving bone mineral density, we will choose the tertiles of the observed ages.

The piecewise constant model (cont'd)

[Figure: piecewise constant fit of relative change in spinal BMD against age, female and male panels]

The piecewise linear model

[Figure: piecewise linear fit of relative change in spinal BMD against age, female and male panels]

The continuous piecewise linear model

[Figure: continuous piecewise linear fit of relative change in spinal BMD against age, female and male panels]

Basis functions for piecewise continuous models

These constraints can be incorporated directly into the basis functions:

    h_1(x) = 1,  h_2(x) = x,  h_3(x) = (x - \xi_1)_+,  h_4(x) = (x - \xi_2)_+,

where (\cdot)_+ denotes the positive portion of its argument:

    r_+ = r if r \ge 0;  r_+ = 0 if r < 0
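
A minimal sketch of these four basis functions (Python for illustration; the knot locations xi1 and xi2 are arbitrary). Any coefficient vector then yields a continuous piecewise linear function:

```python
import numpy as np

# Truncated-power basis for a continuous piecewise linear model with
# two (arbitrary, illustrative) knots xi1 and xi2.
xi1, xi2 = 0.3, 0.7

def pos(r):
    """The positive-part operator r_+."""
    return np.maximum(r, 0.0)

def basis(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), x, pos(x - xi1), pos(x - xi2)])

# f is linear within each region, with a different slope per region,
# but continuous across the knots
beta = np.array([0.5, 1.0, -2.0, 3.0])
def f(x):
    return basis(x) @ beta
```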

Basis functions for piecewise continuous models (cont'd)

It can easily be checked that these basis functions lead to a composite function f(x) that:
- is linear everywhere except at the knots
- has a different intercept and slope in each region
- is everywhere continuous

Also, note that the degrees of freedom add up: 3 regions × 2 degrees of freedom per region − 2 constraints = 4 basis functions.

Splines

The preceding is an example of a spline: a piecewise degree-(m − 1) polynomial that is continuous up to its first m − 2 derivatives. By requiring continuous derivatives, we ensure that the resulting function is as smooth as possible. We can obtain more flexible curves by increasing the degree of the spline and/or by adding knots. However, there is a tradeoff:
- Few knots/low degree: the resulting class of functions may be too restrictive (bias)
- Many knots/high degree: we run the risk of overfitting (variance)

The truncated power basis

The set of basis functions introduced earlier is an example of what is called the truncated power basis. Its logic is easily extended to splines of order m:

    h_j(x) = x^{j-1},  j = 1, ..., m
    h_{m+k}(x) = (x - \xi_k)_+^{m-1},  k = 1, ..., K

Note that a spline of order m with K knots has m + K degrees of freedom.

Homework: Write the set of truncated spline basis functions for representing a cubic spline function with three knots.
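
The general construction takes only a few lines (a Python sketch; for real work one would use a B-spline basis instead, as discussed below):

```python
import numpy as np

# Truncated power basis of order m with knots xi_1 < ... < xi_K:
# h_j(x) = x^(j-1) for j = 1..m, plus (x - xi_k)_+^(m-1) for k = 1..K.
def truncated_power_basis(x, knots, m=4):
    x = np.asarray(x, dtype=float)
    cols = [x ** j for j in range(m)]
    cols += [np.maximum(x - xi, 0.0) ** (m - 1) for xi in knots]
    return np.column_stack(cols)

# Cubic spline (m = 4) with K = 2 knots at the tertiles: m + K = 6 columns
H = truncated_power_basis(np.linspace(0, 1, 50), knots=[1/3, 2/3])
```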

Quadratic splines

[Figure: quadratic spline fit of relative change in spinal BMD against age, female and male panels]

Cubic splines

[Figure: cubic spline fit of relative change in spinal BMD against age, female and male panels]

Additional notes

These types of fixed-knot models are referred to as regression splines. Recall that cubic splines have 4 + K degrees of freedom: (K + 1) regions × 4 parameters per region − K knots × 3 constraints per knot. It is claimed that cubic splines are the lowest-order spline for which the discontinuity at the knots cannot be noticed by the human eye. There is rarely any need to go beyond cubic splines, which are by far the most common type of spline in practice.

Implementing regression splines

The truncated power basis has two principal virtues:
- conceptual simplicity
- the linear model is nested inside it, leading to simple tests of the null hypothesis of linearity

Unfortunately, it has several computational/numerical flaws: it is inefficient and can lead to overflow and nearly singular matrix problems. The more complicated, but numerically much more stable and efficient, B-spline basis is often employed instead.

B-splines in R

Fortunately, one can use B-splines without knowing the details behind their complicated construction. In the splines package (which by default is installed but not loaded), the bs() function will construct a B-spline basis for you:

X <- bs(x, knots=quantile(x, p=c(1/3, 2/3)))
X <- bs(x, df=5)
X <- bs(x, degree=2, df=10)
Xp <- predict(X, newx=x)

By default, bs() uses degree=3, places knots at evenly spaced quantiles, and does not return a column for the intercept.

Natural cubic splines

Polynomial fits tend to be erratic at the boundaries of the data; naturally, cubic splines share the same flaw. Natural cubic splines ameliorate this problem by adding the additional (4) constraints that the function be linear beyond the boundaries of the data.

Natural cubic splines (cont'd)

[Figure: natural cubic spline fit of relative change in spinal BMD against age, female and male panels]

Natural cubic splines, 6 df

[Figure: natural cubic spline fit with 6 degrees of freedom, relative change in spinal BMD against age, female and male panels]

Natural cubic splines, 6 df (cont'd)

[Figure: natural cubic spline fits with 6 degrees of freedom, female and male, relative change in spinal BMD against age]

Splines vs. loess (6 df each)

[Figure: loess and spline fits (6 degrees of freedom each) of relative change in spinal BMD against age, female and male panels]

Natural splines in R

R also provides a function to compute a basis for natural cubic splines, ns(), which works almost exactly like bs(), except that there is no option to change the degree. Note that a natural spline has m + K − 4 degrees of freedom; thus, a natural cubic spline with K knots has K degrees of freedom.

Mean and variance estimation

Because the basis functions are fixed, all standard approaches to inference for regression are valid. Furthermore, note that the resulting estimate is a linear smoother, with L = X(X'X)^{-1}X'; thus

    E(\hat{f}) = Lf, where f = E(y|x)
    V(\hat{f}) = \sigma^2 LL'
    CV = \frac{1}{n} \sum_i \left( \frac{y_i - \hat{y}_i}{1 - l_{ii}} \right)^2

Furthermore, extensions to logistic regression, Cox proportional hazards regression, etc., are straightforward.
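
The leave-one-out shortcut above is exact for any fixed-basis least squares fit, which a short check confirms (a Python sketch; a cubic polynomial basis stands in for a spline basis, and the data are simulated for illustration):

```python
import numpy as np

# Linear smoother L = X (X'X)^{-1} X' for a fixed basis, and the LOOCV
# shortcut CV = (1/n) sum ((y_i - yhat_i)/(1 - l_ii))^2.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)

X = np.column_stack([x**j for j in range(4)])   # cubic polynomial basis
L = X @ np.linalg.solve(X.T @ X, X.T)           # smoother ("hat") matrix
yhat = L @ y
cv_shortcut = np.mean(((y - yhat) / (1 - np.diag(L))) ** 2)

# Brute-force leave-one-out cross-validation for comparison
n = len(y)
cv_loo = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    cv_loo += (y[i] - X[i] @ b) ** 2
cv_loo /= n
```

The two quantities agree to numerical precision, so cross-validation never requires refitting the model n times.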

Mean and variance estimation (cont'd)

[Figure: fitted curves for relative change in spinal BMD against age, female and male]

Mean and variance estimation (cont'd)

[Figure: coronary heart disease study, K = 4; two panels showing the estimated \hat{f}(x) and \hat{\pi}(x) plotted against systolic blood pressure]

Problems with knots

Fixed-df splines are useful tools, but they are not truly nonparametric. Choices regarding the number of knots and where they are located are fundamentally parametric choices and have a large effect on the fit. Furthermore, assuming that you place knots at quantiles, models will not be nested inside each other, which complicates hypothesis testing. An alternative, more direct approach is penalization.

Controlling smoothness with penalization

Here, we directly solve for the function f that minimizes the following objective function, a penalized version of the least squares objective:

    \sum_{i=1}^{n} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(u)\}^2 \, du

The first term captures the fit to the data, while the second penalizes curvature; note that for a line, f''(u) = 0 for all u. Here, \lambda is the smoothing parameter, and it controls the tradeoff between the two terms:
- \lambda = 0 imposes no restrictions, and f will therefore interpolate the data
- \lambda = \infty renders curvature impossible, thereby returning us to ordinary linear regression

Controlling smoothness with penalization (cont'd)

This avoids the knot selection problem altogether by formulating the problem nonparametrically. It may sound impossible to solve for such an f over all possible functions, but the solution turns out to be surprisingly simple: as we will show, the solution to this problem lies in the family of natural cubic splines.

Terminology

First, some terminology:
- The parametric splines with fixed degrees of freedom that we have discussed so far are called regression splines
- A spline that passes through the points \{x_i, y_i\} is called an interpolating spline, and is said to interpolate the points \{x_i, y_i\}
- A spline that describes and smooths noisy data by passing close to \{x_i, y_i\}, without the requirement of passing through them, is called a smoothing spline

Natural cubic splines are the smoothest interpolators

Theorem: Out of all twice-differentiable functions passing through the points \{x_i, y_i\}, the one that minimizes \lambda \int \{f''(u)\}^2 \, du is a natural cubic spline with knots at every unique value of \{x_i\}.

Natural cubic splines solve the nonparametric formulation

Theorem: Out of all twice-differentiable functions, the one that minimizes

    \sum_{i=1}^{n} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(u)\}^2 \, du

is a natural cubic spline with knots at every unique value of \{x_i\}.

Design matrix

Let \{N_j\}_{j=1}^{n} denote the collection of natural cubic spline basis functions, and let N denote the n × n design matrix consisting of the basis functions evaluated at the observed values: N_{ij} = N_j(x_i). Then, writing f(x) = \sum_{j=1}^{n} N_j(x) \beta_j, the vector of values at the observed points is f = N\beta.

Solution

Homework: Show that the objective function for penalized splines is

    (y - N\beta)'(y - N\beta) + \lambda \beta' \Omega \beta,

where \Omega_{jk} = \int N_j''(t) N_k''(t) \, dt. The solution is therefore

    \hat{\beta} = (N'N + \lambda\Omega)^{-1} N'y
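
To see the shape of this solution numerically, here is a sketch (Python) that swaps the exact spline penalty \Omega for a discrete second-difference penalty D'D and uses the identity basis, i.e. a Whittaker smoother rather than a true smoothing spline; the structure (least-squares fit plus a \lambda-scaled curvature penalty, solved in closed form) is the same:

```python
import numpy as np

# Closed-form penalized fit beta = (N'N + lam*Omega)^{-1} N'y, with the
# identity basis N = I and a squared second-difference penalty standing
# in for the curvature penalty Omega (a Whittaker smoother, not an
# exact smoothing spline).
def penalized_fit(y, lam):
    n = len(y)
    N = np.eye(n)
    D = np.diff(np.eye(n), n=2, axis=0)   # discrete second-difference operator
    Omega = D.T @ D
    return np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)

rng = np.random.default_rng(2)
y = np.sin(2 * np.pi * np.linspace(0, 1, 60)) + rng.normal(0, 0.3, 60)
smooth = penalized_fit(y, lam=5.0)
```

At \lambda = 0 the fit interpolates the data exactly; larger \lambda trades fidelity for smoothness, just as in the penalized objective above.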

Smoothing splines are linear smoothers

Note that the fitted values can be represented as

    \hat{y} = N(N'N + \lambda\Omega)^{-1} N'y = L_\lambda y

Thus, smoothing splines are linear smoothers, and we can use all the results that we derived back when discussing local regression.

Smoothing splines are linear smoothers (cont'd)

In particular:

    CV = \frac{1}{n} \sum_i \left( \frac{y_i - \hat{y}_i}{1 - l_{ii}} \right)^2
    E\,\hat{f}(x_0) = \sum_i l_i(x_0) f(x_i)
    V\,\hat{f}(x_0) = \sigma^2 \|l(x_0)\|^2
    \hat{\sigma}^2 = \frac{\sum_i (y_i - \hat{y}_i)^2}{n - \nu},  with \nu = \mathrm{tr}(L) or \nu = 2\,\mathrm{tr}(L) - \mathrm{tr}(L'L)
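
One useful consequence: the effective degrees of freedom \nu = tr(L_\lambda) can be computed directly, and it shrinks from n toward 2 (a straight line) as \lambda grows. A sketch (Python), again with a discrete second-difference penalty standing in for the smoothing-spline penalty \Omega:

```python
import numpy as np

# Effective degrees of freedom nu = tr(L_lambda) for a penalized linear
# smoother; a discrete second-difference penalty stands in for Omega.
n = 50
D = np.diff(np.eye(n), n=2, axis=0)
Omega = D.T @ D

def edf(lam):
    L = np.linalg.solve(np.eye(n) + lam * Omega, np.eye(n))  # L_lambda
    return np.trace(L)
```

This gives a direct way to place smoothers with different \lambda values on a common "degrees of freedom" scale.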

CV and GCV for the BMD example

[Figure: CV and GCV plotted against degrees of freedom (2 to 64, log scale), female and male panels]

Undersmoothing and oversmoothing of the BMD data

[Figure: undersmoothed and oversmoothed fits to the BMD data, male and female panels]

Pointwise confidence bands

[Figure: smoothing spline fits with pointwise confidence bands, relative change in spinal BMD against age, female and male]

R implementation

Recall that local regression had a simple, standard function for basic one-dimensional smoothing (loess) and an extensive package for more comprehensive analyses (locfit). Spline-based smoothing is similar: smooth.spline() does not require any packages and implements simple one-dimensional smoothing:

fit <- smooth.spline(x, y)
plot(fit, type="l")
predict(fit, xx)

By default, the function will choose \lambda based on GCV, but this can be changed to CV, or you can specify \lambda yourself.

The mgcv package

If you have a binary outcome variable, multiple covariates, or want confidence intervals, however, smooth.spline is lacking. A very extensive package called mgcv provides those features, as well as much more. The basic function is called gam(), which stands for generalized additive model (we'll discuss GAMs more in a later lecture). Note that the main function of the gam package is also called gam; it is best to avoid having both packages loaded at the same time.

The mgcv package (cont'd)

The syntax of gam() is very similar to glm() and locfit(), with a function s() placed around any terms that you want a smooth function of:

fit <- gam(y ~ s(x))
fit <- gam(y ~ s(x), family="binomial")
plot(fit)
plot(fit, shade=TRUE)
predict(fit, newdata=data.frame(x=xx), se.fit=TRUE)
summary(fit)

Hypothesis testing

As with local regression, we are often interested in testing whether the smaller of two nested models provides an adequate fit to the data. As before, we may construct F tests and generalized likelihood ratio tests:

    F = \frac{(RSS_0 - RSS_1)/q}{\hat{\sigma}^2} \;\dot\sim\; F_{q, n-\nu_1}
    \Lambda = 2\{\ell(\hat{\beta}|y) - \ell(\hat{\beta}_0|y)\} \;\dot\sim\; \chi^2_q

where q is the difference in degrees of freedom between the two models.
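
The F statistic can be computed by hand for any two nested fixed-basis fits (a Python sketch; polynomial bases stand in for nested spline models, and the data are simulated for illustration):

```python
import numpy as np

# F = ((RSS_0 - RSS_1)/q) / sigma2_hat for nested linear-basis fits.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 80))
y = 1 + x + 4 * x**2 + rng.normal(0, 0.2, 80)   # truly nonlinear mean

def rss(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

X0 = np.column_stack([np.ones_like(x), x])              # null model: linear
X1 = np.column_stack([np.ones_like(x), x, x**2, x**3])  # alternative: cubic
q = X1.shape[1] - X0.shape[1]
sigma2_hat = rss(X1, y) / (len(y) - X1.shape[1])
F = (rss(X0, y) - rss(X1, y)) / q / sigma2_hat
```

With a genuinely nonlinear mean, F lands far out in the tail of the F_{q, n-\nu_1} reference distribution, and the null hypothesis of linearity is rejected.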

Hypothesis testing (cont'd)

As with local regression, these tests do not strictly preserve type I error rates, but they still provide a way to compute useful approximate p-values in practice. Like the gam package, mgcv provides an anova method:

anova(fit0, fit, test="F")
anova(fit0, fit, test="Chisq")

It should be noted, however, that such tests treat \lambda as fixed, even though in reality it is often estimated from the data using CV or GCV.