Nonparametric regression using kernel and spline methods

Jean D. Opsomer and F. Jay Breidt
Department of Statistics, Colorado State University, Fort Collins, CO, USA.
Emails: jopsomer@stat.colostate.edu, jbreidt@stat.colostate.edu

March 3, 2016

1 The statistical model

When applying nonparametric regression methods, the researcher is interested in estimating the relationship between one dependent variable, Y, and one or several covariates, X_1, ..., X_q. We discuss here the situation with one covariate, X (the case with multiple covariates is addressed in the references provided below). The relationship between X and Y can be expressed as the conditional expectation

E(Y | X = x) = f(x).

Unlike in parametric regression, the shape of the function f(·) is not restricted to belong to a specific parametric family such as polynomials. This representation of the mean function is the key difference between parametric and nonparametric regression; the remaining aspects of the statistical model for (X, Y) are similar in both regression approaches. In particular, the random variable Y is often assumed to have a constant (conditional) variance, Var(Y | X) = σ², with σ² unknown. The constant variance and other common regression model assumptions, such as independence, can be relaxed just as in parametric regression.

2 Kernel methods

Suppose that we have a dataset available with observations (x_1, y_1), ..., (x_n, y_n). A simple kernel-based estimator of f(x) is the Nadaraya-Watson kernel regression estimator, defined as

\hat{f}_h(x) = \frac{\sum_{i=1}^{n} K_h(x_i - x)\, y_i}{\sum_{i=1}^{n} K_h(x_i - x)},   (1)

with K_h(·) = K(·/h)/h for some kernel function K(·) and bandwidth parameter h > 0. The function K(·) is usually a symmetric probability density; commonly used kernel functions include the Gaussian kernel K(t) = (2π)^{-1/2} exp(-t²/2) and the Epanechnikov kernel K(t) = max{(3/4)(1 - t²), 0}.

Generally, the researcher is not interested in estimating the value of f(·) at a single location x, but in estimating the curve over a range of values, say for all x in [a_x, b_x]. In principle, kernel regression requires computing (1) for every value of interest. In practice, f̂_h(x) is calculated on a sufficiently fine grid of x-values and the curve is obtained by interpolation.
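To make the computation concrete, the following short NumPy sketch evaluates the Nadaraya-Watson estimator (1) on a grid, using the Gaussian and Epanechnikov kernels given above. The function names, bandwidth values, and simulated data are illustrative only and are not taken from the article or any particular library.

```python
import numpy as np

def gaussian_kernel(t):
    """Gaussian kernel K(t) = (2*pi)^(-1/2) * exp(-t^2 / 2)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def epanechnikov_kernel(t):
    """Epanechnikov kernel K(t) = max{(3/4)(1 - t^2), 0}."""
    return np.maximum(0.75 * (1.0 - t**2), 0.0)

def nadaraya_watson(x_grid, x, y, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimator (1), evaluated at every point of x_grid."""
    # K_h(x_i - x0) = K((x_i - x0)/h)/h, for each grid point x0 and observation x_i.
    w = kernel((x[None, :] - x_grid[:, None]) / h) / h   # shape (n_grid, n)
    return (w @ y) / w.sum(axis=1)

# Illustrative data: a noisy sine curve, fit on a fine grid with two bandwidths.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(x.min(), x.max(), 400)
fit_smooth = nadaraya_watson(grid, x, y, h=1.0)   # larger h: smoother, possibly biased
fit_wiggly = nadaraya_watson(grid, x, y, h=0.1)   # smaller h: wigglier, more variable
```

The two fits at the end illustrate the bandwidth trade-off discussed next: a small h tracks local features closely but is highly variable, while a large h produces a smoother but potentially biased fit.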

We used the subscript h in f̂_h(x) in (1) to emphasize the fact that the bandwidth h is the main determinant of the shape of the estimated regression, as demonstrated in Figure 1. When h is small relative to the range of the data, the resulting fit can be highly variable and look wiggly. When h is chosen to be larger, the fit is less variable and smoother, but the estimator becomes less responsive to local features in the data, which introduces the possibility of bias. Selecting a bandwidth that balances the variance against the potential bias is therefore a crucial decision for researchers who want to apply nonparametric regression to their data. Data-driven bandwidth selection methods are available in the literature, including in the references provided below.

A class of kernel-based estimators that generalizes the Nadaraya-Watson estimator in (1) is referred to as local polynomial regression estimators. At each location x, the estimator f̂_h(x) is obtained as the estimated intercept, β̂_0, in the weighted least squares fit of a polynomial of degree p,

\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 (x_i - x) - \cdots - \beta_p (x_i - x)^p \right)^2 K_h(x_i - x).

This estimator can be written explicitly in matrix notation as

\hat{f}_h(x) = (1, 0, \ldots, 0) \left( X_x^T W_x X_x \right)^{-1} X_x^T W_x Y,   (2)

where Y = (y_1, ..., y_n)^T, W_x = diag{K_h(x_1 - x), ..., K_h(x_n - x)} and

X_x = \begin{pmatrix} 1 & x_1 - x & \cdots & (x_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & x_n - x & \cdots & (x_n - x)^p \end{pmatrix}.

It should be noted that the Nadaraya-Watson estimator (1) is a special case of the local polynomial regression estimator with p = 0. In practice, the local linear (p = 1) and local quadratic (p = 2) estimators are frequently used.
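The sketch below shows one way to compute the local polynomial estimator (2) by solving the weighted least squares problem at each grid point, together with a simple leave-one-out cross-validation search over candidate bandwidths. The cross-validation criterion is only one example of the data-driven bandwidth selection methods mentioned above, not necessarily the one used in the references; all names, candidate grids, and data here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def local_polynomial(x_grid, x, y, h, p=1, kernel=gaussian_kernel):
    """Local polynomial estimator (2): weighted least squares fit of degree p
    around each grid point x0, returning the estimated intercept beta_0 = f_hat(x0)."""
    fits = np.empty(len(x_grid))
    for j, x0 in enumerate(x_grid):
        u = x - x0
        X = np.vander(u, N=p + 1, increasing=True)   # rows (1, u_i, ..., u_i^p)
        w = kernel(u / h) / h                         # kernel weights K_h(x_i - x0)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        fits[j] = beta[0]                             # intercept of the local fit
    return fits

def loocv(x, y, h, p=1):
    """Leave-one-out cross-validation criterion for a candidate bandwidth h."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        pred = local_polynomial(np.array([x[i]]), x[keep], y[keep], h, p)[0]
        errs.append(y[i] - pred)
    return np.mean(np.square(errs))

# Illustrative use: pick h by LOOCV from a small candidate grid, then fit locally linearly.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
candidates = np.linspace(0.3, 3.0, 10)
h_cv = candidates[int(np.argmin([loocv(x, y, h) for h in candidates]))]
grid = np.linspace(x.min(), x.max(), 400)
fit = local_polynomial(grid, x, y, h=h_cv)   # local linear fit (p = 1)
```

Setting p = 0 in this sketch reproduces the Nadaraya-Watson estimator, while p = 1 and p = 2 give the local linear and local quadratic fits mentioned above.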

[Figure 1: Dates (Julian days) of first sightings of bank swallows in the Cayuga Lake basin, with three kernel regressions using bandwidth values h equal to the range of years multiplied by 0.05, 0.2 and 0.4.]

An extensive literature on kernel regression and local polynomial regression exists, and their theoretical properties are well understood. Both kernel regression and local polynomial regression estimators are biased but consistent estimators of the unknown mean function, when that function is continuous and sufficiently smooth. For further information on these methods, we refer the reader to the monographs by Wand and Jones (1995) and Fan and Gijbels (1996).

3 Spline methods

In the previous section, the unknown mean function was assumed to be locally well approximated by a polynomial, which led to local polynomial regression. An alternative approach is to represent the fit as a piecewise polynomial, with the pieces connecting at points called knots. Once the knots are selected, such an estimator can be computed globally, in a manner similar to that for a parametrically specified mean function, as will be explained below. A fitted mean function represented by a piecewise continuous curve only rarely provides a satisfactory fit, however, so the function and at least its first derivative are usually constrained to be continuous everywhere, with only the second or higher derivatives allowed to be discontinuous at the knots. For historical reasons, these constrained piecewise polynomials are referred to as splines, leading to the name spline regression or spline smoothing for this type of nonparametric regression.

Consider the following simple type of polynomial spline of degree p:

\beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} \beta_{p+k} (x - \kappa_k)_+^p,   (3)

where p ≥ 1, κ_1, ..., κ_K are the knots and (·)_+^p = max{(·)^p, 0}. Clearly, (3) has continuous derivatives up to degree (p - 1), but the pth derivative can be discontinuous at the knots. Model (3) is constructed as a linear combination of the basis functions 1, x, ..., x^p, (x - κ_1)_+^p, ..., (x - κ_K)_+^p. This basis is referred to as the truncated power basis. Another popular set of basis functions are the so-called B-splines. Unlike the truncated power basis functions, the B-splines have compact support and are numerically more stable, but they span the same function space. In what follows, we write ψ_j(x), j = 1, ..., J, for a set of (generic) basis functions used in fitting regression splines, and replace (3) by β_1 ψ_1(x) + ... + β_J ψ_J(x).

For fixed knots, a regression spline is linear in the unknown parameters β = (β_1, ..., β_J)^T and can be fitted parametrically using least squares techniques. Under the homoskedastic model described in Section 1, the regression spline estimator for f(x) is obtained by solving

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{J} \beta_j \psi_j(x_i) \right)^2   (4)

and setting f̂(x) = Σ_{j=1}^{J} β̂_j ψ_j(x). Since deviations from the parametric shape can only occur at the knots, the amount of smoothing is determined by the degree of the basis and the location and number of knots. In practice, the degree is fixed (with p = 1, 2 or 3 as common choices) and the knot locations are usually chosen to be equally spaced over the range of the data or placed at regularly spaced data quantiles. Hence, the number of knots K is the only remaining smoothing parameter for the spline regression estimator. As K (and therefore J) is chosen to be larger, increasingly flexible estimators for f(·) are produced. This reduces the potential bias due to approximating the unknown mean function by a spline function, but increases the variability of the estimators.
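As an illustration of the least squares fit in (4), the following sketch builds the truncated power basis of degree p with knots at equally spaced quantiles of the data and solves for the coefficients. The knot placement, function names, and simulated data are assumptions made for the example; the comment at the end indicates how a ridge-type penalty on the coefficients would yield a penalized spline fit of the form (6) discussed below.

```python
import numpy as np

def truncated_power_basis(x, knots, p=3):
    """Basis 1, x, ..., x^p, (x - kappa_1)_+^p, ..., (x - kappa_K)_+^p from (3)."""
    poly = np.vander(x, N=p + 1, increasing=True)               # 1, x, ..., x^p
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** p   # truncated power terms
    return np.hstack([poly, trunc])

def fit_regression_spline(x, y, num_knots=10, p=3):
    """Regression spline fit (4): least squares on the truncated power basis,
    with knots placed at equally spaced quantiles of the data."""
    knots = np.quantile(x, np.linspace(0, 1, num_knots + 2)[1:-1])  # interior knots
    Psi = truncated_power_basis(x, knots, p)
    beta, *_ = np.linalg.lstsq(Psi, y, rcond=None)

    def f_hat(x_new):
        return truncated_power_basis(np.asarray(x_new, dtype=float), knots, p) @ beta

    return f_hat

# A penalized spline in the spirit of (6), introduced below, replaces the least
# squares solve by a ridge-type one, e.g.
#   beta = np.linalg.solve(Psi.T @ Psi + lam * np.eye(Psi.shape[1]), Psi.T @ y)
# with lam > 0 controlling the smoothness of the fit.

# Illustrative use on simulated data.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
f_hat = fit_regression_spline(x, y, num_knots=8, p=3)
grid = np.linspace(x.min(), x.max(), 400)
fit = f_hat(grid)
```

Increasing num_knots makes the fitted curve more flexible, which mirrors the bias-variance trade-off described above: less approximation bias at the cost of higher variability.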

The smoothing spline estimator is an important extension of the regression spline estimator. The smoothing spline estimator for f(·), for a set of data generated by the statistical model described in Section 1, is defined as the minimizer of

\sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_{a_x}^{b_x} \left( f^{(p)}(t) \right)^2 dt   (5)

over the set of all functions f(·) with continuous (p - 1)th derivative and square integrable pth derivative, where λ > 0 is a constant determining the degree of smoothness of the estimator. Larger values of λ correspond to smoother fits. The choice p = 2 leads to the popular cubic smoothing splines. While not immediately obvious from the definition, the function minimizing (5) is exactly equal to a special type of regression spline with knots at each of the observation points x_1, ..., x_n (assuming each of the locations x_i is unique).

Traditional regression spline fitting as in (4) is usually done using a relatively small number of knots. By construction, smoothing splines use a large number of knots (typically, n knots), but the smoothness of the function is controlled by a penalty term and the smoothing parameter λ. The penalized spline estimator represents a compromise between these two approaches. It uses a moderate number of knots and puts a penalty on the coefficients of the basis functions. Specifically, a simple type of penalized spline estimator for f(·) is obtained by solving

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{J} \beta_j \psi_j(x_i) \right)^2 + \lambda \sum_{j=1}^{J} \beta_j^2   (6)

and setting f̂_λ(x) = Σ_{j=1}^{J} β̂_j ψ_j(x), as for regression splines. Penalized splines combine the advantage of a parametric fitting method, as for regression splines, with the flexible adjustment of the degree of smoothness, as in smoothing splines. Both the basis functions and the exact form of the penalization of the coefficients can be varied to accommodate a large range of regression settings.

Spline-based regression methods are extensively described in the statistical literature. While the theoretical properties of (unpenalized) regression splines and smoothing splines are well established, results for penalized regression splines have only recently become available. The monographs by Wahba (1990), Eubank (1999) and Ruppert et al. (2003) are good sources of information on spline-based methods.

Acknowledgements

Based on an article from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science+Business Media, LLC.

References

Eubank, R. L. (1999). Nonparametric Regression and Spline Smoothing (2nd ed.). New York: Marcel Dekker.

Fan, J. and I. Gijbels (1996). Local Polynomial Modelling and its Applications. London: Chapman & Hall.

Ruppert, D., M. P. Wand, and R. J. Carroll (2003). Semiparametric Regression. Cambridge, UK: Cambridge University Press.

Wahba, G. (1990). Spline Models for Observational Data. Philadelphia: SIAM [Society for Industrial and Applied Mathematics].

Wand, M. P. and M. C. Jones (1995). Kernel Smoothing. London: Chapman and Hall.