Statistical Modeling with Spline Functions: Methodology and Theory


Mark H. Hansen, University of California at Los Angeles
Jianhua Z. Huang, University of Pennsylvania
Charles Kooperberg, Fred Hutchinson Cancer Research Center
Charles J. Stone, University of California at Berkeley
Young K. Truong, University of North Carolina at Chapel Hill

Copyright © 2006 by M. H. Hansen, J. Z. Huang, C. Kooperberg, C. J. Stone, and Y. K. Truong

January 5, 2006


Contents

1 Introduction
Overview
Why do we end up using splines?
Broad outline of methods used
Broad outline of theory
Chapter by chapter overview
Background
Other smoothing methods
Some history
Software
Preliminaries
What is a Spline?
Polynomials
Piecewise polynomials
Splines
B-splines: Definition
Important properties of B-splines
Function Approximation
Properties of Polynomial Approximation
Why Splines?
Why B-splines?
Distance from a function to a spline space
Tensor products of splines

Tensor products of linear spaces
Tensor products of B-splines
Approximation properties
Concavity
One-dimensional case
Multi-dimensional case
Checking concavity
Existence and the uniqueness of the maximum
Optimization
Preview of the methods
Gradient Methods: steepest ascent
Newton-Raphson Method
Quasi-Newton Method
Conjugate Directions
One-Dimensional Optimization: step length search
Step-halving
How to terminate an iteration
B-splines with repeated knots
Polynomial interpolation
Divided difference: efficient way to evaluate the coefficients
Divided differences with repetition
Properties of the divided difference
Computing the divided differences recursively
B-Splines with repeated knots
Basis: Curry-Schoenberg Theorem
Examples
Interpolation Errors via divided difference
Continuity of divided differences
Partial derivatives of B-splines with respect to knot locations
3 Linear Models
Examples
Smoothing and extrinsic catastrophists
Global warming and tree migration
Regression modeling and approximation spaces
Linear spaces and ordinary least squares
The bias-variance tradeoff
How smooth? Some simple model selection criteria
Curve estimation
From polynomials to splines
Model error for fixed-knot splines
Adaptive knot placement
Multivariate models
From multivariate polynomials to splines

Model error and functional ANOVA
Adaptation for multivariate splines
A survey of multivariate spline methods
Properties of spline estimates
Knot spacing
Boundary conditions
Degrees of freedom associated with knot placement
Representation and computation
Selecting a basis
Implementing stepwise addition
Connection to smoothing splines
A second look at the examples
Assessing uncertainty in curve fitting
Test set prediction error
Multivariate responses
Conclusion
Generalized Linear Models
Applications
Health effects of particulate matter
Obesity and urban sprawl
GLMs and approximation spaces
Conditional Likelihood for a GLM
Canonical linear regression and approximation spaces
Link functions
Estimation and adaptation
Quadratic approximations
Application to GLMs
A general methodology
Polychotomous Regression and Multiple Classification
An example
The vowel data
Background
A Polyclass model for the vowel data
The Polyclass methodology
The Polyclass model
Fitting Polyclass models
Model selection
Further analysis of the vowel data
Applying Polyclass to large data sets
The fruit data
Analysis of cpu-time required for large data sets
PolyMARS: A least squares approximation of the addition process

Fitting Polyclass models with large data sets and many basis functions
Further analysis of the fruit data
Technical details of the Polyclass algorithm
Maximum number of basis functions
Optimizing the location of a new knot
Notes
Density Estimation
An example
The income data
Background
Logspline density estimation
The Logspline methodology
The Logspline model
Basis functions
Fitting Logspline models
Knot selection
How much to smooth: more examples
Free knot splines and inference
Free knot splines
The bootstrap
A comparison
Censoring and truncation
The Fyn diabetes data
Implications for Logspline
The Fyn diabetes data analyzed
Multivariate density estimation
Technical details
Initial knot placement
Stepwise addition for Logspline
Numerical integration
Constrained optimization
Notes
Survival Analysis
An example
The bone marrow transplant data
Background
Linear models for the conditional log-hazard function
A Hare model for the bone marrow transplant data
The Hare methodology
The Hare model
Allowable spaces
Model selection

Fitting Hare models
Inference
Further analysis of the bone marrow transplant data
Does a simpler model fit the data?
Partially linear Hare models
Proportional hazards regression
Extensions
The Colorado Plateau uranium miners data
Time-dependent covariates
Left truncation
Analysis of the Colorado Plateau uranium miners data
Interval censored data
Heft
Severe censoring and the penalty parameter
Technical details
Numerical integration for Heft
Notes
Estimation of the Spectral Distribution
An example
The network data
Background
Mixed Spectra
An Lspec model for the network data
The Lspec methodology
The Lspec model
Model selection for Lspec models
Further analysis of the network data
Extensions
Notes
Multivariate Splines
Preliminaries
An application
The methodology
Bivariate spline spaces
Maximum likelihood estimation
A stepwise algorithm
Stepwise addition
Stepwise deletion
The example revisited
Simulation results
Extensions
Alternate Optimization Methods

10.1 Normal linear regression revisited
Greedy methods and the 87 δ Sr data
Results from an exhaustive search
Bayesian Formulations
A single linear space
Many spaces
Computation
Connection with model selection criteria
Theoretical justification
Normal linear regression
Prior specification
Computation
Extended linear models
Logspline density estimation
Triogram regression
ELM
Prior specification
Computation
Logspline density estimation
Triogram regression
Other optimization methods
Simulated annealing
Genetic algorithms
Gradient descent machines
Combining models
Rates of Convergence in Extended Linear Modeling
Theoretical Framework and Basic Results
Extended Linear Models
Consistency and Rates of Convergence
ANOVA modeling
The main result on rates of convergence
Approximation Error
Estimation Error
Functional ANOVA
Functional ANOVA Decompositions
Construction of Model Space and Estimation Spaces using Functional ANOVA
Rates of Convergence of ANOVA Components
Verification of Technical Conditions
Preliminaries
Theoretical and Empirical Inner Products
Generalized Regression
Density Estimation
Hazard Regression
Notes

12 Extended Linear Modeling with Free Knot Splines
Main Results
Statement of Main Results
Uniformity in Rates of Convergence
Adaptive Parameter Selection
Free Knot Splines and Their Tensor Products
Verification of Technical Conditions
Preliminary Lemmas
Density Estimation
Generalized Regression
Proofs of Lemmas in Section

