P-spline ANOVA-type interaction models for spatio-temporal smoothing

1 P-spline ANOVA-type interaction models for spatio-temporal smoothing
Dae-Jin Lee and María Durbán
Universidad Carlos III de Madrid, Department of Statistics
IWSM Utrecht 2008
D.-J. Lee and M. Durban (UC3M) P-spline ANOVA-type models IWSM / 26

2 Outline
1. Motivation
2. Penalized splines for spatio-temporal data
3. ANOVA-type interaction models
4. Application to O3 pollution in Europe
5. Conclusions

3 1. Motivation
Air pollution
Environmental policies
Monitoring networks:
  European Environment Agency (EEA)
  EMEP project (European Monitoring and Evaluation Programme)
Ozone (O3) is currently one of the air pollutants of most concern in Europe.

4 Monitoring stations across Europe
[Figure: map of a sample of 45 monitoring stations.]

5 O3 time series plot for selected locations
Seasonal pattern.
[Figure: O3 time series for Spain, Finland, France and the UK; axes: time, O3.]

6 O3 level from 01/2004 to 12/2005
[Animation: O3 level, 01/2004 to 12/2005.]

7 1. Motivation
Spatio-temporal data
Response variable $y_{ijt}$, measured over geographical locations $s = (x_i, x_j)$, with $i, j = 1, \ldots, n$, and over time periods $x_t$, for $t = 1, \ldots, T$.
ISSUE: huge amount of data available, e.g. environmental data, epidemiologic studies, disease mapping applications, ...
Smoothing techniques:
  Study spatial and temporal trends.
  Space and time interactions.
  Penalized splines (Eilers and Marx, 1996).

8 2. Penalized splines
The flexible smoother
Methodology:
Given the data $(x_i, y_i)$, $i = 1, \ldots, n$.
Fit a sum of local basis functions: $f(x) = B\theta$.
Minimize the penalized sum of squares: $\sum_i \left(y_i - f(x_i)\right)^2 + \text{Penalty}$.
The penalty controls the smoothness of the fit. Smoothing parameter: $\lambda$.
Apply a discrete penalty over the coefficients $\theta$, e.g. in 1d: $P = \lambda D'D$, where $D$ is a difference matrix acting on $\theta$.
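A minimal sketch of the 1d penalized fit on this slide, assuming Python with NumPy (neither appears in the talk). The helper names `bspline_basis` and `pspline_fit`, the equally spaced knots, the cubic degree and the second-order difference penalty are illustrative assumptions, not the authors' code.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg=10, deg=3):
    """Equally spaced B-spline basis on [xl, xr], built from
    differences of truncated power functions."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    diff = x[:, None] - knots[None, :]
    P = np.where(diff > 0, diff ** deg, 0.0)           # truncated powers
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)  # (deg+1)-th differences
    return (-1) ** (deg + 1) * (P @ D.T) / (factorial(deg) * dx ** deg)


def pspline_fit(x, y, lam=1.0, nseg=10, deg=3, order=2):
    """Minimize ||y - B theta||^2 + lam * ||D theta||^2."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)    # difference matrix on theta
    theta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B, theta


# Hypothetical data: a noisy seasonal-looking curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
B, theta = pspline_fit(x, y, lam=1.0)
fitted = B @ theta
```

Increasing `lam` pulls the second differences of `theta` towards zero, so the fit tends to a straight line; decreasing it lets the fit follow the data more closely.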

9 2. Penalized splines
The flexible smoother
For array data (Currie et al., 2006): Generalized Linear Array Models (GLAM):
$f(x_1, \ldots, x_d) = B\theta$, where $B$ is the Kronecker product of $d$ B-spline bases: $B = B_1 \otimes B_2 \otimes \cdots \otimes B_d$.
Efficient algorithms for smoothing on multidimensional grids (e.g. mortality data, images, etc.).
Easy representation as a mixed model: $f(x_1, \ldots, x_d) = X\beta + Z\alpha$.

10 2. Penalized splines
Example of GLAM, 3d case: $f(x_1, x_2, x_3) = B\theta$
Basis: $B = B_1 \otimes B_2 \otimes B_3$
$\theta$ can be expressed as a 3d array $A = \{\theta_{ijk}\}$ of dimension $c_1 \times c_2 \times c_3$.
[Figure: the coefficient array drawn as a cube with corners $\theta_{(1,1,1)}, \ldots, \theta_{(c_1,c_2,c_3)}$; rows $1, \ldots, c_1$, columns $1, \ldots, c_2$, layers $1, \ldots, c_3$.]
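The array form can be illustrated with a small sketch, assuming Python with NumPy. The function name `glam_fitted` and the random stand-in matrices are hypothetical; the point is only that multiplying each marginal basis along its own axis of $A$ gives the same result as forming the full Kronecker product, without ever building that large matrix.

```python
import numpy as np


def glam_fitted(B1, B2, B3, A):
    """Fitted values on a 3d grid: apply each marginal basis
    along its own axis of the coefficient array A."""
    return np.einsum('ia,jb,kc,abc->ijk', B1, B2, B3, A)


# Random stand-ins for the marginal B-spline bases (illustration only).
rng = np.random.default_rng(1)
B1 = rng.random((4, 3))
B2 = rng.random((5, 2))
B3 = rng.random((6, 4))
A = rng.random((3, 2, 4))        # coefficient array, c1 x c2 x c3

# Same computation through the full Kronecker basis (row-major vec of A).
full = np.kron(np.kron(B1, B2), B3) @ A.ravel()
array_form = glam_fitted(B1, B2, B3, A)
```

The Kronecker basis here is only 120 x 24, but its size grows as the product of the marginal dimensions, which is why the array form matters on large grids.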

11 3d penalty matrix
Set penalties over the 3d array $A$:
$P = \underbrace{\lambda_1 D_1'D_1 \otimes I_{c_2} \otimes I_{c_3}}_{\text{row-wise}} + \underbrace{\lambda_2 I_{c_1} \otimes D_2'D_2 \otimes I_{c_3}}_{\text{column-wise}} + \underbrace{\lambda_t I_{c_1} \otimes I_{c_2} \otimes D_t'D_t}_{\text{layer-wise}}$
For spatio-temporal data: $f(\underbrace{\text{longitude}, \text{latitude}}_{\text{Space}}, \text{time})$
Spatial anisotropy ($\lambda_1 \neq \lambda_2$): different amounts of smoothing for latitude and longitude.
Temporal smoothing ($\lambda_t$).
Space-time interaction.
However, spatial data are not on a regular grid.
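The three-term penalty can be assembled directly from Kronecker products. The sketch below assumes Python with NumPy and second-order differences; the names `diff_penalty` and `penalty_3d` and the example dimensions are hypothetical.

```python
import numpy as np


def diff_penalty(c, order=2):
    """D'D for a difference matrix D of the given order acting on c coefficients."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return D.T @ D


def penalty_3d(c1, c2, c3, lam1, lam2, lamt, order=2):
    """Row-wise + column-wise + layer-wise penalty on a c1 x c2 x c3 array."""
    I1, I2, I3 = np.eye(c1), np.eye(c2), np.eye(c3)
    rows = np.kron(np.kron(diff_penalty(c1, order), I2), I3)
    cols = np.kron(np.kron(I1, diff_penalty(c2, order)), I3)
    layers = np.kron(np.kron(I1, I2), diff_penalty(c3, order))
    return lam1 * rows + lam2 * cols + lamt * layers


# lam1 != lam2 gives spatial anisotropy; lamt controls temporal smoothing.
P = penalty_3d(5, 4, 6, lam1=1.0, lam2=2.0, lamt=0.5)
```

With second-order differences the unpenalized part is spanned by functions that are linear in each index, so `P` has an 8-dimensional null space for any positive smoothing parameters.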

12 2. Penalized splines
Scattered data smoothing
For scattered data, Eilers et al. (2006) propose the row-wise Kronecker product, or box product, of B-spline bases.
Def. (box product): $B_1 \Box B_2 = (B_1 \otimes 1_{c_2}') \odot (1_{c_1}' \otimes B_2)$, where $\odot$ is the element-wise product.
We propose the use of $\Box$ for spatial data:
  Although spatial data are not on a grid, the coefficients $\theta$ can be expressed in array form.
  Choose a moderate number of knots to cover the spatial domain.
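A short sketch of the box product as defined above, assuming Python with NumPy; the function name `box_product` and the random stand-in bases are hypothetical. Row $i$ of the result is the Kronecker product of row $i$ of $B_1$ with row $i$ of $B_2$, which is why it suits scattered locations: each station contributes one row.

```python
import numpy as np


def box_product(B1, B2):
    """Row-wise Kronecker product: (B1 kron 1') * (1' kron B2), element-wise."""
    c1, c2 = B1.shape[1], B2.shape[1]
    left = np.kron(B1, np.ones((1, c2)))    # each column of B1 repeated c2 times
    right = np.kron(np.ones((1, c1)), B2)   # B2 tiled c1 times
    return left * right


# Stand-ins for longitude and latitude bases at n = 7 scattered locations.
rng = np.random.default_rng(2)
B1 = rng.random((7, 3))
B2 = rng.random((7, 4))
Bs = box_product(B1, B2)                    # n x (c1 * c2)
```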

13 2. Penalized splines
Spatio-temporal data smoothing
For spatio-temporal data, we propose:
Spatio-temporal B-spline basis: $B = B_s \otimes B_t$, of dimension $nt \times c_1 c_2 c_3$, where $B_s = B_1 \Box B_2$ is the spatial B-spline basis and $B_t$ is the B-spline basis for time, of dimension $t \times c_3$.
Note that: GLAM framework, mixed models.

14 2. Penalized splines
Mixed models representation
Reparameterize the basis $B$ and coefficients $\theta$: $B\theta = X\beta + Z\alpha$
Currie et al. (2006) use the singular value decomposition (SVD) of the penalty $P$, i.e.:
$D'D = [U_n : U_s] \begin{bmatrix} 0_q & \\ & \Sigma \end{bmatrix} \begin{bmatrix} U_n' \\ U_s' \end{bmatrix}$
The penalty becomes block-diagonal: $F = \lambda \Sigma$
Standard mixed model theory (REML).
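The reparameterization can be sketched in 1d as follows, assuming Python with NumPy. Since $D'D$ is symmetric and positive semi-definite, the sketch uses an eigendecomposition (`numpy.linalg.eigh`), which coincides with the SVD here; the function name `mixed_model_bases` and the random stand-in basis are hypothetical.

```python
import numpy as np


def mixed_model_bases(B, order=2):
    """Split B into a fixed part X = B Un and a random part Z = B Us,
    using the spectral decomposition of D'D."""
    c = B.shape[1]
    D = np.diff(np.eye(c), n=order, axis=0)
    evals, U = np.linalg.eigh(D.T @ D)       # eigenvalues in ascending order
    Un, Us = U[:, :order], U[:, order:]      # null space / non-null space
    Sigma = evals[order:]                    # positive eigenvalues
    return B @ Un, B @ Us, Sigma, Un, Us


rng = np.random.default_rng(3)
B = rng.random((30, 8))                      # stand-in for a B-spline basis
theta = rng.standard_normal(8)
X, Z, Sigma, Un, Us = mixed_model_bases(B)
beta, alpha = Un.T @ theta, Us.T @ theta     # new coefficients
```

Under this split the penalty $\theta' D'D \theta$ equals $\alpha' \Sigma \alpha$, so only the random part $\alpha$ is penalized and $\beta$ is left free.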

15 3. ANOVA-Type Interaction Models
Smooth-ANOVA decomposition models
Chen (1993), Gu (2002): Smoothing-Spline ANOVA (SS-ANOVA).
Interpretation as main effects and interactions.
Models of type:
$\hat{y} = f(x_1) + f(x_2) + f(x_t)$ (main/additive effects)
$\quad + f(x_1, x_2) + f(x_1, x_t) + f(x_2, x_t)$ (2-way interactions)
$\quad + f(x_1, x_2, x_t)$ (3-way interactions)
PROBLEM: basis dimension ("curse of dimensionality").

16 3. ANOVA-Type Interaction Models
Smooth-ANOVA decomposition models
We propose ANOVA-type models:
Computationally efficient methodology based on low-rank P-splines and GLAM.
For spatio-temporal smoothing, interpretation as: main spatial and temporal effects, spatial 2d effects (anisotropy) and space-time interaction.
Our approach is based on: SVD properties and the mixed model representation.

17 ANOVA-type interaction models
The 3d model $f(x_1, x_2, x_t)$, with basis $B = B_s \otimes B_t$ and smoothing parameters $(\lambda_1, \lambda_2, \lambda_t)$, can be decomposed as:
$f(x_1) + f(x_2) + f(x_t) + f(x_1, x_2) + f(x_1, x_2, x_t)$
Reformulate as a mixed model and expand the bases $X$ and $Z$.

18 Expand X and Z
Basis        | main effects        | 2-way interact.                               | 3-way interact.
X (columns)  | $x_1 : x_2 : x_3$   | $(x_1, x_2) : (x_2, x_3) : (x_1, x_3)$        | $(x_1, x_2, x_3)$
Z (blocks)   |                     |                                               |
Penalty $F$: block-diagonal, with $\lambda_1, \lambda_2, \lambda_t$ and $\Sigma_1, \Sigma_2, \Sigma_t$.

19 Full ANOVA-type model
$f(x_1) + f(x_2) + f(x_t) + f(x_1, x_2) + f(x_1, x_2, x_t)$
Different $\lambda$'s for each smooth $f(\cdot)$, with basis
$B = [\, B_{1s} \otimes 1_t : B_{2s} \otimes 1_t : 1_n \otimes B_t : B_s \otimes 1_t : B_s \otimes B_t \,]$
However, $B$ is now NOT of full column rank (linear dependency), so the model is NOT identifiable.
The mixed model representation and the expansion of $X$ and $Z$ allow us to identify the constraints to impose in order to maintain the identifiability of the model.
In the P-splines context, constraints are applied over the regression coefficients $\theta_{i,j,k}$.

20 Equivalent to the constraints in a 3-way factorial design:
main effects: $\sum_i \theta^{(1)}_i = \sum_j \theta^{(2)}_j = \sum_t \theta^{(3)}_t = 0$
2-way interactions: $\sum_{i,j} \theta^{(1,2)}_{ij} = \sum_{i,t} \theta^{(1,3)}_{it} = \sum_{j,t} \theta^{(2,3)}_{jt} = 0$
3-way interactions: $\sum_{i,j,t} \theta^{(1,2,3)}_{ijt} = 0$

21 Centering and scaling matrix: $(I_c - 11'/c)$
[Figure: the coefficient array drawn as a cube with corners $\theta_{(1,1,1)}, \ldots, \theta_{(c_1,c_2,c_3)}$; rows $1, \ldots, c_1$, columns $1, \ldots, c_2$, layers $1, \ldots, c_3$.]
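How a centering matrix of the form $(I_c - 11'/c)$ produces sum-to-zero components can be sketched on a coefficient array, assuming Python with NumPy. This is an illustration of the factorial-design analogy rather than the authors' procedure: the function name `anova_decompose` is hypothetical, and the components it returns sum to zero over each index separately, which is stronger than (and implies) the constraints listed on the previous slide.

```python
import itertools

import numpy as np


def anova_decompose(A):
    """Split an array into grand mean, main effects and interactions.
    For each subset S of axes: centre along axes in S (I - 11'/c),
    average along the remaining axes (11'/c)."""
    comps = {}
    axes = range(A.ndim)
    for r in range(A.ndim + 1):
        for S in itertools.combinations(axes, r):
            T = A
            for k in axes:
                mean_k = T.mean(axis=k, keepdims=True)
                T = T - mean_k if k in S else mean_k
            comps[S] = T
    return comps


rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4, 6))           # stand-in coefficient array
comps = anova_decompose(A)
rebuilt = sum(comps.values())                # broadcasting adds the pieces back
```

Here `comps[(0,)]` plays the role of a main effect, `comps[(0, 1)]` of a 2-way interaction and `comps[(0, 1, 2)]` of the 3-way interaction.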

22 4. Application to O3 pollution in Europe
Data and models
Sample of 45 monitoring stations.
Monthly averages of O3 levels (in µg/m³) from January 1999 to December 2005 ($t = 1, \ldots, 84$).
Models:
  3d model: $f(x_1, x_2, t)$
  Additive, spatial 2d + time: $f(x_1, x_2) + f(t)$
  ANOVA-type: $f(x_1) + \cdots + f(x_1, x_2) + \cdots + f(x_1, x_2, t)$
  ANOVA: $f(x_1, x_2) + f(t) + f(x_1, x_2, t)$

23 4. Application to O3 pollution in Europe
Summary of results:
Model            | AIC | ED | Num. of λ's
2d space + time  |     |    | = 3
3d ANOVA         |     |    | = 6
Better AIC values for the ANOVA model.
Effective dimension (ED): trace of the hat matrix.
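The effective dimension can be computed without forming the $n \times n$ hat matrix, since $\mathrm{tr}\{B(B'B + P)^{-1}B'\} = \mathrm{tr}\{(B'B + P)^{-1}B'B\}$. A minimal sketch assuming Python with NumPy; the function name `effective_dimension`, the random stand-in basis and the second-order penalty are hypothetical.

```python
import numpy as np


def effective_dimension(B, P):
    """Trace of the hat matrix H = B (B'B + P)^{-1} B'."""
    G = B.T @ B
    return np.trace(np.linalg.solve(G + P, G))


rng = np.random.default_rng(5)
B = rng.random((40, 8))                      # stand-in for a B-spline basis
D = np.diff(np.eye(8), n=2, axis=0)          # second-order differences
ed_none = effective_dimension(B, 0.0 * D.T @ D)   # no penalty
ed_some = effective_dimension(B, 10.0 * D.T @ D)  # moderate penalty
ed_heavy = effective_dimension(B, 1e8 * D.T @ D)  # very heavy penalty
```

Without a penalty the effective dimension equals the number of basis functions; under a very heavy penalty it approaches the dimension of the penalty's null space (2 for second-order differences).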

24 Spatial 2d + time: $f(x_1, x_2) + f(t)$
[Figure: estimated f(time) against time.]
Space-time interaction is not considered; the smooth time trend is additive.

25 ANOVA space-time interaction model
$\hat{y} = f(x_1, x_2) + f(t) + f(x_1, x_2, t)$
[Animation: fitted values from the ANOVA space-time interaction model.]

26 5. Conclusions
P-splines as a unified framework:
  Flexible multidimensional smoothing (mixed models)
  Low-rank basis
  GLAM for spatial and spatio-temporal data
ANOVA-type models:
  Interpretation as additive plus interaction smooth functions
  Identify which constraints to apply for model identifiability
More complex structures:
  Incorporation of additional covariates with their interactions (e.g. year-month).

27 THANKS FOR YOUR ATTENTION!!!

28 References
P-splines:
  Eilers, PHC. and Marx, BD. Stat. Sci. (1996), 11.
  Eilers, PHC., Currie, ID. and Durbán, M. CSDA (2006), 50(1).
  Currie, ID., Durbán, M. and Eilers, PHC. JRSSB (2006), 68:1-22.
SS-ANOVA:
  Chen, Z. JRSSB (1993), 55.
  Gu, C. Springer (2002).


More information

Surface Approximation and Interpolation via Matrix SVD

Surface Approximation and Interpolation via Matrix SVD Surface Approximation and Interpolation via Matrix SVD Andrew E. Long and Clifford A. Long Andy Long (longa@nku.edu) is at Northern Kentucky University. He received his Ph.D. in applied mathematics at

More information

Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks

Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks Series Prediction as a Problem of Missing Values: Application to ESTSP7 and NN3 Competition Benchmarks Antti Sorjamaa and Amaury Lendasse Abstract In this paper, time series prediction is considered as

More information

STAT 705 Introduction to generalized additive models

STAT 705 Introduction to generalized additive models STAT 705 Introduction to generalized additive models Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 22 Generalized additive models Consider a linear

More information

PARAMETERIZATION AND SAMPLING DESIGN FOR WATER NETWORKS DEMAND CALIBRATION USING THE SINGULAR VALUE DECOMPOSITION: APPLICATION TO A REAL NETWORK

PARAMETERIZATION AND SAMPLING DESIGN FOR WATER NETWORKS DEMAND CALIBRATION USING THE SINGULAR VALUE DECOMPOSITION: APPLICATION TO A REAL NETWORK 11 th International Conference on Hydroinformatics HIC 2014, New York City, USA PARAMETERIZATION AND SAMPLING DESIGN FOR WATER NETWORKS DEMAND CALIBRATION USING THE SINGULAR VALUE DECOMPOSITION: APPLICATION

More information

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric)

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric) Splines Patrick Breheny November 20 Patrick Breheny STA 621: Nonparametric Statistics 1/46 Introduction Introduction Problems with polynomial bases We are discussing ways to estimate the regression function

More information

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this

More information

Comparing different interpolation methods on two-dimensional test functions

Comparing different interpolation methods on two-dimensional test functions Comparing different interpolation methods on two-dimensional test functions Thomas Mühlenstädt, Sonja Kuhnt May 28, 2009 Keywords: Interpolation, computer experiment, Kriging, Kernel interpolation, Thin

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

Essentials for Modern Data Analysis Systems

Essentials for Modern Data Analysis Systems Essentials for Modern Data Analysis Systems Mehrdad Jahangiri, Cyrus Shahabi University of Southern California Los Angeles, CA 90089-0781 {jahangir, shahabi}@usc.edu Abstract Earth scientists need to perform

More information

Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD

Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Goals. The goal of the first part of this lab is to demonstrate how the SVD can be used to remove redundancies in data; in this example

More information

BASIC LOESS, PBSPLINE & SPLINE

BASIC LOESS, PBSPLINE & SPLINE CURVES AND SPLINES DATA INTERPOLATION SGPLOT provides various methods for fitting smooth trends to scatterplot data LOESS An extension of LOWESS (Locally Weighted Scatterplot Smoothing), uses locally weighted

More information

TECHNICAL REPORT NO December 11, 2001

TECHNICAL REPORT NO December 11, 2001 DEPARTMENT OF STATISTICS University of Wisconsin 2 West Dayton St. Madison, WI 5376 TECHNICAL REPORT NO. 48 December, 2 Penalized Log Likelihood Density Estimation, via Smoothing-Spline ANOVA and rangacv

More information

Stat 8053, Fall 2013: Additive Models

Stat 8053, Fall 2013: Additive Models Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An

More information

Soft Threshold Estimation for Varying{coecient Models 2 ations of certain basis functions (e.g. wavelets). These functions are assumed to be smooth an

Soft Threshold Estimation for Varying{coecient Models 2 ations of certain basis functions (e.g. wavelets). These functions are assumed to be smooth an Soft Threshold Estimation for Varying{coecient Models Artur Klinger, Universitat Munchen ABSTRACT: An alternative penalized likelihood estimator for varying{coecient regression in generalized linear models

More information

Alternative Statistical Methods for Bone Atlas Modelling

Alternative Statistical Methods for Bone Atlas Modelling Alternative Statistical Methods for Bone Atlas Modelling Sharmishtaa Seshamani, Gouthami Chintalapani, Russell Taylor Department of Computer Science, Johns Hopkins University, Baltimore, MD Traditional

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Fitting latency models using B-splines in EPICURE for DOS

Fitting latency models using B-splines in EPICURE for DOS Fitting latency models using B-splines in EPICURE for DOS Michael Hauptmann, Jay Lubin January 11, 2007 1 Introduction Disease latency refers to the interval between an increment of exposure and a subsequent

More information

Modern Multidimensional Scaling

Modern Multidimensional Scaling Ingwer Borg Patrick J.F. Groenen Modern Multidimensional Scaling Theory and Applications Second Edition With 176 Illustrations ~ Springer Preface vii I Fundamentals of MDS 1 1 The Four Purposes of Multidimensional

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Recovery of Piecewise Smooth Images from Few Fourier Samples

Recovery of Piecewise Smooth Images from Few Fourier Samples Recovery of Piecewise Smooth Images from Few Fourier Samples Greg Ongie*, Mathews Jacob Computational Biomedical Imaging Group (CBIG) University of Iowa SampTA 2015 Washington, D.C. 1. Introduction 2.

More information

Generalized additive models for large data sets

Generalized additive models for large data sets Appl. Statist. (2015) Generalized additive models for large data sets Simon N. Wood, University of Bath, UK Yannig Goude Electricité de France, Clamart, France and Simon Shaw University of Bath, UK [Received

More information

Package lmesplines. R topics documented: February 20, Version

Package lmesplines. R topics documented: February 20, Version Version 1.1-10 Package lmesplines February 20, 2015 Title Add smoothing spline modelling capability to nlme. Author Rod Ball Maintainer Andrzej Galecki

More information

Modern Multidimensional Scaling

Modern Multidimensional Scaling Ingwer Borg Patrick Groenen Modern Multidimensional Scaling Theory and Applications With 116 Figures Springer Contents Preface vii I Fundamentals of MDS 1 1 The Four Purposes of Multidimensional Scaling

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

CoxFlexBoost: Fitting Structured Survival Models

CoxFlexBoost: Fitting Structured Survival Models CoxFlexBoost: Fitting Structured Survival Models Benjamin Hofner 1 Institut für Medizininformatik, Biometrie und Epidemiologie (IMBE) Friedrich-Alexander-Universität Erlangen-Nürnberg joint work with Torsten

More information

Model selection and validation 1: Cross-validation

Model selection and validation 1: Cross-validation Model selection and validation 1: Cross-validation Ryan Tibshirani Data Mining: 36-462/36-662 March 26 2013 Optional reading: ISL 2.2, 5.1, ESL 7.4, 7.10 1 Reminder: modern regression techniques Over the

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Climate Precipitation Prediction by Neural Network

Climate Precipitation Prediction by Neural Network Journal of Mathematics and System Science 5 (205) 207-23 doi: 0.7265/259-529/205.05.005 D DAVID PUBLISHING Juliana Aparecida Anochi, Haroldo Fraga de Campos Velho 2. Applied Computing Graduate Program,

More information