GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K.

Size: px
Start display at page:

Download "GAMs, GAMMs and other penalized GLMs using mgcv in R. Simon Wood Mathematical Sciences, University of Bath, U.K."

Transcription

1 GAMs, GAMMs and other penalied GLMs using mgcv in R Simon Wood Mathematical Sciences, University of Bath, U.K.

2 Simple eample Consider a very simple dataset relating the timber volume of cherry trees to their height and trunk girth Girth Height Volume

3 trees initial gam fit A possible model is log(µ i ) = f 1 (Height i )+f 2 (Girth i ), Volume i Gamma(µ i, φ) Which can be estimated using... library(mgcv) ct1 <- gam(volume ~ s(height) + s(girth), family=gamma(link=log),data=trees) gam produces a representation for the smooth functions, and estimates them along with their degree of smoothness.

4 Cherry tree results par(mfrow=c(1,2)) plot(ct1,various options) ## calls plot.gam s(height,1) s(girth,2.51) Height This is a simple eample of a Generalied Additive Model. Given the machinery for fitting GAMs a wide variety of other models can also be estimated. Girth

5 The general model Let y be a response variable, f j a smooth function of predictors j and L ij a linear functional. A general model... g(µ i ) = A i α + j L ij f j + Z i b, b N(0, Σ θ ), y i EF(µ i, φ) A and Z are model matrices, g is a known link function. Popular eamples are... g(µ i ) = A i α + j f j( ji ) (Generalied additive model) g(µ i ) = A i α + j f j( ji ) + Z i b, b N(0, Σ θ ) (GAMM) g(µ i ) = h i ()f ()d (Signal regression model) g(µ i ) = j f j(t ji ) ji (Varying coefficient model) The list goes on...

6 Representing the f j A convenient model representation arises by approimating each function with a basis epansion K f () = γ k b k () (b k k=1 known, e.g. spline) K is set to something generous to avoid bias. Some way of measuring the wiggliness of each f is also useful, for eample f () 2 d = γ T Sγ. (S known) For many models we will need an identifiability constraint, i f ( i) = 0. Automatic re-parameteriation handles this.

7 Basis & penalty eample y basis functions y f"()2 d = y f"()2 d = y f"()2 d =

8 Representing the model The general model is g(µ i ) = A i α + j L ij f j + Z i b, b N(0, Σ θ ), y i EF(µ i, φ) Given a basis for each f j, we can write j L ijf j = B i γ, where B is determined by the basis functions and L ij, while γ is a vector containing all the basis coefficients. Then the model becomes... g(µ i ) = A i α + B i γ + Z i b, b N(0, Σ θ ), y i EF(µ i, φ) Problem: if the bases were rich enough to give negligible bias, then MLE will overfit.

9 Estimation strategies 1. Control prediction error Penalie the model likelihood, using the f j wiggliness measures, j λ jγ T S j γ, to control model prediction error. Given λ, θ, φ, this yields the maimum penalied likelihood estimates (MPLE), ˆα, ˆγ, ˆb. GCV or similar gives ˆλ, ˆθ, ˆφ. 2. Empirical Bayes Put eponential ( priors on function wiggliness so that γ N 0, ( ) j λ js j ) /φ. Given λ, θ, φ the MAP estimates ˆα, ˆγ, ˆb, are found by MPLE as under 1. Maimie the marginal likelihood (REML) to obtain ˆλ, ˆθ, ˆφ.

10 Maimum Penalied Likelihood Estimation Define coefficient vector β T = (α T, γ T, b T ) Deviance is D(β) = 2{l ma l(β)} β is estimated by ˆβ = arg min D(β) + β j λ j β T S j β + β T Σ θ βφ is a ero padded version of Σ 1 θ so that β T Σ θ β = bt Σ 1 θ b. Similarly the S j here are ero padded versions of the originals. Σ θ In practice this optimiation is by Penalied IRLS.

11 Estimating λ etc We treat ˆβ as a function of λ, θ, φ for GCV/REML optimiation. GCV or REML are optimied numerically, with each step requiring evaluation of the corresponding ˆβ. Note that computationally the empirical Bayes approach is equivalent to generalied linear mied modelling. However in most applications it is not appropriate to treat the smooth functions as random quantities to be resampled from the prior at each replication of the data set. So the models are not frequentist random effects models.

12 More inference and degrees of freedom The Bayesian approach yields the large sample result β N( ˆβ, V β ) where V β is a penalied version of the information matri at ˆβ. Via an approach pioneered in Nychka (1988, JASA) it is possible to show that CI s based on this result have good frequentist properties. Effective degrees of freedom can also be computed. Let β denote the unpenalied MLE of β, EDF = j ˆβ j β j

13 Smooths for semi-parametric GLMs To build adequate semi-parametric GLMs requires that we use functions with appropriate properties. In one dimension there are several alternatives, and not alot to choose between them. In 2 or more dimensions there is a major choice to make. If the arguments of the smooth function are variables which all have the same units (e.g. spatial location variables) then an isotropic smooth may be appropriate. This will tend to ehibit the same degree of fleibility in all directions. If the relative scaling of the covariates of the smooth is essentially arbitrary (e.g. they are measured in different units), then scale invariant smooths should be used, which do not depend on this relative scaling.

14 Splines Many smooths covered here are based on splines. Here s the original underlying idea... wear sie Mathematically the red curve is the function minimiing (y i f ( i )) 2 + λ f () 2 d. i

15 Splines have variable stiffness Varying the fleibility of the strip (i.e. varying λ) changes the spline function curve. wear wear sie sie wear wear sie But irrespective of λ the spline functions always have the same basis. sie

16 Penalied regression splines Full splines have one basis function per data point. This is computationally wasteful, when penaliation ensures that the effective degrees of freedom will be much smaller than this. Penalied regression splines simply use fewer spline basis functions. There are two alternatives: 1. Choose a representative subset of your data (the knots ), and create the spline basis as if smoothing only those data. Once you have the basis, use it to smooth all the data. 2. Choose how many basis functions are to be used and then solve the problem of finding the set of this many basis functions that will optimally approimate a full spline. I ll refer to 1 as knot based and 2 as eigen based.

17 Knot based eample: "cr" In mgcv the "cr" basis is a knot based approimation to the minimier of i (y i f ( i )) 2 + λ f () 2 d a cubic spline. "cc" is a cyclic version. full spline basis b i() y data to smooth function estimate: full black, regression red simple regression spline basis s() b i()

18 Eigen based eample: "tp" The "tp", thin plate regression spline basis is an eigen approimation to a thin plate spline (including cubic spline in 1 dimension). full spline basis b i() y data to smooth function estimate: full black, regression red thin plate regression spline basis s() b i()

19 1 dimensional smoothing in mgcv Various 1D smoothers are available in mgcv... "cr" knot based cubic regression spline. "cc" cyclic version of above. "ps" Eilers and Mar style p-splines: Evenly spaced B-spline basis, with discrete penalty on coefficients. "ad" adaptive smoother in which strength of penalty varies with covariate. "tp" thin plate regression spline. Optimal low rank eigen appro. to a full spline: fleible order penalty derivative. Smooth classes can be added (?smooth.construct).

20 1D smooths compared accel tp cr ps cc times ad

21 Isotropic smooths One way of generaliing splines from 1D to several D is to turn the fleible strip into a fleible sheet (hyper sheet). This results in a thin plate spline. It is an isotropic smooth. Isotropy may be appropriate when different covariates are naturally on the same scale. In mgcv terms like s(,) generate such smooths linear predictor 0.4 linear predictor 0.4 linear predictor

22 Thin plate spline details In 2 dimensions a thin plate spline is the function minimiing {y i f ( i, i )} 2 + λ f 2 + 2f 2 + fdd 2 i This generalies to any number of dimensions, d, and any order of differential, m, such that 2m > d + 1. Full thin plate spline has n unknown coefficients. An optimal low rank eigen approimation is a thin plate regression spline. A t.p.r.s uses far fewer coefficients than a full spline, and is therefore computationally efficient, while losing little statistical performance.

23 Scale invariant smoothing: tensor product smooths Isotropic smooths assume that a unit change in one variable is equivalent to a unit change in another variable, in terms of function variability. When this is not the case, isotropic smooths can be poor. Tensor product smooths generalie from 1D to several D using a lattice of bendy strips, with different fleibility in different directions. f(,)

24 Tensor product smooths Carefully constructed tensor product smooths are scale invariant. Consider constructing a smooth of,. Start by choosing marginal bases and penalties, as if constructing 1-D smooths of and. e.g. f () = α i a i (), J (f ) = f () = β j b j (), f () 2 d = α T S α & J (f ) = B T S B

25 Marginal reparameteriation Suppose we start with f () = 6 i=1 β jb j (), on the left. f () f () We can always re-parameterie so that its coefficients are functions heights, at knots (right). Do same for f.

26 Making f depend on Can make f a function of by letting its coefficients vary smoothly with f() f(,)

27 The complete tensor product smooth Use f basis to let f coefficients vary smoothly (left). Construct in symmetric (see right). f(,) f(,)

28 Tensor product penalties - one per dimension -wiggliness: sum marginal penalties over red curves. -wiggliness: sum marginal penalties over green curves. f(,) f(,)

29 Isotropic vs. tensor product comparison Isotropic Thin Plate Spline Tensor Product Spline each figure smooths the same data. The only modification is that has been divided by 5 in the bottom row.

30 Soap film smoothing Sometimes we require a smoother defined over a bounded region, where it is important not to smooth across boundary features. A useful smoother can be constructed by considering a distorted soap film suspended from a fleible boundary wire y y y y y y

31 Markov Random fields Sometimes data are associated with discrete geographic regions. A neighbourhood structure on these regions can be used to define a Markov random field, which induces a simple penalty structure on region specific coefficients. s(district,2.71)

32 0 Dimensional smooths Having considered smoothing with respect to 1, 2 and several covariates, or a neighbourhood structure, consider the case where we want to smooth without reference to a covariate (or graph). This amounts to simply shrinking some model coefficients towards ero, as in ridge regression. In the contet of penalied GLMs such ero dimensional smooths are equivalent to simple Gaussian random effects.

33 Specifying GAMs in R with mgcv library(mgcv) loads a semi-parametric GLM package. gam(formula,family) is quite like glm. The family argument specifies the distribution and link function. e.g. Gamma(log). The response variable and linear predictor structure are specified in the model formula. Response and parametric terms eactly as for lm or glm. Smooth functions are specified by s or te terms. e.g. is specified by... log{e(y i )} = α + β i + f ( i ), y i Gamma, gam(y ~ + s(),family=gamma(link=log))

34 More specification: by variables Smooth terms can accept a by variable argument, which allows L ij f j terms other than just f j ( i ). e.g. E(y i ) = f 1 ( i )w i + f 2 (v i ), y i Gaussian, becomes gam(y ~ s(,by=w) + f(v)) i.e. f 1 ( i ) is multiplied by w i in the linear predictor. e.g. E(y i ) = f j ( i, i ) if factor g i is of level j, becomes gam(y ~ s(,,by=g) + g) i.e. there is one smooth function for each level of factor variable g, with each y i depending on just one of these functions.

35 Yet more specification: a summation convention s and te smooth terms accept matri arguments and by variables to implement general L ij f j terms. If X and L are n p matrices then s(x,by=l) evaluates L ij f j = k f (X ik)l ik for all i. For eample, consider data y i Poi where log{e(y i )} = k i ()f ()d 1 h p k i ( k )f ( k ) k=1 (the k are evenly spaced points). Let X ik = k i and L ik = k i ( k )/h. The model is fit by gam(y ~ s(x,by=l),poisson)

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K.

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K. A toolbo of smooths Simon Wood Mathematical Sciences, University of Bath, U.K. Smooths for semi-parametric GLMs To build adequate semi-parametric GLMs requires that we use functions with appropriate properties.

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

More advanced use of mgcv. Simon Wood Mathematical Sciences, University of Bath, U.K.

More advanced use of mgcv. Simon Wood Mathematical Sciences, University of Bath, U.K. More advanced use of mgcv Simon Wood Mathematical Sciences, University of Bath, U.K. Fine control of smoothness: gamma Suppose that we fit a model but a component is too wiggly. For GCV/AIC we can increase

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

Stat 8053, Fall 2013: Additive Models

Stat 8053, Fall 2013: Additive Models Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An

More information

Straightforward intermediate rank tensor product smoothing in mixed models

Straightforward intermediate rank tensor product smoothing in mixed models Straightforward intermediate rank tensor product smoothing in mixed models Simon N. Wood, Fabian Scheipl, Julian J. Faraway January 6, 2012 Abstract Tensor product smooths provide the natural way of representing

More information

Generalized additive models I

Generalized additive models I I Patrick Breheny October 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/18 Introduction Thus far, we have discussed nonparametric regression involving a single covariate In practice, we often

More information

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica

Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica Boston-Keio Workshop 2016. Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica... Mihoko Minami Keio University, Japan August 15, 2016 Joint work with

More information

Generalized Additive Model

Generalized Additive Model Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1

More information

Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines. Robert J. Hill and Michael Scholz

Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines. Robert J. Hill and Michael Scholz Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines Robert J. Hill and Michael Scholz Department of Economics University of Graz, Austria OeNB Workshop Vienna,

More information

Last time... Bias-Variance decomposition. This week

Last time... Bias-Variance decomposition. This week Machine learning, pattern recognition and statistical data modelling Lecture 4. Going nonlinear: basis expansions and splines Last time... Coryn Bailer-Jones linear regression methods for high dimensional

More information

Package gamm4. July 25, Index 10

Package gamm4. July 25, Index 10 Version 0.2-5 Author Simon Wood, Fabian Scheipl Package gamm4 July 25, 2017 Maintainer Simon Wood Title Generalized Additive Mixed Models using 'mgcv' and 'lme4' Description

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Lecture 13: Model selection and regularization

Lecture 13: Model selection and regularization Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always

More information

Nonparametric Approaches to Regression

Nonparametric Approaches to Regression Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)

More information

Splines and penalized regression

Splines and penalized regression Splines and penalized regression November 23 Introduction We are discussing ways to estimate the regression function f, where E(y x) = f(x) One approach is of course to assume that f has a certain shape,

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums

Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums José Garrido Department of Mathematics and Statistics Concordia University, Montreal EAJ 2016 Lyon, September

More information

The mgcv Package. July 21, 2006

The mgcv Package. July 21, 2006 The mgcv Package July 21, 2006 Version 1.3-18 Author Simon Wood Maintainer Simon Wood Title GAMs with GCV smoothness estimation and GAMMs by REML/PQL

More information

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric)

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric) Splines Patrick Breheny November 20 Patrick Breheny STA 621: Nonparametric Statistics 1/46 Introduction Introduction Problems with polynomial bases We are discussing ways to estimate the regression function

More information

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this

More information

STAT 705 Introduction to generalized additive models

STAT 705 Introduction to generalized additive models STAT 705 Introduction to generalized additive models Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 22 Generalized additive models Consider a linear

More information

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2010 1 / 26 Additive predictors

More information

A review of spline function selection procedures in R

A review of spline function selection procedures in R Matthias Schmid Department of Medical Biometry, Informatics and Epidemiology University of Bonn joint work with Aris Perperoglou on behalf of TG2 of the STRATOS Initiative September 1, 2016 Introduction

More information

NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR

NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR J. D. Maca July 1, 1997 Abstract The purpose of this manual is to demonstrate the usage of software for

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

arxiv: v1 [stat.ap] 14 Nov 2017

arxiv: v1 [stat.ap] 14 Nov 2017 How to estimate time-varying Vector Autoregressive Models? A comparison of two methods. arxiv:1711.05204v1 [stat.ap] 14 Nov 2017 Jonas Haslbeck Psychological Methods Group University of Amsterdam Lourens

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Additive Models GAMs are one approach to non-parametric regression in the multiple predictor setting.

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

Lecture 21 : A Hybrid: Deep Learning and Graphical Models

Lecture 21 : A Hybrid: Deep Learning and Graphical Models 10-708: Probabilistic Graphical Models, Spring 2018 Lecture 21 : A Hybrid: Deep Learning and Graphical Models Lecturer: Kayhan Batmanghelich Scribes: Paul Liang, Anirudha Rayasam 1 Introduction and Motivation

More information

Chapter 5: Basis Expansion and Regularization

Chapter 5: Basis Expansion and Regularization Chapter 5: Basis Expansion and Regularization DD3364 April 1, 2012 Introduction Main idea Moving beyond linearity Augment the vector of inputs X with additional variables. These are transformations of

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Hierarchical generalized additive models: an introduction with mgcv

Hierarchical generalized additive models: an introduction with mgcv Hierarchical generalized additive models: an introduction with mgcv Eric J Pedersen Corresp., 1, 2, David L. Miller 3, 4, Gavin L. Simpson 5, Noam Ross 6 1 Northwest Atlantic Fisheries Center, Fisheries

More information

Model selection and validation 1: Cross-validation

Model selection and validation 1: Cross-validation Model selection and validation 1: Cross-validation Ryan Tibshirani Data Mining: 36-462/36-662 March 26 2013 Optional reading: ISL 2.2, 5.1, ESL 7.4, 7.10 1 Reminder: modern regression techniques Over the

More information

GAMs with integrated model selection using penalized regression splines and applications to environmental modelling.

GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Simon N. Wood a, Nicole H. Augustin b a Mathematical Institute, North Haugh, St Andrews,

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

Lasso. November 14, 2017

Lasso. November 14, 2017 Lasso November 14, 2017 Contents 1 Case Study: Least Absolute Shrinkage and Selection Operator (LASSO) 1 1.1 The Lasso Estimator.................................... 1 1.2 Computation of the Lasso Solution............................

More information

Lecture 7: Splines and Generalized Additive Models

Lecture 7: Splines and Generalized Additive Models Lecture 7: and Generalized Additive Models Computational Statistics Thierry Denœux April, 2016 Introduction Overview Introduction Simple approaches Polynomials Step functions Regression splines Natural

More information

Mixture models and clustering

Mixture models and clustering 1 Lecture topics: Miture models and clustering, k-means Distance and clustering Miture models and clustering We have so far used miture models as fleible ays of constructing probability models for prediction

More information

Semiparametric Tools: Generalized Additive Models

Semiparametric Tools: Generalized Additive Models Semiparametric Tools: Generalized Additive Models Jamie Monogan Washington University November 8, 2010 Jamie Monogan (WUStL) Generalized Additive Models November 8, 2010 1 / 33 Regression Splines Choose

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

P-spline ANOVA-type interaction models for spatio-temporal smoothing

P-spline ANOVA-type interaction models for spatio-temporal smoothing P-spline ANOVA-type interaction models for spatio-temporal smoothing Dae-Jin Lee and María Durbán Universidad Carlos III de Madrid Department of Statistics IWSM Utrecht 2008 D.-J. Lee and M. Durban (UC3M)

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

Lecture 16: High-dimensional regression, non-linear regression

Lecture 16: High-dimensional regression, non-linear regression Lecture 16: High-dimensional regression, non-linear regression Reading: Sections 6.4, 7.1 STATS 202: Data mining and analysis November 3, 2017 1 / 17 High-dimensional regression Most of the methods we

More information

Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique

Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique Linear Penalized Spline Model Estimation Using Ranked Set Sampling Technique Al Kadiri M. A. Abstract Benefits of using Ranked Set Sampling (RSS) rather than Simple Random Sampling (SRS) are indeed significant

More information

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

arxiv: v2 [stat.ap] 14 Nov 2016

arxiv: v2 [stat.ap] 14 Nov 2016 The Cave of Shadows Addressing the human factor with generalized additive mixed models Harald Baayen a Shravan Vasishth b Reinhold Kliegl b Douglas Bates c arxiv:1511.03120v2 [stat.ap] 14 Nov 2016 a University

More information

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Non-Linear Regression Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Today s Lecture Objectives 1 Understanding the need for non-parametric regressions 2 Familiarizing with two common

More information

Smoothing and Forecasting Mortality Rates with P-splines. Iain Currie. Data and problem. Plan of talk

Smoothing and Forecasting Mortality Rates with P-splines. Iain Currie. Data and problem. Plan of talk Smoothing and Forecasting Mortality Rates with P-splines Iain Currie Heriot Watt University Data and problem Data: CMI assured lives : 20 to 90 : 1947 to 2002 Problem: forecast table to 2046 London, June

More information

arxiv: v1 [stat.me] 2 Jun 2017

arxiv: v1 [stat.me] 2 Jun 2017 Inference for Penalized Spline Regression: Improving Confidence Intervals by Reducing the Penalty arxiv:1706.00865v1 [stat.me] 2 Jun 2017 Ning Dai School of Statistics University of Minnesota daixx224@umn.edu

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007,

More information

Curve fitting using linear models

Curve fitting using linear models Curve fitting using linear models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark September 28, 2012 1 / 12 Outline for today linear models and basis functions polynomial regression

More information

Parameterization of triangular meshes

Parameterization of triangular meshes Parameterization of triangular meshes Michael S. Floater November 10, 2009 Triangular meshes are often used to represent surfaces, at least initially, one reason being that meshes are relatively easy to

More information

GAM: The Predictive Modeling Silver Bullet

GAM: The Predictive Modeling Silver Bullet GAM: The Predictive Modeling Silver Bullet Author: Kim Larsen Introduction Imagine that you step into a room of data scientists; the dress code is casual and the scent of strong coffee is hanging in the

More information

A comparison of spline methods in R for building explanatory models

A comparison of spline methods in R for building explanatory models A comparison of spline methods in R for building explanatory models Aris Perperoglou on behalf of TG2 STRATOS Initiative, University of Essex ISCB2017 Aris Perperoglou Spline Methods in R ISCB2017 1 /

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to

More information

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016 CPSC 340: Machine Learning and Data Mining Regularization Fall 2016 Assignment 2: Admin 2 late days to hand it in Friday, 3 for Monday. Assignment 3 is out. Due next Wednesday (so we can release solutions

More information

A spatio-temporal model for extreme precipitation simulated by a climate model.

A spatio-temporal model for extreme precipitation simulated by a climate model. A spatio-temporal model for extreme precipitation simulated by a climate model. Jonathan Jalbert Joint work with Anne-Catherine Favre, Claude Bélisle and Jean-François Angers STATMOS Workshop: Climate

More information

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines Spline Models Introduction to CS and NCS Regression splines Smoothing splines 3 Cubic Splines a knots: a< 1 < 2 < < m

More information

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines EECS 556 Image Processing W 09 Interpolation Interpolation techniques B splines What is image processing? Image processing is the application of 2D signal processing methods to images Image representation

More information

Moving Beyond Linearity

Moving Beyond Linearity Moving Beyond Linearity The truth is never linear! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! But often

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

27: Hybrid Graphical Models and Neural Networks

27: Hybrid Graphical Models and Neural Networks 10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look

More information

Calibration and emulation of TIE-GCM

Calibration and emulation of TIE-GCM Calibration and emulation of TIE-GCM Serge Guillas School of Mathematics Georgia Institute of Technology Jonathan Rougier University of Bristol Big Thanks to Crystal Linkletter (SFU-SAMSI summer school)

More information

Package bgeva. May 19, 2017

Package bgeva. May 19, 2017 Version 0.3-1 Package bgeva May 19, 2017 Author Giampiero Marra, Raffaella Calabrese and Silvia Angela Osmetti Maintainer Giampiero Marra Title Binary Generalized Extreme Value

More information

Clustering. Supervised vs. Unsupervised Learning

Clustering. Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Discussion Notes 3 Stepwise Regression and Model Selection

Discussion Notes 3 Stepwise Regression and Model Selection Discussion Notes 3 Stepwise Regression and Model Selection Stepwise Regression There are many different commands for doing stepwise regression. Here we introduce the command step. There are many arguments

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14

More information

INLA: Integrated Nested Laplace Approximations

INLA: Integrated Nested Laplace Approximations INLA: Integrated Nested Laplace Approximations John Paige Statistics Department University of Washington October 10, 2017 1 The problem Markov Chain Monte Carlo (MCMC) takes too long in many settings.

More information

CS 450 Numerical Analysis. Chapter 7: Interpolation

CS 450 Numerical Analysis. Chapter 7: Interpolation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

DATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane

DATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane DATA MINING AND MACHINE LEARNING Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Data preprocessing Feature normalization Missing

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

RESAMPLING METHODS. Chapter 05

RESAMPLING METHODS. Chapter 05 1 RESAMPLING METHODS Chapter 05 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold Cross Validation Bias-Variance Trade-off for k-fold Cross Validation Cross Validation

More information

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis

Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis Xavier Le Faucheur a, Brani Vidakovic b and Allen Tannenbaum a a School of Electrical and Computer Engineering, b Department of Biomedical

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Knowledge Discovery and Data Mining. Neural Nets. A simple NN as a Mathematical Formula. Notes. Lecture 13 - Neural Nets. Tom Kelsey.

Knowledge Discovery and Data Mining. Neural Nets. A simple NN as a Mathematical Formula. Notes. Lecture 13 - Neural Nets. Tom Kelsey. Knowledge Discovery and Data Mining Lecture 13 - Neural Nets Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-13-NN

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

4.5 The smoothed bootstrap

4.5 The smoothed bootstrap 4.5. THE SMOOTHED BOOTSTRAP 47 F X i X Figure 4.1: Smoothing the empirical distribution function. 4.5 The smoothed bootstrap In the simple nonparametric bootstrap we have assumed that the empirical distribution

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Lecture 13 - Neural Nets Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-13-NN

More information

1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is...

1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is... 1D Regression i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is... 1 Non-linear problems What if the underlying function is

More information

Voxel selection algorithms for fmri

Voxel selection algorithms for fmri Voxel selection algorithms for fmri Henryk Blasinski December 14, 2012 1 Introduction Functional Magnetic Resonance Imaging (fmri) is a technique to measure and image the Blood- Oxygen Level Dependent

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June for the Austrian housing market, June 14 2012 Ao. Univ. Prof. Dr. Fachbereich Stadt- und Regionalforschung Technische Universität Wien Dr. Strategic Risk Management Bank Austria UniCredit, Wien Inhalt

More information

Logistic Regression and Gradient Ascent

Logistic Regression and Gradient Ascent Logistic Regression and Gradient Ascent CS 349-02 (Machine Learning) April 0, 207 The perceptron algorithm has a couple of issues: () the predictions have no probabilistic interpretation or confidence

More information

Chapter 18. Geometric Operations

Chapter 18. Geometric Operations Chapter 18 Geometric Operations To this point, the image processing operations have computed the gray value (digital count) of the output image pixel based on the gray values of one or more input pixels;

More information

3D Geometry and Camera Calibration

3D Geometry and Camera Calibration 3D Geometr and Camera Calibration 3D Coordinate Sstems Right-handed vs. left-handed 2D Coordinate Sstems ais up vs. ais down Origin at center vs. corner Will often write (u, v) for image coordinates v

More information

Nonparametric regression using kernel and spline methods

Nonparametric regression using kernel and spline methods Nonparametric regression using kernel and spline methods Jean D. Opsomer F. Jay Breidt March 3, 016 1 The statistical model When applying nonparametric regression methods, the researcher is interested

More information

PS 6: Regularization. PART A: (Source: HTF page 95) The Ridge regression problem is:

PS 6: Regularization. PART A: (Source: HTF page 95) The Ridge regression problem is: Economics 1660: Big Data PS 6: Regularization Prof. Daniel Björkegren PART A: (Source: HTF page 95) The Ridge regression problem is: : β "#$%& = argmin (y # β 2 x #4 β 4 ) 6 6 + λ β 4 #89 Consider the

More information

Comparing different interpolation methods on two-dimensional test functions

Comparing different interpolation methods on two-dimensional test functions Comparing different interpolation methods on two-dimensional test functions Thomas Mühlenstädt, Sonja Kuhnt May 28, 2009 Keywords: Interpolation, computer experiment, Kriging, Kernel interpolation, Thin

More information

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2) SOC6078 Advanced Statistics: 9. Generalized Additive Models Robert Andersen Department of Sociology University of Toronto Goals of the Lecture Introduce Additive Models Explain how they extend from simple

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

Problem Set 4. Assigned: March 23, 2006 Due: April 17, (6.882) Belief Propagation for Segmentation

Problem Set 4. Assigned: March 23, 2006 Due: April 17, (6.882) Belief Propagation for Segmentation 6.098/6.882 Computational Photography 1 Problem Set 4 Assigned: March 23, 2006 Due: April 17, 2006 Problem 1 (6.882) Belief Propagation for Segmentation In this problem you will set-up a Markov Random

More information