Applied Statistics: Practical 9

This practical explores nonparametric regression and shows how to fit a simple additive model. The first item introduces the necessary R commands for nonparametric regression using a simulated example. The second item uses the cars dataset in R, while data for the third and fourth items are available on the course website.

1. A simple simulated example

We first consider a simple simulated example, which we will use to compare different methods for fitting a regression function.

    set.seed(1122)                       # so we all get the same 'random' data
    x<-seq(0,1,len=100)                  # equally spaced grid of predictor values
    true.func<-sin(2*pi*x)+2*x           # defines the true (unknown) function f
    y<-true.func+rnorm(length(x),0,0.3)  # add random i.i.d. errors
    plot(x,y)                            # plot the observed data
    points(x,true.func,type='l',lwd=2)   # superimpose the true function on the observed data

[Figure: scatterplot of y against x with the true function superimposed.]

You can later try changing the amount of noise in the data (the standard deviation in the rnorm function) or the period of the sinusoidal function to see how this impacts the estimation.

Let us now try to estimate the regression function using a nearest neighbors estimator. We need to install and load the R package FNN, which contains the function to find the nearest neighbors in a dataset.

    install.packages("FNN")

and now we can define a function for the estimator:

    library(FNN)
    fkNN<-function(x,y,K=5){                   # this is a function definition
      fx<-rep(NA,length(x))
      index<-get.knn(data=x,k=K)$nn.index      # finds the K nearest neighbors for every x
      for (i in 1:length(x)){
        fx[i]<-mean(y[index[i,]])              # averages the K nearest neighbors
      }
      fx                                       # returns the estimated f
    }

We can use the function above to get the nearest neighbors estimator:

    NN_fhat<-fkNN(x,y,K=5)
    points(x,NN_fhat,type='l',lwd=2)

Try different values of K and choose the best one by visual inspection. Visual inspection suggests a number of neighbors between 10 and 15.

Consider now the kernel smoothing estimator, which is available in R with the command ksmooth:

    KN_fhat<-ksmooth(x,y,kernel="normal",bandwidth=0.5)
    points(KN_fhat$x,KN_fhat$y,type='l',lwd=2)

Try using different values for the bandwidth h: which one provides the best fit? Remember that smaller values of the bandwidth lead to high variation in the fitted curve, while larger values of h give a smoother curve. Visual inspection suggests a bandwidth around [...]. You can look at the help page of the ksmooth function to learn which other kernel functions (in addition to the Gaussian one) are available. Does changing the kernel affect the curve estimate? The only other available kernel is the box function $\mathbf{1}_{[x-h/2,\,x+h/2]}$, which provides a less smooth estimate whose irregularities are visible to the eye.

To apply a local polynomial smoother to the same data:

    LP_fhat<-loess(y~x,span=2,degree=2)
    points(x,LP_fhat$fitted,type='l',lwd=2)

Here you have two parameters that control the fit: the degree of the local polynomial (set by the option degree) and the bandwidth (option span). Try different values for these two parameters (check the help page for admissible values): which ones provide the best fit?
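The trial-and-error over span and degree suggested above can be organised as a small grid. The following is a minimal sketch rather than part of the original practical: the grid values and the error measure (mean squared distance from true.func, which is available only because the data are simulated) are assumptions chosen for illustration.

```r
# Regenerate the simulated data so the sketch is self-contained
set.seed(1122)
x <- seq(0, 1, len = 100)
true.func <- sin(2 * pi * x) + 2 * x
y <- true.func + rnorm(length(x), 0, 0.3)

# Hypothetical grid of candidate settings (not taken from the practical)
grid <- expand.grid(span = c(0.2, 0.3, 0.5, 0.75, 1), degree = c(1, 2))

# For each (span, degree) pair, fit loess and measure the mean squared
# distance between the fitted curve and the true function
grid$mse <- mapply(function(s, d) {
  fit <- loess(y ~ x, span = s, degree = d)
  mean((fit$fitted - true.func)^2)
}, grid$span, grid$degree)

# Settings sorted from closest to farthest from the true function
print(grid[order(grid$mse), ])
```

With real data the true function is unknown, so this ranking would have to be replaced by visual inspection or a cross-validation criterion.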

Visual inspection suggests a bandwidth of 0.5 for a second-degree polynomial or a bandwidth between 0.25 and 0.3 for a first-degree polynomial.

The smoothing spline estimator is implemented in R in the function smooth.spline. To control the smoothing, you can either specify the smoothing parameter λ via the option spar (λ is a monotone function of spar; see the help page for more details)

    SS_fhat<-smooth.spline(x,y,spar=0.5)
    points(x,SS_fhat$y,type='l',lwd=2)

or specify the effective degrees of freedom of the fitted curve.

    SS_fhat<-smooth.spline(x,y,df=3)
    points(x,SS_fhat$y,type='l',lwd=2)

You can get the equivalent parameters spar, λ and effective degrees of freedom associated with a fitted model with

    SS_fhat$lambda
    SS_fhat$df
    SS_fhat$spar

Choose visually an appropriate fit and report your choice of λ and the corresponding effective degrees of freedom. A reasonable fit is obtained with spar = 0.7, corresponding to λ = [...] and 8.4 effective degrees of freedom.

An automatic choice of the smoothing parameter can be obtained using the option cv=TRUE.

    SS_fhat<-smooth.spline(x,y,cv=TRUE)
    points(x,SS_fhat$y,type='l',lwd=2,col=3)

What are the values of λ and the effective degrees of freedom selected by cross-validation? The cross-validation method selects λ = [...] and 6.57 effective degrees of freedom, not too far from what could be expected from visual inspection. If we set cv=FALSE (without selecting the parameter ourselves), the function chooses the smoothing parameter via generalized cross-validation, a modified version of the cross-validation error.

Now superimpose in the same plot the fitted curves obtained from the various nonparametric estimators, with your choice for the best smoothing parameters. Which one performs best in this case?

    NN_fhat<-fkNN(x,y,K=13)
    points(x,NN_fhat,type='l',lwd=2)
    KN_fhat<-ksmooth(x,y,kernel="normal",bandwidth=0.15)
    points(KN_fhat$x,KN_fhat$y,type='l',lwd=2,col=2)

    LP_fhat<-loess(y~x,span=0.5,degree=2)
    points(x,LP_fhat$fitted,type='l',lwd=2,col=3)
    SS_fhat<-smooth.spline(x,y,cv=TRUE)
    points(x,SS_fhat$y,type='l',lwd=2,col=4)

[Figure: y against x with the fitted curves from the four estimators superimposed.]

Local polynomial and smoothing splines perform best.

Regression (cubic) splines can be implemented using the package splines, specifying either the knots or the degrees of freedom (in the latter case ns places the knots at quantiles of x, which are equispaced here because x is an equally spaced grid). The function ns builds the matrix G we have seen in the lecture notes; then we can simply use lm to fit the model.

    library(splines)
    reg_mod<-lm(y~ns(x,df=8))
    points(x,reg_mod$fitted.values,type='l',lwd=2)

Imagine now you are confronted with a more complicated function which contains some local feature you want to describe in your model:

    set.seed(1122)                       # so we all get the same 'random' data
    x<-seq(0,1,len=100)                  # equally spaced grid of predictor values
    true.func<-sin(2*pi*x)+2*x           # defines the true (unknown) function f

    true.func[71:80]<-3*sin(2*seq(0,pi,len=10))
    y<-true.func+rnorm(length(x),0,0.3)  # add random i.i.d. errors
    plot(x,y)                            # plot the observed data
    points(x,true.func,type='l',lwd=2)   # superimpose the true function on the observed data
    reg_mod<-lm(y~ns(x,df=8))
    points(x,reg_mod$fitted.values,type='l',lwd=2)

As you can see, we are completely missing the local feature in the fit. We would therefore like to place additional knots between x = 0.7 and x = 0.8.

    knots<-c(0.2,0.4,0.6,0.7,0.72,0.74,0.75,0.76,0.77,0.8)
    reg_mod<-lm(y~ns(x,knots=knots))
    points(x,reg_mod$fitted.values,type='l',lwd=2)

Try to fit this curve using a smoothing spline. What happens? To get an accurate fit for the local feature, we are forced to choose λ so small that the curve becomes very irregular over the rest of the domain.

2. Cars data

In the first part of the course we saw the dataset cars, which is provided by R and contains speeds and stopping distances for a set of cars. The aim was to predict the stopping distance (dist) from the speed (speed). However, neither linear models nor generalized linear models provided a completely satisfactory fit for the data. We now try fitting a nonparametric regression.

    data(cars)
    attach(cars)
    plot(speed,dist)

Fit a regression function using a smoothing spline, choosing the smoothing parameter by generalized cross-validation. What is an appropriate choice of λ in this case? What are the effective degrees of freedom of the selected model? λ = [...] and the effective degrees of freedom are [...].

    cars_fit<-smooth.spline(speed,dist,cv=FALSE)
    cars_fit$df
    ## [1]
    cars_fit$lambda
    ## [1]
    plot(speed,dist)
    points(cars_fit$x,cars_fit$y,type='l')
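Once the smoothing spline has been fitted, it can also be evaluated at speeds that were not observed. The sketch below is an illustration only: the three speeds in new_speed are arbitrary values chosen here, and the data are accessed as cars$speed and cars$dist so that the block runs without attach.

```r
data(cars)

# Smoothing spline with the smoothing parameter chosen by GCV (cv = FALSE)
cars_fit <- smooth.spline(cars$speed, cars$dist, cv = FALSE)

# Hypothetical speeds at which to evaluate the fitted curve
new_speed <- c(10, 15, 20)

# predict() on a smooth.spline object returns a list with components x and y
pred <- predict(cars_fit, x = new_speed)
print(data.frame(speed = pred$x, fitted_dist = pred$y))

# The same method can return derivatives of the fitted curve:
# deriv = 1 gives the estimated change in dist per unit of speed
slope <- predict(cars_fit, x = new_speed, deriv = 1)
print(slope$y)
```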

[Figure: dist against speed with the smoothing spline fit.]

Compare the fitted curve with what you would obtain from a linear or quadratic parametric model.

    cars_mod<-lm(dist~speed)
    cars_mod2<-lm(dist~speed+I(speed^2))
    plot(speed,dist)
    points(speed,cars_mod$fitted.values,type='l',lwd=2,col=1)
    points(speed,cars_mod2$fitted.values,type='l',lwd=2,col=2)
    points(cars_fit$x,cars_fit$y,type='l',lwd=2,col=4)
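Besides overlaying the curves, the three fits can be compared numerically. The following sketch is an addition for illustration, not part of the original practical: it tabulates the number of parameters (or effective degrees of freedom) and the residual sum of squares of each model. The table layout and the name comparison are choices made here.

```r
data(cars)

# The three models compared in the text
cars_fit  <- smooth.spline(cars$speed, cars$dist, cv = FALSE)
cars_mod  <- lm(dist ~ speed, data = cars)
cars_mod2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Fitted values of the spline at every observation (including tied speeds)
spline_fitted <- predict(cars_fit, x = cars$speed)$y

# Degrees of freedom and residual sum of squares for each fit
comparison <- data.frame(
  model = c("linear", "quadratic", "smoothing spline"),
  df    = c(length(coef(cars_mod)), length(coef(cars_mod2)), cars_fit$df),
  rss   = c(sum(resid(cars_mod)^2),
            sum(resid(cars_mod2)^2),
            sum((cars$dist - spline_fitted)^2))
)
print(comparison)
```

The residual sum of squares alone always favours the more flexible model, so it should be read together with the df column.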

[Figure: dist against speed with the linear, quadratic and smoothing spline fits.]

As suggested by the number of effective degrees of freedom, the nonparametric model selects an intermediate choice between a linear and a quadratic model.

3. Signature acceleration data

In a neurophysiological study, researchers put an accelerometer on the index finger of participants while they were asked to write their signature. The researchers first need to estimate the acceleration as a function of time for each participant. The file signature.txt on the course website contains the data for the first participant in the experiment.

    data<-read.table("signature.txt",header=TRUE)
    attach(data)
    plot(time,acceleration)

Looking at the plot, which method should be preferred among the ones we have considered in this practical? Why? The prominent localized feature around 0.75 seconds suggests the use of regression splines.

Fit the nonparametric regression curve and find the position and the estimated value of the acceleration peak. We fit the curve using regression splines, choosing the knots so that they are dense around the localized feature at time [...]. We realize that an additional knot is also needed at the beginning to capture the rapid increase of the acceleration. If we use the predict command to evaluate the function on a finer grid, we find that the peak is at [...] seconds and its estimated value is [...].
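The grid evaluation with predict described above can be sketched as follows. Because signature.txt is not reproduced here, the block uses simulated stand-in data with a bump near 0.75. The data frame sim, the bump shape, the noise level and the grid size are all assumptions made for illustration, so the numbers it prints are not the answers to the exercise.

```r
library(splines)

# Hypothetical stand-in for signature.txt: a sharp bump around time 0.75
set.seed(1)
sim <- data.frame(time = seq(0, 1, len = 200))
sim$acceleration <- 3 * exp(-((sim$time - 0.75) / 0.03)^2) +
  rnorm(nrow(sim), 0, 0.1)

# Regression spline with knots concentrated around the localized feature
knots <- c(0.05, 0.1, 0.4, 0.6, 0.7, 0.72, 0.74, 0.75, 0.76, 0.77, 0.8)
sig_sim <- lm(acceleration ~ ns(time, knots = knots), data = sim)

# Evaluate the fitted curve on a finer grid than the observations
fine_grid <- data.frame(time = seq(0, 1, len = 2001))
fine_fit <- predict(sig_sim, newdata = fine_grid)

# Position and estimated value of the peak of the fitted curve
peak_index <- which.max(fine_fit)
peak_time  <- fine_grid$time[peak_index]
peak_value <- unname(fine_fit[peak_index])
print(c(time = peak_time, value = peak_value))
```

With the real data, the same steps would be applied to the model fitted on the data read from signature.txt.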

    knots<-c(0.05,0.1,0.4,0.6,0.7,0.72,0.74,0.75,0.76,0.77,0.8)
    sig_mod<-lm(acceleration~ns(time,knots=knots))
    plot(time,acceleration)
    points(time,sig_mod$fitted.values,type='l',lwd=2)

[Figure: acceleration against time with the regression spline fit.]

4. A simple additive model

We now see how to fit a simple additive model. The file ozone_data.txt contains 330 observations of the concentration of ozone, a measure of the pressure gradient and the day of the year on which the measurements were taken. The aim is to fit an additive model for the concentration of ozone. You may need to install the package gam first.

    library(gam)
    data<-read.table("ozone_data.txt",header=TRUE)
    attach(data)
    ozone_model<-gam(ozone~0+lo(pressure_grad,span=0.5,degree=2)
                     +lo(day,span=0.5,degree=2))

The lo function specifies that the predictor is to be smoothed with a local regression of the chosen degree and bandwidth. The function gam then fits the additive model using the backfitting algorithm. What is the algebraic form of the model? Let $Y_i$ be the concentration of ozone for the $i$-th observation, $P_i$ the corresponding pressure gradient and $D_i$ the day. The algebraic form of the model is

$$Y_i = f_1(P_i) + f_2(D_i) + \varepsilon_i,$$

where the $\varepsilon_i$ are i.i.d. errors with zero mean and variance $\sigma^2$, and $f_1$ and $f_2$ are unknown regression functions. If we had used the formula ozone~lo(pressure_grad,span=0.5,degree=2)+..., R would have included an intercept in the model. What would its algebraic form have been in this case?

$$Y_i = \beta_0 + f_1(P_i) + f_2(D_i) + \varepsilon_i.$$

We can now evaluate the fit by plotting the estimated regression functions and the marginal residuals (the difference between the observations and the other regression function):

    plot(ozone_model,residuals=TRUE)

Is the smoothing appropriate? Try changing the bandwidth in the lo function. The smoothing appears reasonable, although it may be possible to shorten the bandwidth slightly for the pressure gradient.

Alternatively, it is possible to use smoothing splines:

    ozone_spl<-gam(ozone~s(pressure_grad,df=3)+s(day,df=3))
    # or
    ozone_sp2<-gam(ozone~s(pressure_grad,spar=0.5)+s(day,spar=0.5))
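To make the backfitting idea mentioned above more concrete, here is a minimal textbook-style sketch in base R for the model with intercept. It is not the internal code of the gam package, and it runs on simulated data: the variables p and d, the true functions, the number of iterations and the tolerance are all assumptions made for illustration.

```r
# Simulated stand-ins for pressure_grad (p) and day (d); not the ozone data
set.seed(1)
n <- 300
p <- runif(n, -50, 100)
d <- runif(n, 1, 365)
f1_true <- 5 * sin(p / 30)
f2_true <- 8 * cos(2 * pi * d / 365)
yy <- 10 + f1_true + f2_true + rnorm(n, 0, 1)

# Backfitting: the intercept is the sample mean, both functions start at zero
beta0 <- mean(yy)
f1 <- rep(0, n)
f2 <- rep(0, n)
for (iter in 1:20) {
  f1_old <- f1
  f2_old <- f2
  # Smooth the residuals left by f2 against p, then centre the result
  r1 <- yy - beta0 - f2
  f1 <- loess(r1 ~ p, span = 0.5, degree = 2)$fitted
  f1 <- f1 - mean(f1)
  # Smooth the residuals left by the updated f1 against d, then centre
  r2 <- yy - beta0 - f1
  f2 <- loess(r2 ~ d, span = 0.5, degree = 2)$fitted
  f2 <- f2 - mean(f2)
  # Stop when neither function changes appreciably
  if (max(abs(f1 - f1_old), abs(f2 - f2_old)) < 1e-6) break
}
fit <- beta0 + f1 + f2
print(c(iterations = iter, residual_sd = sd(yy - fit)))
```

Centring each estimated function at every step keeps the intercept identifiable, since a constant could otherwise be moved freely between the intercept and the two functions.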


More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

Package slp. August 29, 2016

Package slp. August 29, 2016 Version 1.0-5 Package slp August 29, 2016 Author Wesley Burr, with contributions from Karim Rahim Copyright file COPYRIGHTS Maintainer Wesley Burr Title Discrete Prolate Spheroidal

More information

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. http://en.wikipedia.org/wiki/local_regression Local regression

More information

Package RLRsim. November 4, 2016

Package RLRsim. November 4, 2016 Type Package Package RLRsim November 4, 2016 Title Exact (Restricted) Likelihood Ratio Tests for Mixed and Additive Models Version 3.1-3 Date 2016-11-03 Maintainer Fabian Scheipl

More information

Topics in Machine Learning

Topics in Machine Learning Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications

More information

Computational Physics PHYS 420

Computational Physics PHYS 420 Computational Physics PHYS 420 Dr Richard H. Cyburt Assistant Professor of Physics My office: 402c in the Science Building My phone: (304) 384-6006 My email: rcyburt@concord.edu My webpage: www.concord.edu/rcyburt

More information

Supplementary Figure 1. Decoding results broken down for different ROIs

Supplementary Figure 1. Decoding results broken down for different ROIs Supplementary Figure 1 Decoding results broken down for different ROIs Decoding results for areas V1, V2, V3, and V1 V3 combined. (a) Decoded and presented orientations are strongly correlated in areas

More information

See the course website for important information about collaboration and late policies, as well as where and when to turn in assignments.

See the course website for important information about collaboration and late policies, as well as where and when to turn in assignments. COS Homework # Due Tuesday, February rd See the course website for important information about collaboration and late policies, as well as where and when to turn in assignments. Data files The questions

More information

Package lmesplines. R topics documented: February 20, Version

Package lmesplines. R topics documented: February 20, Version Version 1.1-10 Package lmesplines February 20, 2015 Title Add smoothing spline modelling capability to nlme. Author Rod Ball Maintainer Andrzej Galecki

More information

15.10 Curve Interpolation using Uniform Cubic B-Spline Curves. CS Dept, UK

15.10 Curve Interpolation using Uniform Cubic B-Spline Curves. CS Dept, UK 1 An analysis of the problem: To get the curve constructed, how many knots are needed? Consider the following case: So, to interpolate (n +1) data points, one needs (n +7) knots,, for a uniform cubic B-spline

More information

Stat 4510/7510 Homework 6

Stat 4510/7510 Homework 6 Stat 4510/7510 1/11. Stat 4510/7510 Homework 6 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that

More information

Edge detection. Convert a 2D image into a set of curves. Extracts salient features of the scene More compact than pixels

Edge detection. Convert a 2D image into a set of curves. Extracts salient features of the scene More compact than pixels Edge Detection Edge detection Convert a 2D image into a set of curves Extracts salient features of the scene More compact than pixels Origin of Edges surface normal discontinuity depth discontinuity surface

More information

Chapter 5: Basis Expansion and Regularization

Chapter 5: Basis Expansion and Regularization Chapter 5: Basis Expansion and Regularization DD3364 April 1, 2012 Introduction Main idea Moving beyond linearity Augment the vector of inputs X with additional variables. These are transformations of

More information

1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is...

1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is... 1D Regression i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is... 1 Non-linear problems What if the underlying function is

More information

Interpolation - 2D mapping Tutorial 1: triangulation

Interpolation - 2D mapping Tutorial 1: triangulation Tutorial 1: triangulation Measurements (Zk) at irregular points (xk, yk) Ex: CTD stations, mooring, etc... The known Data How to compute some values on the regular spaced grid points (+)? The unknown data

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Package SiZer. February 19, 2015

Package SiZer. February 19, 2015 Version 0.1-4 Date 2011-3-21 Title SiZer: Significant Zero Crossings Package SiZer February 19, 2015 Author Derek Sonderegger Maintainer Derek Sonderegger

More information

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017 CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

More information

Nonparametric Mixed-Effects Models for Longitudinal Data

Nonparametric Mixed-Effects Models for Longitudinal Data Nonparametric Mixed-Effects Models for Longitudinal Data Zhang Jin-Ting Dept of Stat & Appl Prob National University of Sinagpore University of Seoul, South Korea, 7 p.1/26 OUTLINE The Motivating Data

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

ME 261: Numerical Analysis Lecture-12: Numerical Interpolation

ME 261: Numerical Analysis Lecture-12: Numerical Interpolation 1 ME 261: Numerical Analysis Lecture-12: Numerical Interpolation Md. Tanver Hossain Department of Mechanical Engineering, BUET http://tantusher.buet.ac.bd 2 Inverse Interpolation Problem : Given a table

More information

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS HOUSEKEEPING Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS Quizzes Lab 8? WEEK EIGHT Lecture INTERPOLATION & SPATIAL ESTIMATION Joe Wheaton READING FOR TODAY WHAT CAN WE COLLECT AT POINTS?

More information

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects Patralekha Bhattacharya Thinkalytics The PDLREG procedure in SAS is used to fit a finite distributed lagged model to time series data

More information

CPSC 695. Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova

CPSC 695. Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova CPSC 695 Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova Overview Data sampling for continuous surfaces Interpolation methods Global interpolation Local interpolation

More information

CS 450 Numerical Analysis. Chapter 7: Interpolation

CS 450 Numerical Analysis. Chapter 7: Interpolation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

1 StatLearn Practical exercise 5

1 StatLearn Practical exercise 5 1 StatLearn Practical exercise 5 Exercise 1.1. Download the LA ozone data set from the book homepage. We will be regressing the cube root of the ozone concentration on the other variables. Divide the data

More information

Comparison of Linear Regression with K-Nearest Neighbors

Comparison of Linear Regression with K-Nearest Neighbors Comparison of Linear Regression with K-Nearest Neighbors Rebecca C. Steorts, Duke University STA 325, Chapter 3.5 ISL Agenda Intro to KNN Comparison of KNN and Linear Regression K-Nearest Neighbors vs

More information

Nonparametric Regression

Nonparametric Regression 1 Nonparametric Regression Given data of the form (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ), we seek an estimate of the regression function g(x) satisfying the model y = g(x) + ε where the noise term satisfies

More information

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines Spline Models Introduction to CS and NCS Regression splines Smoothing splines 3 Cubic Splines a knots: a< 1 < 2 < < m

More information

99 International Journal of Engineering, Science and Mathematics

99 International Journal of Engineering, Science and Mathematics Journal Homepage: Applications of cubic splines in the numerical solution of polynomials Najmuddin Ahmad 1 and Khan Farah Deeba 2 Department of Mathematics Integral University Lucknow Abstract: In this

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2

Statistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Weijie Cai, SAS Institute Inc., Cary NC July 1, 2008 ABSTRACT Generalized additive models are useful in finding predictor-response

More information