Non-Linear Regression
Business Analytics Practice, Winter Term 2015/16
Stefan Feuerriegel

Today's Lecture
Objectives
1. Understanding the need for non-parametric regressions
2. Familiarizing with two common variants: GAM and LOESS
3. Being able to apply the above methods in R and visualize the results

Outline
1. Overview
2. Splines
3. Generalized Additive Models
4. Local Regression
5. Wrap-Up

Overview

Linear Trend Line Example
Load dataset to visualize the relationship between age and wage:

library(ISLR)
data(Wage)
Wage.small <- Wage[1:250, ]  # fewer points look nicer on slides

Load package ggplot2 for visualization:

library(ggplot2)

Plot the linear trend line (OLS via method="lm"):

ggplot(Wage.small, aes(x=age, y=wage)) +
  geom_point(size=0.5) +
  geom_jitter(size=0.5) +
  geom_smooth(method="lm") +
  theme_bw()

- geom_jitter(...) jitters points to reduce overlap
- geom_smooth(...) is the default way to add smoothed lines

Linear Trend Line Example
The blue line is the linear trend line with standard errors (gray area).

[Figure: scatter plot of wage against age with linear trend line and confidence band]

With higher age, the wage flattens and even shrinks: the relationship is obviously not linear.

Motivation
Problem description
- Linear relationships show that variables are dependent, e.g. an increase in income leads to an increase in happiness
- In reality, the true nature of effects and relationships is often non-linear, e.g. there is a maximum level of happiness
- Default regression techniques cannot adequately adapt to this
- Solution: non-linear regressions
Advantages
1. Identify which relationships are non-linear
2. Visualize the shape of non-linear effects
3. Improve the fit (and/or the prediction performance) of models

Non-Linear Regressions
Overview
1. Polynomial regression: adding quadratic, cubic, ... terms
2. Step-wise functions: similar to dummies for specific intervals
3. Splines: piecewise polynomial functions
4. Generalized additive models: non-linear transformations of each term, but in an additive fashion
5. Local regressions: a sequence of regressions, each based on a small neighborhood

Polynomial Regression
Definition
- Linear regression extended by adding quadratic, cubic, ... terms

  y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots

- In practice, compute x_1 := x, x_2 := x^2, etc. before estimating

  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots

- The degree of the polynomial is fixed beforehand or chosen by cross-validation (see the sketch below)
Evaluation
- Coefficients are often not relevant and difficult to interpret
- Mostly interested in the t-tests for the coefficients or in the fitted model
- The explosive growth of polynomials makes extrapolation difficult
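A minimal sketch of choosing the degree by cross-validation, assuming the boot package; the data-generating code is taken from the following slide, while the 10-fold setup and the candidate degrees 1 to 5 are illustrative choices, not from the original slides.

library(boot)

set.seed(0)
x <- runif(50, min=0, max=100)
y <- sin(x/50*pi) + runif(50, min=-0.5, max=0.5)
d <- data.frame(x=x, y=y)

# Estimated 10-fold CV prediction error for each candidate degree
cv.error <- sapply(1:5, function(deg) {
  m <- glm(y ~ poly(x, degree=deg), data=d)
  cv.glm(d, m, K=10)$delta[1]
})
which.min(cv.error)  # degree with the lowest CV error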

Polynomial Regression in R
Generate sample data:

set.seed(0)
x <- runif(50, min=0, max=100)
y <- sin(x/50*pi) + runif(50, min=-0.5, max=0.5)

Generate polynomial terms up to degree d via poly(x, degree=d, raw=TRUE), then perform least squares:

m <- lm(y ~ poly(x, degree=3, raw=TRUE))

Note: raw=TRUE uses the raw polynomials x, x^2, ...; otherwise orthogonal polynomials are used, which are numerically more convenient.

Manual alternative:

m <- lm(y ~ x + I(x^2) + I(x^3))

Note: I(...) is necessary so that arithmetic operations in a formula are interpreted as such.
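As a quick check of the note on raw versus orthogonal polynomials (a sketch, not part of the original slides): both parameterize the same model, so the coefficients differ but the fitted values agree.

m.raw  <- lm(y ~ poly(x, degree=3, raw=TRUE))  # monomial basis
m.orth <- lm(y ~ poly(x, degree=3))            # orthogonal basis
coef(m.raw)   # interpretable as beta_0, ..., beta_3
coef(m.orth)  # different numbers, same model
all.equal(fitted(m.raw), fitted(m.orth))  # TRUE up to numerical error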

Polynomial Regression in R
Visualize the result by manually generating the fitted line:

predict_x <- seq(from=0, to=100, by=1)
# Named data frame so that predict() generates the polynomial terms
predict_y <- predict(m, newdata=data.frame(x=predict_x))
plot(x, y)
lines(predict_x, predict_y, col="red")

[Figure: scatter plot of y against x with the fitted cubic curve in red]

Polynomial Regression in R
ANOVA tests can identify the best-fit model:

m.d2 <- lm(y ~ poly(x, degree=2, raw=TRUE))
m.d3 <- lm(y ~ poly(x, degree=3, raw=TRUE))
m.d4 <- lm(y ~ poly(x, degree=4, raw=TRUE))
anova(m.d2, m.d3, m.d4)

## Analysis of Variance Table
##
## Model 1: y ~ poly(x, degree = 2, raw = TRUE)
## Model 2: y ~ poly(x, degree = 3, raw = TRUE)
## Model 3: y ~ poly(x, degree = 4, raw = TRUE)
##   Res.Df     RSS Df Sum of Sq       F    Pr(>F)
## 1     47 12.1890
## 2     46  3.8570  1    8.3321 97.4464 7.786e-13 ***
## 3     45  3.8477  1    0.0093  0.1085    0.7434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-value comparing d = 2 and d = 3 is almost zero: the quadratic model is not sufficient, the cubic is preferred. The P-value comparing d = 3 and d = 4 is large, so the quartic term adds nothing.

Step-Wise Functions
Definition
- Fit a piecewise constant function

  y(x) = \begin{cases} c_1 & \text{if } x \in I_1 \\ c_2 & \text{if } x \in I_2 \\ \vdots \end{cases}

- There is no default procedure for choosing the intervals
- Implemented via a set of dummy variables, one for each interval I_1, ...
- Useful for interaction terms, i.e. x_1 x_2, e.g. salary depends on the interaction of education and tenure (see the sketch after the next slide)

Step-Wise Functions in R
Generate sample data:

set.seed(0)
x <- c(runif(20, min=0, max=40), runif(20, min=40, max=100))
y <- c(runif(20, min=0, max=10), runif(20, min=30, max=40))
y <- y + runif(40, min=-5, max=5)

Estimate a linear model with dummies:

m <- lm(y ~ I(x < 40))
coef(m)

## (Intercept) I(x < 40)TRUE
##    35.27628     -30.15988

An alternative is to split the data via cut(x, breaks=...):

x2 <- cut(x, breaks=c(0, 40, 100))
coef(lm(y ~ x2))

## (Intercept)  x2(40,100]
##    5.116405   30.159878
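The definition slide mentions step functions for interaction terms. A hedged sketch of that use case follows; the data (stand-ins for education and tenure) and the break point are invented for illustration only.

set.seed(0)
x1 <- runif(100, min=0, max=10)   # stand-in for education (years)
x2 <- runif(100, min=0, max=20)   # stand-in for tenure (years)
y  <- ifelse(x1 < 5, 1, 3) * x2 + rnorm(100)  # slope depends on interval

x1.cut <- cut(x1, breaks=c(0, 5, 10))  # dummy for each interval of x1
m <- lm(y ~ x1.cut * x2)               # interval dummies interacted with x2
coef(m)  # separate x2 slope per interval of x1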

Step-Wise Regression in R
Visualize the result by manually generating the fitted line:

predict_x <- seq(from=0, to=100, by=1)
# Named data frame so that predict() can evaluate I(x < 40)
predict_y <- predict(m, newdata=data.frame(x=predict_x))
plot(x, y)
lines(predict_x, predict_y, col="red")

[Figure: scatter plot of y against x with the fitted step function in red]

Spline Regression
- Divide the range of the variables into k distinct regions I_1, ..., I_k
- Fit a polynomial function to each region

  y(x) = \begin{cases} \beta_0^{(1)} + \beta_1^{(1)} x + \beta_2^{(1)} x^2 + \dots & \text{if } x \in I_1 \\ \quad \vdots \\ \beta_0^{(k)} + \beta_1^{(k)} x + \beta_2^{(k)} x^2 + \dots & \text{if } x \in I_k \end{cases}

- This can be rewritten with basis functions b_i as y(x) = \beta_0 + \beta_1 b_1(x) + \beta_2 b_2(x) + \dots
- Smoothing splines work similarly but enforce a certain continuity

[Figure: piecewise polynomial fit to sample data]
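A minimal sketch of spline regression in R, reusing the sample data from the polynomial slides; bs() from the splines package builds a cubic B-spline basis for a regression spline, and smooth.spline() fits a smoothing spline whose smoothness is chosen by cross-validation. The knot positions below are illustrative choices, not from the original slides.

library(splines)

set.seed(0)
x <- runif(50, min=0, max=100)
y <- sin(x/50*pi) + runif(50, min=-0.5, max=0.5)

m.bs <- lm(y ~ bs(x, knots=c(25, 50, 75)))  # regression spline, cubic basis
m.ss <- smooth.spline(x, y)                 # smoothing spline, CV-chosen

plot(x, y)
predict_x <- seq(from=min(x), to=max(x), length.out=101)
lines(predict_x, predict(m.bs, data.frame(x=predict_x)), col="red")
lines(predict(m.ss, predict_x), col="blue")  # returns list(x, y)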

Splines

Splines
To be added...

Generalized Additive Models

Generalized Additive Models
Generalized Additive Models (GAM)
- Extend the standard linear model by allowing non-linear functions
- The outcome depends linearly on (smooth) non-linear functions f_j

  y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij})

- The model is called additive since a separate f_j(x_{ij}) is calculated for each predictor
- The f_j can be polynomials, though splines are more common

Pros and Cons
Advantages
- Automatically fits a non-linear f_j to each predictor; no need for manual trial and error
- Non-linearity achieves a more accurate fit
- The model is additive, so the effect of each predictor on y can be examined individually
Drawbacks
- Restricted to an additive model: important interactions can be missed

GAM Smoothing in ggplot2
ggplot2 has built-in support for GAM via method="gam":

ggplot(Wage.small, aes(x=age, y=wage)) +
  geom_point(size=0.5) +
  geom_jitter(size=0.5) +
  geom_smooth(method="gam") +
  theme_bw()

[Figure: scatter plot of wage against age with GAM smoother]

GAM in R
Load the gam package:

library(gam)

Estimate the model, e.g. with smoothing splines:

m.gam <- gam(wage ~ s(year, 4) + s(age, 5) + education, data=Wage)
m.gam

## Call:
## gam(formula = wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
##
## Degrees of Freedom: 2999 total; 2986 Residual
## Residual Deviance: 3689770

- s(variable, df) introduces smoothing splines with df degrees of freedom
- ns(variable, df) gives natural splines
- education is a factor and is thus left untransformed
- A detailed summary of the results is available via summary(m.gam)

GAM in R
Component-wise plots show the effect of each term:

par(mfrow=c(1, 3))
plot.gam(m.gam, se=TRUE, col="blue")

[Figure: partial effects of s(year, 4), s(age, 5), and education with standard error bands]

One might think that the effect of year is linear.

GAM in R
An ANOVA test identifies the best-fit model, e.g. excluding year, or assuming a linear or non-linear effect:

m.gam1 <- gam(wage ~ s(age, 5) + education, data=Wage)
m.gam2 <- gam(wage ~ year + s(age, 5) + education, data=Wage)
m.gam3 <- gam(wage ~ s(year, 4) + s(age, 5) + education, data=Wage)
anova(m.gam1, m.gam2, m.gam3)

## Analysis of Deviance Table
##
## Model 1: wage ~ s(age, 5) + education
## Model 2: wage ~ year + s(age, 5) + education
## Model 3: wage ~ s(year, 4) + s(age, 5) + education
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1      2990    3711731
## 2      2989    3693842  1  17889.2 0.0001419 ***
## 3      2986    3689770  3   4071.1 0.3483897
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

- The GAM with a linear year term is better than the one without (P-value < 0.001)
- A non-linear effect of year is not necessary (P-value > 0.05)

Local Regression

Local Regression
Local regression (LOESS)
- Based on the earlier locally weighted scatterplot smoothing (LOWESS)
- Locally weighted regression using nearest neighbors
- Points that are closer are weighted more strongly; points further away receive less weight
Non-parametric approach
- The smoother has no pre-defined form but is constructed from the data
- As such, the underlying distributions can be unknown
- However, it needs more observations to infer relationships from the data

LOESS
Idea: locally weighting nearest neighbors

[Figure: panels illustrating the weighted local fit around individual points]

LOESS
High-level procedure
1. Choose a point x_0 at which LOESS is calculated
2. Choose a subset whose size is determined by a smoothing parameter
   - A proportion α of all points influences the curvature of the smoothing at each point
   - The subset consists of the α nearest neighbors of x_0
   - Useful values of α are often in the range 0.25 to 0.5
3. Fit a low-degree polynomial to the subset
   - Fitting is done by weighted least squares
   - More weight is given to points closer to x_0
   - The degree is often linear or quadratic
4. Repeat the above steps for all points in the dataset
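The same procedure is available outside ggplot2 via base R's loess(); a short sketch with illustrative parameter values, where span corresponds to the smoothing parameter α above and degree to the local polynomial degree:

set.seed(0)
x <- runif(100, min=0, max=100)
y <- sin(x/50*pi) + runif(100, min=-0.5, max=0.5)

m.loess <- loess(y ~ x, span=0.5, degree=2)  # local quadratic fit

predict_x <- seq(from=min(x), to=max(x), length.out=101)
plot(x, y)
lines(predict_x, predict(m.loess, newdata=data.frame(x=predict_x)), col="red")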

Pros and Cons
Advantages
- Independent of a specific model (or distribution) that has to fit all the data
- Very flexible and thus able to adapt to complex relationships
- No need to estimate a global function; only the nearest neighbors are used, controlled by a smoothing parameter
Drawbacks
- High computational costs: ggplot2 uses LOESS by default only for up to 1000 data points
- Requires a large and fairly dense dataset for good results
- The result is not a closed-form solution that could be further evaluated or interpreted

LOESS in ggplot2
ggplot2 has built-in support for LOESS via method="loess":

ggplot(Wage.small, aes(age, wage)) +
  geom_point(size=0.5) +
  geom_jitter(size=0.5) +
  geom_smooth(method="loess", span=0.3) +
  theme_bw()

[Figure: scatter plot of wage against age with LOESS smoother, span = 0.3]

LOESS in ggplot2
The parameter span controls the intensity of the smoothing.

[Figure: four panels showing LOESS fits of wage against age with span = 0.1, 0.25, 0.5, and 1]
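A sketch of how such a comparison grid can be produced; grid.arrange() from the gridExtra package (an assumption, not used elsewhere in these slides) lays out one LOESS smoother per span value:

library(ggplot2)
library(gridExtra)

# One plot per span value, reusing Wage.small from the earlier slides
plots <- lapply(c(0.1, 0.25, 0.5, 1), function(s) {
  ggplot(Wage.small, aes(x=age, y=wage)) +
    geom_point(size=0.5) +
    geom_smooth(method="loess", span=s) +
    ggtitle(paste("span =", s)) +
    theme_bw()
})
do.call(grid.arrange, c(plots, ncol=2))  # 2x2 grid of panels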

Wrap-Up

Wrap-Up
Summary
- Non-linear models can effectively supplement least squares
- Various non-linear models are available; manual tests are necessary to find a good model
- For non-parametric methods, the underlying functional form can be unknown
- ggplot(...) is helpful to quickly gain first insights or for nice visualizations
Further readings
- Section 7 in the book An Introduction to Statistical Learning
- Package mgcv is a newer alternative to gam (see the sketch below)
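A minimal sketch of the mgcv alternative mentioned above; unlike the gam package, mgcv's s() selects the smoothness automatically (via generalized cross-validation), so no df has to be fixed. Note that both packages define conflicting gam() functions, so mgcv is best loaded in a fresh session; the k value below is an illustrative choice.

library(mgcv)
library(ISLR)
data(Wage)

# Smoothness of each s() term is estimated from the data
m.mgcv <- gam(wage ~ s(year, k=5) + s(age) + education, data=Wage)
summary(m.mgcv)        # effective df and significance per smooth term
plot(m.mgcv, pages=1)  # component-wise plots on one page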