Using Multivariate Adaptive Regression Splines (MARS) to enhance Generalised Linear Models. Inna Kolyshkina, PricewaterhouseCoopers


Why enhance GLM? Shortcomings of the linear modelling approach. GLM, being a linear technique, shares the usual shortcomings of the linear modelling approach:
- it operates under the assumption that the data follow a distribution in the exponential family;
- it is affected by multi-collinearity, outliers and missing values in the data;
- it is often troublesome to use for selecting important predictors and their interactions;
- categorical predictors with large numbers of categories (for example, postcode or occupation code) can lead to unreliable results due to sparsity-related issues.
Linear models also take longer to build, because addressing these issues means transforming both numeric and categorical predictors and choosing predictors and their interactions by hand, which can prove to be a lengthy task.

Data mining techniques, in contrast, are typically fast, easily select predictors and their interactions, are minimally affected by missing values, outliers or collinearity, and effectively process high-level categorical predictors. This suggests that combining a linear approach with data mining tools can expedite the modelling process, allowing the modeller to attain equal or better model accuracy in less time with the same level of interpretability. Such models, usually combining decision trees, multivariate adaptive regression splines and GLM, have been used by our team in a number of projects (see Kolyshkina and Brookes, 2002).

Strengths of GLM can be effectively combined with the computational power of data mining methods. The advance of the new methodologies does not mean that proven, effective techniques such as GLM should be wholly replaced by them. This talk:
1. Discusses how the advantages and strengths of GLM can be combined with the computational power of data mining methods, presenting an example of combining the MARS and GLM approaches.
2. Compares the results of this combined model to those achievable by a hand-fitted GLM in terms of: time taken, predictive power, selection of predictors and their interactions, interpretability of the model, and precision and model fit.

MARS methodology overview.

MARS can be described as a generalisation of stepwise linear regression, or as a modification of the CART method for the regression setting (i.e. for a continuous dependent variable).

MARS model - knots for continuous predictors. For each continuous predictor X, MARS forms pairs of piecewise linear functions with a knot at each observed value t of the predictor:
X1 = (X - t) if X > t, and 0 otherwise
X2 = (t - X) if X <= t, and 0 otherwise
It then builds a model using only these functions and their products. The value t is called a knot.

MARS model - knots. Example. For the variable AGE, such a pair of piecewise linear functions may be:
AGE1 = (AGE - 20) if AGE > 20, and 0 otherwise
AGE2 = (20 - AGE) if AGE <= 20, and 0 otherwise
The value 20 is then called a knot.
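In SAS terms, this pair is simply the two "hinge" transforms max(0, AGE - 20) and max(0, 20 - AGE). A minimal data-step sketch follows; the data set and source variable names are hypothetical:

data hinge_demo;
   set claims_data;               /* hypothetical input data set containing AGE */
   AGE1 = max(0, AGE - 20);       /* non-zero only when AGE > 20 */
   AGE2 = max(0, 20 - AGE);       /* non-zero only when AGE < 20 */
run;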

MARS model - piecewise constant functions for categorical predictors For categorical predictors MARS considers all possible binary partitions of the categories. Each such partition generates a pair of piecewise constant basis functions that are indicator functions for the two sets of categories. Page 9
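As a hypothetical sketch, for a categorical predictor REGION with categories A, B and C, the partition {A} versus {B, C} would generate the following indicator pair (a SAS logical expression evaluates to 1 or 0):

data cat_demo;
   set claims_data;                  /* hypothetical input data set containing REGION */
   BF_A  = (REGION in ('A'));        /* 1 if REGION is A, else 0      */
   BF_BC = (REGION in ('B', 'C'));   /* 1 if REGION is B or C, else 0 */
run;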

MARS model - basis functions. We will call these functions and their products basis functions. The MARS model is a linear combination of basis functions:
y = a0 + a1*BF1 + a2*BF2 + a3*BF3 + ...
The coefficients a0, a1, a2, ... are then estimated by minimising the residual sum of squares (as in linear regression).
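Restating the line above in standard notation, for a model with M basis functions fitted to N observations:

\hat{y}(x) = a_0 + \sum_{m=1}^{M} a_m \, BF_m(x), \qquad
(\hat{a}_0, \dots, \hat{a}_M) = \arg\min_{a_0, \dots, a_M} \sum_{i=1}^{N} \Bigl( y_i - a_0 - \sum_{m=1}^{M} a_m \, BF_m(x_i) \Bigr)^2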

MARS model building. The MARS model is built in a two-phase process. In the first phase, the model is grown by adding basis functions, in a manner similar to the forward selection process in linear regression, until an overly large model is found. In the second phase, basis functions are deleted in order of least contribution to the model, in a manner similar to the backward deletion process in linear regression.

Optimal MARS model selection. Degrees of freedom penalty. The optimal MARS model is the one with the lowest GCV (generalised cross-validation) measure. Despite its name, the GCV criterion does not actually involve cross-validation. C(M) is the cost-complexity measure of a model containing M basis functions; note that C(M) = M is the usual measure used in linear regression. The MSE is calculated by dividing the sum of squared errors by N - M instead of by N. The GCV formula allows C(M) > M; in other words, it allows charging each basis function with more than one degree of freedom.
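In this notation, the GCV criterion of Friedman (1991) takes the form below, where N is the number of observations and \hat{f}_M is the fitted model with M basis functions:

GCV(M) = \frac{\frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - \hat{f}_M(x_i) \bigr)^2}{\bigl( 1 - C(M)/N \bigr)^2}

Minimising GCV(M) over the candidate models from the backward-deletion phase selects the optimal model; a larger C(M) charges each basis function more degrees of freedom and so favours smaller models.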

MARS model building. Why this model strategy? The basis functions and their products are non-zero over only part of their range; as a result, the regression surface is built up parsimoniously, with non-zero coefficients used locally, only where they are needed (unlike polynomials, whose products are non-zero everywhere). By allowing an arbitrary shape for the response function as well as interactions, and by using the two-phase model selection method, MARS is capable of reliably tracking the very complex data structures that often hide in high-dimensional data.

How the Use of MARS Can Expedite GLM Building. Most of the shortcomings of linear models outlined above can be overcome by using the MARS output basis functions as the input for GLM. This makes sense because MARS:
- quickly selects important predictors and their interactions, and transforms numeric and categorical predictors in such a way that the resulting variables are linearly independent and easy to interpret;
- is minimally affected by multi-collinearity, outliers and missing values in the data;
- easily handles categorical predictors with large numbers of categories, and therefore requires less data preparation than linear methods.
The modeller, though, needs to make sure that the transformed predictors make business sense and that the MARS model is stable.

Are MARS output functions OK to use as GLM inputs? We have seen that, although MARS output functions are not created specifically to be used as the input for a linear model, in practice about 90% of them turn out to be significant predictors in a GLM. Another feature of the MARS output functions that makes them useful is that they are linearly independent, as stated in Friedman (1991), which means that multi-collinearity issues do not arise in a GLM that uses them as explanatory variables.

Combined models are easily implemented in SAS. The MARS package is readily available, inexpensive, and can work with data in most formats (SAS, SPSS, dbf etc.). The output MARS produces can be combined with any GLM software with minimal effort, as it is easy to code in any programming language such as SAS, the main data analysis software package used by many actuaries.

Case Study. Application of combined models to Queensland industry CTP data provided by the Motor Accident Insurance Commission (MAIC).

Background. The methodology described above was applied to model the ultimate incurred number of claims based on reported claim data. The data used was industry-wide auto liability data from Queensland (commonly called Compulsory Third Party, or CTP, in Australia).

Data Description. Individual claim data was aggregated into the number of claims reported for each accident month and development month for input to the GLM. The variables used for the analysis were:
- accident month
- accident quarter
- number of casualties
- development month
- development quarter
- number of vehicles in the calendar year
- number of vehicles exposed in the month

Hand-fitted GLM. An initial GLM was created without using MARS. This was a Poisson model with the log link, using the number of vehicles exposed in the month as the offset. The transformations and interactions of the input variables were created manually, for the purposes of both best model fit and interpretability. The model fit was assessed by the usual methods, such as the ratio of deviance to degrees of freedom, predictor significance, the link test and residual analysis. All the assessments showed adequacy of the model fit.
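A minimal PROC GENMOD sketch of such a hand-fitted model is given below; the predictor list and variable names (ACCQTR, DEVMTH, CASUALTIES, and LGVH for the log of vehicles exposed in the month) are illustrative assumptions, not the exact specification used:

proc genmod data=ctp_data;              /* hypothetical aggregated claims data set */
   class ACCQTR;                        /* accident quarter as a factor */
   model CLAIMS = DEVMTH ACCQTR CASUALTIES   /* hand-chosen transforms and interactions go here */
         / dist=poisson link=log offset=LGVH;  /* LGVH = log(vehicles exposed in the month) */
run;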

MARS-enhanced GLM. A second model was created by pre-processing the variables in MARS and then including them in a GLM as the inputs:
- we built a MARS model with the ratio of the incurred number of claims to the number of vehicles exposed in the month as the dependent variable;
- we then used the basis functions produced by the MARS model as the inputs to a Poisson model with log link and the number of vehicles exposed in the month as the offset;
- the GLM output showed that most of these variables were significant; we used backward elimination to refine the model by excluding the inputs that were not significant;
- the resulting model fit was assessed in the same way as for the hand-fitted model.

The model output included explanatory variables pre-processed by MARS in the form of basis functions:
BF1 = max(0, LGDV - 0.405);
BF2 = max(0, 0.405 - LGDV);
BF4 = max(0, 13.000 - DEVMTHB);
BF5 = max(0, EXPMONTH - 51.000) * BF4;
BF6 = max(0, 51.000 - EXPMONTH) * BF4;
BF59 = max(0, 16.000 - DEVMTHB) * BF17;

As you can see, predictors transformed by MARS can be copied and pasted from MARS into SAS directly, and they are easy to interpret. We copy and paste them into SAS and run the following SAS code to add them to the data:
data data1;
   set glmcours.data;                        /* source data set */
   BF1 = max(0, LGDV - 0.405);               /* pasted MARS basis functions */
   BF2 = max(0, 0.405 - LGDV);
   BF59 = max(0, 16.000 - DEVMTHB) * BF17;
run;

SAS code for the MARS-enhanced GLM:
PROC GENMOD DATA=DATA1;
   MODEL Claims = BF1 BF2 BF4 BF5 BF6 BF7 BF8 BF9 BF11 BF12 BF13 BF14
                  BF17 BF19 BF20 BF21 BF23 BF24 BF26 BF27 BF30 BF31 BF32
                  BF34 BF35 BF38 BF40 BF42 BF46 BF49 BF50 BF53 BF55 BF59
         / DIST=POISSON OFFSET=LGVH LINK=LOG TYPE3;
   SCWGT WEIGHT;
   OUTPUT OUT=residuals STDRESDEV=strdev_CLAIMS XBETA=xbeta_CLAIMS
          PREDICTED=pred_CLAIMS;
RUN;

Comparison of Models

Goodness of Fit Comparison. The fit of both GLMs was assessed by the usual methods, such as the ratio of deviance to degrees of freedom, predictor significance, the link test and residual analysis. Additionally, we assessed gains charts for both models. The MARS-enhanced GLM showed a similar, if not slightly better, fit than the hand-fitted GLM.

MARS-enhanced GLM. Significance of MARS basis functions. The GLM output showed that most of these variables were significant.

LR Statistics For Type 3 Analysis
Source   DF   Chi-Square   Pr > ChiSq
BF1       1       166.43       <.0001
BF2       1       167.61       <.0001
BF59      1        43.67       <.0001

We then used backward elimination to refine the model by excluding the inputs that were not significant.

Analysis of actual versus expected values of the number of claims. The records were ranked from highest to lowest in terms of the predicted number of claims for each model, and then segmented into 20 equally sized groups. The average predicted and actual values of the number of claims for each group were then calculated and graphed.
[Charts: claim frequency for the hand-fitted GLM and for the MARS-enhanced GLM; average number of claims (y-axis) against % of data (x-axis) for the 20 groups.]
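This ranking-and-grouping step can be reproduced with PROC RANK and PROC MEANS. The sketch below uses the residuals data set and the pred_CLAIMS variable created by the OUTPUT statement earlier; the group-mean step is an assumption about how the graphed values were computed:

proc rank data=residuals out=ranked groups=20 descending;
   var pred_CLAIMS;           /* rank records by predicted number of claims */
   ranks grp;                 /* grp = 0..19, highest predictions in group 0 */
run;

proc means data=ranked mean;
   class grp;
   var pred_CLAIMS Claims;    /* average predicted and actual claims per group */
run;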

Comparison of the Fit. The models fit equally well, with the MARS-enhanced GLM having a marginally better fit. The hand-fitted GLM slightly over-predicts for the fifth group and under-predicts for the third group; the MARS-enhanced GLM predicts well for the higher expected numbers of claims but slightly over-predicts for the groups with lower numbers of claims. The scale of these errors is of little business importance.

Gains chart for the number of incurred claims for both models. The gains chart shows that the models perform equally well. Detailed analysis of the tables that the gains charts are based on suggests that the MARS-enhanced GLM performs marginally better than the hand-fitted model.
[Chart: gains curves for the hand-fitted GLM, the MARS GLM and the base line.]

MARS-created vs hand-transformed variables and predictor interactions: similarities and differences. The MARS-created predictor variables and those which were manually created showed a great degree of similarity. For example, comparing the predictors based on the variable development month, MARS placed knots mostly at the same points that were found important by the hand-fitted model. The differences included the fact that the hand-fitted model included the variable minimum(development month, 10), which is equal to 10 if development month is greater than 10 and equal to development month otherwise, while MARS selected 9 rather than 10 as the knot point. MARS also selected interactions of predictors that were not picked up by the hand-fitted model, such as the interaction of development month and experience month.

Interpretability of the models. Interpretability of the models was similar. The hand-fitted model was easier to interpret because it included fewer predictors and fewer predictor interactions than the MARS-enhanced model.

Findings and results. The fit and precision of the models were similar, with the MARS-enhanced GLM showing a slightly better fit than the hand-fitted GLM. The MARS-enhanced GLM included predictor interactions not picked up by the hand-fitted model. This effect would be more pronounced when modelling raw data than when modelling summarised data, where the trends observed are likely to be smoother and easier to identify because random variation cancels out. The hand-fitted model was easier to interpret because it included fewer predictors and fewer predictor interactions than the other model. Building the MARS-enhanced GLM was considerably faster and more efficient. These findings suggest that MARS is a useful tool to enhance and expedite GLM modelling.

Future directions. For large data sets, our team has found that combining decision trees (CART) with MARS and GLM proves quite effective, as described in Kolyshkina & Brookes, 2002. Also, a parallel model built in MARS can be used as an additional check of the fit of a GLM model. This model is built as part of the stage described previously and does not require additional time; its equation can be copied and pasted from MARS into SAS or another package directly. Comparing the predicted values of this model to those of the GLM model, graphically and numerically, can suggest insights and inform the choice of the best model.

The results described above, as well as a number of projects completed by our team for large insurer clients, demonstrate that combining GLM with data mining methodologies such as MARS can be very useful for actuarial analysis.