Overview and Practical Application of Machine Learning in Pricing

Size: px

Start display at page:

Download "Overview and Practical Application of Machine Learning in Pricing"

Phebe Butler
6 years ago
Views:

and Claudine Modlin (Willis Towers Watson) Mark Richards

1 Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company) 2017 Willis Towers Watson. All rights reserved.

2 Agenda Agenda Context of machine learning in pricing Penalized regression methods Trees, random forests and GBMs Objective: to give you a working knowledge of some machine learning methods that may be used to improve GLM results and/or offer valuable insights in their own right in the field of P&C insurance pricing Conclusions Q&A 2

Applications of machine learning in the insurance sector Marketing and Distribution

management Product development (UBI) Retention segmentation and program design

2500 2000 1500 1000 500-100 Low rating factor High rating factor 0 exposure excess res

3 Applications of machine learning in the insurance sector Marketing and Distribution Customer management Underwriting and risk management Pricing Claims management Asset management Product development (UBI) Retention segmentation and program design Agent/Broker performance evaluation Fraud detection Low rating factor High rating factor 0 exposure excess res P&C to Life cross-selling Web-scraping and feature design in commercial lines Topic modeling and large loss modeling 3

4 Focus of today s talk Pricing 4

5 This is not new. Few factors, simple methods Data enrichment GLMs in demand models GLMs in auto risk models GLM refinement & LOB expansion GLMs Integrating cost and demand More data enrichment 1990s 2000s 2010s 2017 Other Non-GLM models Distributed Big Data storage/ Hadoop Data visualisation tools Machine learning Integrated environments and services Hyper scale parallel computing NoSQL databases Free software environments, analytics libraries Data stream and real-time processing supporting IoT 5

6 What are these methods? Ensemble Methods Classifications Trees "Earth" Regression Trees Gradient Boosting Machines K-nearest Neighbors Elastic Net Neural Networks Naïve Bayes Random Forests K-Means Clustering Principal Components Analysis Lasso Support Vector Machines Ridge Regression

7 Does it work? AIC Gini RMSE MAE Log loss 7

8 How do you measure value? Gini Rank hold out observations by their fitted values (high to low) Plot cumulative response by cumulative exposure A better model will explain a higher proportion of the response with a lower proportion of exposure and will give a higher Gini coefficient (yellow area) 8

9 Example results Model Gini GLM

10 Example results Model Gini GLM New Model

11 Example results Model Gini Gini improvement GLM % New Model % 11

12 Example results Model Gini Gini improvement Gini rank GLM (main factor removed) % 4 GLM (minor factor removed) % 3 GLM % 2 New Model % 1 12

13 But Think of a model Multiply it by 123 Square it Add 74½ billion and you get the same Gini coefficient! 13

14 Double lift chart 6.4% % Response 5.4% % % < 88% 88% - 90% 90% - 92% 92% - 94% 94% - 96% 96% - 98% 98% - 100% 100% - 102% 102% - 104% 104% - 106% 106% - 108% 108% - 110% 110% - 112% > 112% 0 Proposed Model / Current Model Observed Current Model Proposed Model 14

15 Financial value estimate Errors in insurance pricing are not symmetrical Financial benefit can be estimated 15

16 Example results Results redacted 16

17 Financial value vs Gini 17

18 Is it really all about the method? The problem dimension The method dimension Original problem, solved with original methods Original problem, solved with new techniques Newly defined problem, solved with original methods Newly defined problem, solved with new techniques 18

19 Where is the value? The problem dimension The method dimension $ $$ $$ $$$ 19

20 The problem dimension The problem dimension The method dimension Original problem, solved with original methods Original problem, solved with new techniques Newly defined problem, solved with original methods Newly defined problem, solved with new techniques 20

21 What are these methods? Ensemble Methods Classifications Trees "Earth" Regression Trees Gradient Boosting Machines K-nearest Neighbors Elastic Net Neural Networks Naïve Bayes Random Forests K-Means Clustering Principal Components Analysis Lasso Support Vector Machines Ridge Regression

22 Choosing a method Dimensions of choice Predictive power Analytical time and effort Interpretation Method Execution speed Table implementation Stability 22

23 Predictive power Analytical time and effort Interpretation GLM Execution speed Table Implementation Stability 23

24 What are these methods? Ensemble Methods Classifications Trees "Earth" Regression Trees Gradient Boosting Machines K-nearest Neighbors Elastic Net Neural Networks Naïve Bayes Random Forests K-Means Clustering Principal Components Analysis Lasso Support Vector Machines Ridge Regression

25 Penalized Regression GLM Lasso Ridge f(x) = g -1 (X.b) where b estimated by minimizing (, ) + + Elastic Net Ridge Elastic Net Lasso Heavily penalize large parameters, but does not reduce parameters to zero Mix of the two Penalty reduces insignificant parameter values to zero - useful for variable selection 25

Penalized Regression Parameter selection Minimize: (,)+ + Penalty parameters can be re-written: =, = controls the mixture between Lasso ( =1) and Ridge ( =0) controls the overall size of the penalty,

26 Penalized Regression Parameter selection Minimize: (,)+ + Penalty parameters can be re-written: =, = controls the mixture between Lasso ( =1) and Ridge ( =0) controls the overall size of the penalty, selected using cross-validation Factors automatically selected from initial set! Optimal, combination Cross-validation error log(lambda) alpha 26

27 Penalized Regression Case study vehicle classification Physical facticity E.g., height, length, weight Mechanical nature E.g., engine size, fuel type Qualitative descriptors E.g., body type, model range Performance E.g., maximum speed, torque, BHP 27

28 Penalized Regression Case study vehicle classification Results redacted 28

29 Deploying Penalized Regression Same as GLMs! Age Multiplier < Vehicle Group Multiplier Gender Multiplier Male 1.00 Female

30 Predictive Power Analytical time and effort Interpretation Penalized Regression Execution speed Table Implementation Stability 30

31 What are these methods? Ensemble Methods Classifications Trees "Earth" Regression Trees Gradient Boosting Machines K-nearest Neighbors Elastic Net Neural Networks Naïve Bayes Random Forests K-Means Clustering Principal Components Analysis Lasso Support Vector Machines Ridge Regression

32 Decision Trees Group Group All data < 5? Y N Age < 40? Y N Group < 15? Age Y N 32

33 Random Forests A tree ( ) A random forest = 1 ( ) Group All Data < 5? Y N Age < 40? Y N Group < 15? Y N

34 Gradient Boosting Machine or GBM A tree ( ) A GBM = λ ( ) Group All Data < 5? Y N λ + λ + λ + λ + Age < 40? Y N Group < 15? Y N λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ 34

35 Gradient Boosted Machine or GBM Four main assumptions l Learning rate / shrinkage Amount by which the old model predictions are varied for the next model iteration New model = Old + (Prediction x Learning rate) Interaction depth Number of splits allowed on each tree (or the number of terminal nodes 1) N Number of trees (iterations) allowed Bag fraction Trees are fitted to a subset of the data (the bag fraction) on a randomized basis Additional noise-reduction can be achieved by using a random subset of the available factors at each iteration A GBM = λ ( ) λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ + λ 35

36 Gradient Boosted Machine or GBM Four main assumptions l Learning rate / shrinkage Amount by which the old model predictions are varied for the next model iteration New model = Old + (Prediction x Learning rate) Interaction depth Number of splits allowed on each tree (or the number of terminal nodes 1) N Number of trees (iterations) allowed Bag fraction Trees are fitted to a subset of the data (the bag fraction) on a randomized basis Additional noise-reduction can be achieved by using a random subset of the available factors at each iteration Fit Fit Test Fit Test Fit Best result shown by this point on brown line (interaction depth 2 and learning rate 2% in this case) Test Fit Fit 36

37 What does a GBM look like?

38 What does a GBM look like?

39 Does it work? How does it work?

40 GBM value add Results redacted 40

41 What are these methods? Ensemble Methods Classifications Trees "Earth" Regression Trees Gradient Boosting Machines K-nearest Neighbors Elastic Net Neural Networks Naïve Bayes Random Forests K-Means Clustering Principal Components Analysis Lasso Support Vector Machines Ridge Regression

42 Adding an ensemble Results redacted 42

43 What about a modelled down GBM? Results redacted 43

44 What about an automated GLM? 44

45 What about an automated GLM? Results redacted 45

46 Does it work? How does it work?

47 Partial dependency plots etc 47

48 Deploying GBMs Model down into multiplicative tables via GLMs Age Exposure Burning Cost Vehicle Group Exposure Burning Cost Corner correctors and prebaked interactions Factor Reduction Establish Model Hierarchy 1 <=20 1, , , , , , , , Age Total 281, VG Total 281, Gender Exposure Burning Cost 1 Male 197, Use insights to guide GLM 2 Female 84, Gender Tot al 281, Deploy directly 48

0% - 102% > 112% 10 000 0 800 00 60000 40 000

Model / Current Model 10 4% - 106 % 106 %- 10 8%

49 6.4% 5.9% 5. 4% 4. 9% 4. 4% 90% - 92% 92% 94% - 94 %- 96% - 96% 100 % 10 0% - 102% > 112% Deploying GBMs Pre / post mapping Comfort Diagnostics Res po ns e Data Model < 88% 88 % - 90% 98 % 98 % - 102% % Proposed Model / Current Model 10 4% % 106 %- 10 8% Obs erved Curren tmod el Proposed Model 108% % 110 %- 112% Deploy directly 49

50 Predictive power Analytical time and effort Interpretation GBMs Execution speed Table Implementation Stability 50

51 Predictive power Predictive power Analytical time and effort Interpretation Analytical time and effort Interpretation Penalized Regression GBMs Predictive power Execution speed Table Implementation Predictive power Execution speed Table Implementation Analytical time and effort Interpretation Stability Analytical time and effort Interpretation Stability GLM "Earth" Execution speed Tab l e Implementation Execution speed Table Implementation Stability Stability Predictive power Predictive power Analytical time and effort Interpretation Analytical time and effort Interpretation Predictive power Neural Networks Predictive power Support Vector Machine Analytical time and effort Interpretation Execution speed Stability Table Implementation Analytical time and effort Interpretation Execution speed Stability Table Implementation Trees Random Forests Execution speed Table Implementation Execution speed Table Implementation Stability Stability 51

52 How is the North American market doing with machine learning? Methods used For which business applications do you use or plan to use these methodologies? (Q.13) Modeling Techniques Loss Cost Modeling Claims Analytics Marketing Generalized linear models (GLMs) 100% 42% 37% One-way analyses 87% 40% 37% Decision trees 48% 31% 30% Model combining methods 35% 23% 19% Gradient Boosting Machines (GBMs) 30% 17% 15% Penalized regression methods 30% 17% 22% Random Forest (RF) 26% 21% 22% Other ensemble methods 24% 17% 19% Other Machine Learning methods Grid search techniques 11% 26% 8% 33% 19% 15% Primary Secondary Tertiary Base: U.S. respondents who use or plan to use the methodology for the application specified (Loss Cost Modeling n = 46, Claims Analytics n = 48, Marketing n = 27). 52

53 Machine learning in personal lines pricing: evolution or revolution? Conclusions It s not all about methods Domain expertise in formulating the problem can be more important New (wider) data generally adds more lift than new methods There are practical ways to assess model improvement as well as statistical Predictions: Many methods can augment the traditional GLM modeling process in particular with growing datasets Others (e.g., GMBs) can improve prediction in own right but these require different approaches in interpretation and technology/deployment Methods support predictions in new areas Very wide datasets Market rate analysis Cross-selling and other customer behaviors UW, claims, etc But it s not all about predictive power Fast investigation of new or messy data Data scientists & statisticians Domain experts Quick assessment of emerging experience Operational efficiency Industry (and Profession) has work to do in developing machine learning skills integrated with domain expertise 53

54 Contacts Claudine Modlin Director, Risk Consulting & Software Willis Towers Watson Duncan Anderson Managing Director, Risk Consulting & Software Willis Towers Watson Mark Richards Director, Allstate Personal Lines D 3 : Data, Discovery & Decision Science Mark.Richards@allstate.com

Machine Learning Duncan Anderson Managing Director, Willis Towers Watson

Machine Learning Duncan Anderson Managing Director, Willis Towers Watson 21 March 2018 GIRO 2016, Dublin - Response to machine learning Don t panic! We re doomed! 2 This is not all new Actuaries adopt