
Machine Learning - Duncan Anderson, Managing Director, Willis Towers Watson - 21 March 2018

GIRO 2016, Dublin - responses to machine learning: "Don't panic!" ... "We're doomed!"

This is not all new. A timeline from the 1940s to the 2010s: the first computer; neural nets; trees; GLMs; CART; actuaries adopt GLMs (GIRO 1996); Deep Blue vs Kasparov; random forests; GBMs; AlphaGo vs Lee Sedol. Today: hyper-scale parallel computing; NoSQL databases; free software environments, languages and analytics libraries; data stream and real-time processing supporting IoT; distributed Big Data storage/processing with Hadoop; data visualisation tools; integrated environments and services.

There is a spectrum of complexity ("vastly more risky than North Korea"). At one end - not at all hard, already in use, existing teams can normally do this stuff: GLM stepwise macros, automated GLMs, machine learning predictive modelling. In between, evolving: topic modelling, object recognition, speech analytics. At the other end - hard, requiring significant expertise: bespoke image recognition, full autonomous driving, AI comprehension.

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.


Multivariate adaptive regression splines ("Earth")
[Figure: four panels, each plotting AD claim frequency against policyholder age (21 to 85), showing the fitted piecewise-linear curve.]
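The "Earth" fit is built from hinge functions, max(0, x - knot) and max(0, knot - x). A real MARS implementation chooses its knots adaptively; the minimal numpy sketch below assumes two fixed knots and synthetic data, just to show the piecewise-linear form of an age curve like the one in the figure.

```python
import numpy as np

# Hypothetical claim-frequency data: frequency falls with age to a plateau,
# then rises again for older drivers.
rng = np.random.default_rng(0)
age = rng.uniform(21, 85, size=5000)
true_freq = 0.18 - 0.003 * np.minimum(age, 45) + 0.002 * np.maximum(0.0, age - 65)
y = rng.poisson(true_freq)

# MARS-style basis: intercept plus hinge functions at assumed knots 45 and 65.
B = np.column_stack([
    np.ones_like(age),
    np.maximum(0.0, 45 - age),   # active below the first knot
    np.maximum(0.0, age - 65),   # active above the second knot
])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(coef)   # coefficients of the fitted piecewise-linear age curve
```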

Penalised Regression: GLM, Lasso, Ridge, Elastic Net

f(x) = g^{-1}(X\beta), where \beta is estimated by minimising L(\beta \mid X, y) + \lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2

Ridge (penalty \sum_i \beta_i^2): heavily penalises large parameters, but does not reduce parameters to zero.
Lasso (penalty \sum_i |\beta_i|): reduces insignificant parameter values to zero - useful for variable selection.
Elastic Net: a mix of the two.
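A minimal sketch of the three penalties, assuming scikit-learn and synthetic data; sklearn's penalised least-squares estimators stand in here for the penalised GLM likelihood above. The point to notice is that ridge shrinks coefficients but leaves them non-zero, while lasso (and the elastic net) set insignificant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 informative factors
y = X @ beta + rng.normal(size=500)

for name, model in [("ridge", Ridge(alpha=10.0)),
                    ("lasso", Lasso(alpha=0.1)),
                    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    print(f"{name:12s} coefficients exactly zero: {np.sum(model.coef_ == 0)}")
```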

Penalised Regression: choosing \lambda
[Figure: the data split into five folds, each fold taking a turn as the holdout set while the model is trained on the rest; \lambda is chosen to optimise average holdout performance.]
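A sketch of that five-fold rotation, assuming scikit-learn and reusing X and y from the previous sketch: LassoCV fits along a path of penalties and picks the one with the best average holdout error.

```python
from sklearn.linear_model import LassoCV

# Reuses X and y from the sketch above.
cv_model = LassoCV(cv=5).fit(X, y)
print("penalty chosen by 5-fold cross-validation:", cv_model.alpha_)
```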

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Neural networks
[Figure: a feed-forward network with inputs x_1 = Age (here 21) and x_2 = Vehicle group (here 5), two hidden layers of three units with activations \alpha_{1,j} and \alpha_{2,j}, weights w on every connection, and a single output y.]
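A minimal numpy forward pass through the 2-3-3-1 network in the diagram. The random weights and the sigmoid activation are assumptions for illustration; the slide specifies neither.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
W1 = rng.normal(size=(3, 2))   # weights w_{1,j,k}: inputs -> hidden layer 1
W2 = rng.normal(size=(3, 3))   # weights w_{2,j,k}: hidden layer 1 -> 2
W3 = rng.normal(size=(1, 3))   # weights w_{3,k}:   hidden layer 2 -> output

x = np.array([21.0, 5.0])      # (Age, Vehicle group)
a1 = sigmoid(W1 @ x)           # activations alpha_{1,*}
a2 = sigmoid(W2 @ a1)          # activations alpha_{2,*}
y_hat = W3 @ a2                # predicted response y
print(y_hat)
```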

Neural networks - some assumptions required!

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Decision trees
[Figure: a tree that splits all data on Vehicle group < 5, then Age < 40, then Group < 15, partitioning the Age x Group plane into rectangles with constant predictions.]
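A sketch of a tree like the one on the slide, assuming scikit-learn and synthetic (Age, Vehicle group) data with hypothetical frequencies; the printout shows the fitted splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(3)
age = rng.uniform(21, 85, size=2000)
group = rng.integers(1, 21, size=2000)
# Hypothetical claim frequencies by region, echoing the slide's splits
freq = np.where(group < 5, 0.05,
       np.where(age < 40, 0.12,
       np.where(group < 15, 0.07, 0.09)))
y = rng.poisson(freq)

X = np.column_stack([age, group])
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["Age", "Group"]))
```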

Random Forests
A random forest averages N individual trees:
f(x) = \frac{1}{N} \sum_{n=1}^{N} f_n(x)
[Figure: a single tree as before, then many such trees added together and averaged.]
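A sketch (scikit-learn assumed, reusing X and y from the tree example) confirming that the forest's prediction is exactly that average of its component trees:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
per_tree = np.array([t.predict(X) for t in rf.estimators_])
assert np.allclose(rf.predict(X), per_tree.mean(axis=0))   # f = average of the f_n
```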

Gradient Boosted Machine, or GBM
A GBM is a shrunken sum of trees, each fitted in turn to what the earlier trees have not yet explained:
f(x) = \lambda \sum_{n=1}^{N} f_n(x)
[Figure: the same tree, then many trees, each scaled by \lambda and summed.]

Gradient Boosted Machine, or GBM - tuning parameters:
- \lambda (learning rate, or "shrinkage")
- N (number of trees)
- Bag fraction (proportion of data used at each iteration)
- Interaction depth (number of splits on each tree)
f(x) = \lambda \sum_{n=1}^{N} f_n(x)

Gradient Boosted Machine, or GBM
The optimal model of those considered is found by searching over \lambda (learning rate or shrinkage), N (number of trees), bag fraction (proportion of data used at each iteration) and interaction depth (number of splits on each tree), as sketched below.
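A sketch of that search, assuming scikit-learn and reusing X and y from above. The mapping of the slide's parameters onto GradientBoostingRegressor arguments (learning_rate for \lambda, n_estimators for N, subsample for bag fraction, max_depth as a stand-in for interaction depth) is an assumption.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

grid = {
    "learning_rate": [0.01, 0.1],   # lambda
    "n_estimators": [100, 500],     # N
    "subsample": [0.5, 1.0],        # bag fraction
    "max_depth": [2, 4],            # ~ interaction depth
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print("optimal model of those considered:", search.best_params_)
```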


Do they add value? Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Dimensions of utility - Method

Dimensions of utility - Method
Think of a model. Multiply it by 123. Square it. Add 74½ billion... and you get the same Gini coefficient!
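The point is that the Gini coefficient depends only on how the model ranks risks, so any increasing transformation of positive predictions leaves it unchanged. A small numpy sketch on synthetic data verifies the slide's transformation:

```python
import numpy as np

def gini(actual, pred):
    order = np.argsort(-pred)                   # rank policies, riskiest predicted first
    lorenz = np.cumsum(actual[order]) / actual.sum()
    n = len(actual)
    return lorenz.sum() / n - (n + 1) / (2 * n)

rng = np.random.default_rng(4)
pred = rng.uniform(0.01, 0.2, size=10_000)      # positive model predictions
actual = rng.poisson(pred).astype(float)

transformed = (123 * pred) ** 2 + 74.5e9        # increasing transform for pred > 0
print(np.isclose(gini(actual, pred), gini(actual, transformed)))   # True
```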

Dimensions of utility - Method

Model        RMS error
GLM          34.7%
Neural net   33.1%
GBM          31.0%

Dimensions of utility - Method
[Figure: validation chart bucketing policies by the ratio of proposed to current model prediction (<88% up to >112%), with volumes on the right axis (0 to 100,000) and observed response (4.4% to 6.4%) plotted against the current and proposed models.]

Dimensions of utility - Method
Loss ratio improvement: 3.1%!


Dimensions of utility - Comfort and diagnostics
How much do you need to understand the model? How much would you normally understand (e.g. vehicle classification)? What is the cost of an error (e.g. in marketing)? Regulatory requirements, professional standards, comfort diagnostics and pre/post adjustments all play a part.
[Figure: the earlier validation chart, comparing observed response with the current and proposed models.]

Dimensions of utility - Implementation environment
[Figure: comparison of implementation environments - a next-generation rating engine versus the main policy administration system - with movements capped at -2.0%/+2.0% on the current rate and -5.0%/+5.0% on the new business price.]


A toolkit
Method: GLM, penalised regression, "Earth", GBMs, trees, random forests, neural networks, support vector machines - each delivering its own table of results.

That is already in use: marketing and distribution, customer management, underwriting and risk management, pricing, claims management, reserving.

2017 US market survey - "For which business applications do you use or plan to use these methods?"

Method                                                          Underwriting/Pricing   Claims   Marketing
Generalized linear models (GLMs)                                84%                    54%      61%
One-way analyses                                                94%                    78%      58%
Decision trees                                                  55%                    54%      58%
Model combining methods (e.g., stacking, blending)              41%                    35%      27%
Gradient boosting machines (GBMs)                               37%                    32%      24%
Random forest (RF)                                              41%                    35%      36%
Penalized regression methods (e.g., lasso, ridge, elastic net)  41%                    30%      27%
Neural networks                                                 37%                    41%      24%
Generalized additive models (GAMs)                              37%                    19%      21%
Support vector machines                                         20%                    19%      12%

Source: Willis Towers Watson Predictive Modeling Survey 2017

That spectrum of complexity, revisited: machine learning predictive modelling and automated GLMs are happening now; further along the spectrum - topic modelling, object recognition, speech analytics, bespoke image recognition, full autonomous driving, AI comprehension - this end could be interesting.


So...
- Machine learning is already in use, and actuaries are already involved.
- It's not just about methods: data beats models, and working out what to model matters - data beats factor engineering beats models.
- It's not just about predictiveness: a broader set of problems can be analysed, and rapid basic insight adds value.
- Evolution, not revolution: the models are complementary to existing methods.

Issues for the Profession(s)
- Training: a generation less familiar with stats?
- Role of the actuary: domain expertise matters (at least currently). Is it easier for an actuary to pick up machine learning than for a data scientist to understand insurance? Siloed teams don't work; familiarity and the right vernacular can help.
- Scope of involvement: pricing, reserving, claims analytics, customer management, marketing...?
- Professional bodies: are the CAS and SOA ahead (e.g. the CSPA)? Is GIRO now too big to help? The IFoA is on the case, but fast enough?
- Regulatory issues: TAS ("judgement" - what judgement?), GDPR, the Government Select Committee on Science and Technology.
