
Machine Learning - Duncan Anderson, Managing Director, Willis Towers Watson - 21 March 2018

GIRO 2016, Dublin - responses to machine learning: "Don't panic!" ... "We're doomed!"

This is not all new. A timeline from the 1940s to the 2010s: the first computer; neural nets; trees; GLMs; CART; actuaries adopt GLMs (GIRO 1996); Deep Blue vs Kasparov; random forests; GBMs; AlphaGo vs Lee Sedol. Today: hyper-scale parallel computing; NoSQL databases; free software environments, languages and analytics libraries; data stream and real-time processing supporting IoT; distributed Big Data storage/processing with Hadoop; data visualisation tools; integrated environments and services.

There is a spectrum of complexity ("vastly more risky than North Korea"). At one end - not at all hard, already in use, existing teams can normally do this stuff: GLM stepwise macros, automated GLMs, machine learning predictive modelling. In between, evolving: topic modelling, object recognition, speech analytics. At the other end - hard, requiring significant expertise: bespoke image recognition, full autonomous driving, AI comprehension.

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.


Multivariate adaptive regression splines ("Earth")
[Figure: four panels, each plotting AD claim frequency against policyholder age (21 to 85), showing the fitted piecewise-linear curve.]
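The "Earth" fit is built from hinge functions, max(0, x - knot) and max(0, knot - x). A real MARS implementation chooses its knots adaptively; the minimal numpy sketch below assumes two fixed knots and synthetic data, just to show the piecewise-linear form of an age curve like the one in the figure.

```python
import numpy as np

# Hypothetical claim-frequency data: frequency falls with age to a plateau,
# then rises again for older drivers.
rng = np.random.default_rng(0)
age = rng.uniform(21, 85, size=5000)
true_freq = 0.18 - 0.003 * np.minimum(age, 45) + 0.002 * np.maximum(0.0, age - 65)
y = rng.poisson(true_freq)

# MARS-style basis: intercept plus hinge functions at assumed knots 45 and 65.
B = np.column_stack([
    np.ones_like(age),
    np.maximum(0.0, 45 - age),   # active below the first knot
    np.maximum(0.0, age - 65),   # active above the second knot
])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(coef)   # coefficients of the fitted piecewise-linear age curve
```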

Penalised Regression: GLM, Lasso, Ridge, Elastic Net

f(x) = g^{-1}(X\beta), where \beta is estimated by minimising L(\beta \mid X, y) + \lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2

Ridge (penalty \sum_i \beta_i^2): heavily penalises large parameters, but does not reduce parameters to zero.
Lasso (penalty \sum_i |\beta_i|): reduces insignificant parameter values to zero - useful for variable selection.
Elastic Net: a mix of the two.
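A minimal sketch of the three penalties, assuming scikit-learn and synthetic data; sklearn's penalised least-squares estimators stand in here for the penalised GLM likelihood above. The point to notice is that ridge shrinks coefficients but leaves them non-zero, while lasso (and the elastic net) set insignificant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 informative factors
y = X @ beta + rng.normal(size=500)

for name, model in [("ridge", Ridge(alpha=10.0)),
                    ("lasso", Lasso(alpha=0.1)),
                    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    print(f"{name:12s} coefficients exactly zero: {np.sum(model.coef_ == 0)}")
```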

Penalised Regression: choosing \lambda
[Figure: the data split into five folds, each fold taking a turn as the holdout set while the model is trained on the rest; \lambda is chosen to optimise average holdout performance.]
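A sketch of that five-fold rotation, assuming scikit-learn and reusing X and y from the previous sketch: LassoCV fits along a path of penalties and picks the one with the best average holdout error.

```python
from sklearn.linear_model import LassoCV

# Reuses X and y from the sketch above.
cv_model = LassoCV(cv=5).fit(X, y)
print("penalty chosen by 5-fold cross-validation:", cv_model.alpha_)
```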

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Neural networks
[Figure: a feed-forward network with inputs x_1 = Age (here 21) and x_2 = Vehicle group (here 5), two hidden layers of three units with activations \alpha_{1,j} and \alpha_{2,j}, weights w on every connection, and a single output y.]
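A minimal numpy forward pass through the 2-3-3-1 network in the diagram. The random weights and the sigmoid activation are assumptions for illustration; the slide specifies neither.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
W1 = rng.normal(size=(3, 2))   # weights w_{1,j,k}: inputs -> hidden layer 1
W2 = rng.normal(size=(3, 3))   # weights w_{2,j,k}: hidden layer 1 -> 2
W3 = rng.normal(size=(1, 3))   # weights w_{3,k}:   hidden layer 2 -> output

x = np.array([21.0, 5.0])      # (Age, Vehicle group)
a1 = sigmoid(W1 @ x)           # activations alpha_{1,*}
a2 = sigmoid(W2 @ a1)          # activations alpha_{2,*}
y_hat = W3 @ a2                # predicted response y
print(y_hat)
```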

Neural networks - some assumptions required!

Example machine learning methods: Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Decision trees
[Figure: a tree that splits all data on Vehicle group < 5, then Age < 40, then Group < 15, partitioning the Age x Group plane into rectangles with constant predictions.]
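A sketch of a tree like the one on the slide, assuming scikit-learn and synthetic (Age, Vehicle group) data with hypothetical frequencies; the printout shows the fitted splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(3)
age = rng.uniform(21, 85, size=2000)
group = rng.integers(1, 21, size=2000)
# Hypothetical claim frequencies by region, echoing the slide's splits
freq = np.where(group < 5, 0.05,
       np.where(age < 40, 0.12,
       np.where(group < 15, 0.07, 0.09)))
y = rng.poisson(freq)

X = np.column_stack([age, group])
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["Age", "Group"]))
```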

Random Forests
A random forest averages N individual trees:
f(x) = \frac{1}{N} \sum_{n=1}^{N} f_n(x)
[Figure: a single tree as before, then many such trees added together and averaged.]
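A sketch (scikit-learn assumed, reusing X and y from the tree example) confirming that the forest's prediction is exactly that average of its component trees:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
per_tree = np.array([t.predict(X) for t in rf.estimators_])
assert np.allclose(rf.predict(X), per_tree.mean(axis=0))   # f = average of the f_n
```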

Gradient Boosted Machine, or GBM
A GBM is a shrunken sum of trees, each fitted in turn to what the earlier trees have not yet explained:
f(x) = \lambda \sum_{n=1}^{N} f_n(x)
[Figure: the same tree, then many trees, each scaled by \lambda and summed.]

Gradient Boosted Machine, or GBM - tuning parameters:
- \lambda (learning rate, or "shrinkage")
- N (number of trees)
- Bag fraction (proportion of data used at each iteration)
- Interaction depth (number of splits on each tree)
f(x) = \lambda \sum_{n=1}^{N} f_n(x)

Gradient Boosted Machine, or GBM
The optimal model of those considered is found by searching over \lambda (learning rate or shrinkage), N (number of trees), bag fraction (proportion of data used at each iteration) and interaction depth (number of splits on each tree), as sketched below.
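A sketch of that search, assuming scikit-learn and reusing X and y from above. The mapping of the slide's parameters onto GradientBoostingRegressor arguments (learning_rate for \lambda, n_estimators for N, subsample for bag fraction, max_depth as a stand-in for interaction depth) is an assumption.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

grid = {
    "learning_rate": [0.01, 0.1],   # lambda
    "n_estimators": [100, 500],     # N
    "subsample": [0.5, 1.0],        # bag fraction
    "max_depth": [2, 4],            # ~ interaction depth
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print("optimal model of those considered:", search.best_params_)
```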


Do they add value? Ensemble Methods, Classification Trees, "Earth", Gradient Boosting Machines, K-Means Clustering, Support Vector Machines, Elastic Net, Neural Networks, Naïve Bayes, Random Forests, Regression Trees, Principal Components Analysis, Lasso, K-Nearest Neighbours, Ridge Regression.

Dimensions of utility - Method

Dimensions of utility - Method
Think of a model. Multiply it by 123. Square it. Add 74½ billion... and you get the same Gini coefficient!
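The point is that the Gini coefficient depends only on how the model ranks risks, so any increasing transformation of positive predictions leaves it unchanged. A small numpy sketch on synthetic data verifies the slide's transformation:

```python
import numpy as np

def gini(actual, pred):
    order = np.argsort(-pred)                   # rank policies, riskiest predicted first
    lorenz = np.cumsum(actual[order]) / actual.sum()
    n = len(actual)
    return lorenz.sum() / n - (n + 1) / (2 * n)

rng = np.random.default_rng(4)
pred = rng.uniform(0.01, 0.2, size=10_000)      # positive model predictions
actual = rng.poisson(pred).astype(float)

transformed = (123 * pred) ** 2 + 74.5e9        # increasing transform for pred > 0
print(np.isclose(gini(actual, pred), gini(actual, transformed)))   # True
```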

Dimensions of utility - Method

Model        RMS error
GLM          34.7%
Neural net   33.1%
GBM          31.0%

Dimensions of utility - Method
[Figure: validation chart bucketing policies by the ratio of proposed to current model prediction (<88% up to >112%), with volumes on the right axis (0 to 100,000) and observed response (4.4% to 6.4%) plotted against the current and proposed models.]

Dimensions of utility - Method
Loss ratio improvement: 3.1%!


Dimensions of utility - Comfort and diagnostics
How much do you need to understand the model? How much would you normally understand (e.g. vehicle classification)? What is the cost of an error (e.g. in marketing)? Regulatory requirements, professional standards, comfort diagnostics and pre/post adjustments all play a part.
[Figure: the earlier validation chart, comparing observed response with the current and proposed models.]

Dimensions of utility - Implementation environment
[Figure: comparison of implementation environments - a next-generation rating engine versus the main policy administration system - with movements capped at -2.0%/+2.0% on the current rate and -5.0%/+5.0% on the new business price.]


A toolkit
Method: GLM, penalised regression, "Earth", GBMs, trees, random forests, neural networks, support vector machines - each delivering its own table of results.

That is already in use: marketing and distribution, customer management, underwriting and risk management, pricing, claims management, reserving.

2017 US market survey - "For which business applications do you use or plan to use these methods?"

Method                                                          Underwriting/Pricing   Claims   Marketing
Generalized linear models (GLMs)                                84%                    54%      61%
One-way analyses                                                94%                    78%      58%
Decision trees                                                  55%                    54%      58%
Model combining methods (e.g., stacking, blending)              41%                    35%      27%
Gradient boosting machines (GBMs)                               37%                    32%      24%
Random forest (RF)                                              41%                    35%      36%
Penalized regression methods (e.g., lasso, ridge, elastic net)  41%                    30%      27%
Neural networks                                                 37%                    41%      24%
Generalized additive models (GAMs)                              37%                    19%      21%
Support vector machines                                         20%                    19%      12%

Source: Willis Towers Watson Predictive Modeling Survey 2017

That spectrum of complexity, revisited: machine learning predictive modelling and automated GLMs are happening now; further along the spectrum - topic modelling, object recognition, speech analytics, bespoke image recognition, full autonomous driving, AI comprehension - this end could be interesting.


So...
- Machine learning is already in use, and actuaries are already involved.
- It's not just about methods: data beats models, and working out what to model matters - data beats factor engineering beats models.
- It's not just about predictiveness: a broader set of problems can be analysed, and rapid basic insight adds value.
- Evolution, not revolution: the models are complementary to existing methods.

Issues for the Profession(s)
- Training: a generation less familiar with stats?
- Role of the actuary: domain expertise matters (at least currently). Is it easier for an actuary to pick up machine learning than for a data scientist to understand insurance? Siloed teams don't work; familiarity and the right vernacular can help.
- Scope of involvement: pricing, reserving, claims analytics, customer management, marketing...?
- Professional bodies: are the CAS and SOA ahead (e.g. the CSPA)? Is GIRO now too big to help? The IFoA is on the case, but fast enough?
- Regulatory issues: TAS ("judgement" - what judgement?), GDPR, the Government Select Committee on Science and Technology.
