Machine Learning Duncan Anderson Managing Director, Willis Towers Watson
Transcription
1 Machine Learning Duncan Anderson Managing Director, Willis Towers Watson 21 March 2018
2 GIRO 2016, Dublin - Response to machine learning: "Don't panic!" / "We're doomed!"
3 This is not all new
[Timeline, 1940s to 2010s: First computer; Neural nets; Trees; GLMs; CART; Random forests; GBMs; Actuaries adopt GLMs (GIRO 1996); Deep Blue vs Kasparov; AlphaGo vs Lee Sedol]
Today:
- Hyper scale parallel computing
- NoSQL databases
- Free software environments, languages, analytics libraries
- Data stream and real-time processing supporting IoT
- Distributed Big Data storage/processing with Hadoop
- Data visualisation tools
- Integrated environments and services
4 There is a spectrum of complexity. "Vastly more risky than North Korea". Examples: GLM stepwise macro; AI comprehension; Bespoke image recognition; Speech analytics; Machine learning predictive modelling; Full autonomous driving; Object recognition; Topic modelling; Automated GLMs. One end: Hard; Evolving; Requires significant expertise. Other end: Not at all hard; Already in use; Existing teams can normally do this stuff.
5 Example machine learning methods Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 5
6 Example machine learning methods Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 6
7 Multivariate adaptive regression splines ("Earth") [Figure: four panels plotting AD claim frequency against Age]
8 Penalised Regression (GLM, Lasso, Ridge, Elastic Net)
$f(x) = g^{-1}(X\beta)$, where $\beta$ is estimated by minimising
$L(\beta \mid X, y) + \lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2$
- Ridge ($\sum_i \beta_i^2$): heavily penalises large parameters, but does not reduce parameters to zero
- Elastic Net: mix of the two
- Lasso ($\sum_i |\beta_i|$): penalty reduces insignificant parameter values to zero - useful for variable selection
9 Penalised Regression (GLM, Lasso, Ridge, Elastic Net)
$f(x) = g^{-1}(X\beta)$, where $\beta$ is estimated by minimising
$L(\beta \mid X, y) + \lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2$
[Figure: five panels, each labelled Test and Holdout]
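The different behaviour of the lasso, ridge and elastic net penalties can be seen in a one-parameter toy problem. The sketch below is not from the presentation: it assumes a single parameter with unpenalised estimate b and a quadratic loss 0.5*(beta - b)^2 in place of the GLM likelihood, and the function names are illustrative only. Under those assumptions the penalised minimiser has a closed form.

```python
def soft_threshold(b, lam):
    """Shrink b towards zero by lam, returning exactly 0 when |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def elastic_net_1d(b, lam1, lam2):
    """Minimiser of 0.5*(beta - b)**2 + lam1*|beta| + lam2*beta**2 (toy, assumed loss)."""
    return soft_threshold(b, lam1) / (1.0 + 2.0 * lam2)


# Hypothetical unpenalised parameter estimates: two large, two small.
unpenalised = [3.0, 0.4, -0.2, -2.5]

lasso = [elastic_net_1d(b, lam1=0.5, lam2=0.0) for b in unpenalised]
ridge = [elastic_net_1d(b, lam1=0.0, lam2=0.5) for b in unpenalised]
mixed = [elastic_net_1d(b, lam1=0.25, lam2=0.25) for b in unpenalised]

print("lasso:", lasso)  # small parameters are set exactly to zero
print("ridge:", ridge)  # every parameter shrinks, none reaches zero
print("mixed:", mixed)  # elastic net: some of each effect
```

In this toy setting the lasso zeroes the two small parameters while ridge merely halves all four, which is the variable-selection contrast the slide describes.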
10 Example machine learning methods Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 10
11 Neural networks [Diagram: inputs $x_1$ = Age (example value 21) and $x_2$ = Vehicle group (example value 5); two hidden layers of three nodes each, labelled $\alpha_{1,1}, \alpha_{1,2}, \alpha_{1,3}$ and $\alpha_{2,1}, \alpha_{2,2}, \alpha_{2,3}$; weights $w_{1,i,j}$ and $w_{2,i,j}$ between layers and $w_{3,j}$ into the output $y$]
12 Neural networks - some assumptions required! 12
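As a rough illustration of the network on slide 11, the sketch below runs a forward pass through two hidden layers of three nodes for the inputs Age = 21 and Vehicle group = 5. Everything beyond that shape is an assumption: the weights are made-up numbers, the hidden nodes use a logistic activation, the output is a plain weighted sum, and no bias terms are included.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice; the slides do not specify one)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights):
    """weights[i][j] links input i to node j; returns the activated node outputs."""
    n_nodes = len(weights[0])
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_nodes)
    ]


def forward(x, w1, w2, w3):
    """Two hidden layers followed by a linear output node."""
    a1 = layer(x, w1)   # first hidden layer
    a2 = layer(a1, w2)  # second hidden layer
    return sum(a * w for a, w in zip(a2, w3))


# Hypothetical weights, for illustration only.
w1 = [[0.01, -0.02, 0.03], [0.10, 0.05, -0.10]]                  # 2 inputs -> 3 nodes
w2 = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6], [-0.2, 0.3, 0.7]]      # 3 nodes -> 3 nodes
w3 = [0.3, -0.1, 0.2]                                            # 3 nodes -> output

y = forward([21, 5], w1, w2, w3)  # x1 = Age = 21, x2 = Vehicle group = 5
print(y)
```

Even this small network needs choices of activation, layer sizes and starting weights before any fitting begins, which is the kind of assumption the slide title alludes to.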
13 Example machine learning methods Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 13
14 Decision trees [Diagram: all data split by the questions Group < 5?, Age < 40? and Group < 15? (Y/N branches), shown alongside the resulting partition of the Age by Group plane]
15 Random Forests [Diagram: several trees, each splitting all data on questions such as Group < 5?, Group < 15? and Age < 40?]
A random forest: $f(x) = \frac{1}{N} \sum_{n=1}^{N} f_n(x)$
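The averaging step in the random forest formula can be sketched as follows. The three trees are hypothetical: they reuse the split questions shown on the slides (Group < 5, Group < 15, Age < 40), but the tree shapes and leaf values are invented for illustration. A real random forest would also fit each tree to a bootstrap sample with random feature selection; only the final average is shown here.

```python
def tree_a(age, group):
    """Hypothetical tree: split on Group < 5, then on Age < 40."""
    if group < 5:
        return 0.05
    return 0.12 if age < 40 else 0.08


def tree_b(age, group):
    """Hypothetical tree: split on Age < 40, then on Group < 15."""
    if age < 40:
        return 0.11 if group < 15 else 0.15
    return 0.06


def tree_c(age, group):
    """Hypothetical tree: a single split on Group < 15."""
    return 0.07 if group < 15 else 0.13


def forest_predict(trees, age, group):
    """f(x) = (1/N) * sum of f_n(x) over the N trees."""
    return sum(tree(age, group) for tree in trees) / len(trees)


trees = [tree_a, tree_b, tree_c]
prediction = forest_predict(trees, age=21, group=5)
print(prediction)
```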
16 Gradient Boosted Machine or GBM [Diagram: a sequence of trees with splits such as Group < 5?, Group < 15? and Age < 40?, each scaled by λ and added together]
A GBM: $f(x) = \lambda \sum_{n=1}^{N} f_n(x)$
17 Gradient Boosted Machine or GBM
- λ (learning rate or "shrinkage")
- N (number of trees)
- Bag fraction (proportion of data used at each iteration)
- Interaction depth (number of splits on each tree)
A GBM: $f(x) = \lambda \sum_{n=1}^{N} f_n(x)$
18 Gradient Boosted Machine or GBM
- λ (learning rate or "shrinkage")
- N (number of trees)
- Bag fraction (proportion of data used at each iteration)
- Interaction depth (number of splits on each tree)
[Figure: optimal model of those considered]
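To make the four GBM settings concrete, here is a minimal sketch of gradient boosting under squared-error loss with a single rating factor. It is an illustration under stated assumptions, not the presenter's implementation: each tree has one split (interaction depth 1), the data are synthetic, the model starts from zero so that the result matches f(x) = λ Σ f_n(x), and all function and parameter names are hypothetical.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split tree by least squares; returns (threshold, left_mean, right_mean)."""
    overall = sum(residuals) / len(residuals)
    best = (float("inf"), float("inf"), overall, overall)  # fallback when no split exists
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_gbm(xs, ys, learning_rate=0.1, n_trees=200, bag_fraction=0.5, seed=0):
    """Each tree is fitted to the current residuals on a random bag of the data."""
    rng = random.Random(seed)
    bag_size = max(2, int(bag_fraction * len(xs)))
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        bag = rng.sample(range(len(xs)), bag_size)
        stump = fit_stump([xs[i] for i in bag], [ys[i] - preds[i] for i in bag])
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return stumps


def gbm_predict(stumps, x, learning_rate=0.1):
    """f(x) = learning_rate * sum of the individual tree predictions."""
    return learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Synthetic data: claim frequency steps down at age 30 (invented for illustration).
ages = list(range(18, 78, 4))
freqs = [0.20 if a < 30 else 0.10 for a in ages]

stumps = fit_gbm(ages, freqs)
fitted = [gbm_predict(stumps, a) for a in ages]
print(mse(freqs, fitted))
```

A smaller learning rate generally needs more trees to reach the same fit, which is one reason these settings are usually tuned together on test data.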
19 Example machine learning methods Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 19
20 Do they add value? Ensemble Methods Classifications Trees "Earth" Gradient Boosting Machines K-Means Clustering Support Vector Machines Elastic Net Neural Networks Naïve Bayes Random Forests Regression Trees Principal Components Analysis Lasso K-nearest Neighbours Ridge Regression 20
21 Dimensions of utility Method 21
22 Dimensions of utility: Method. Think of a model. Multiply it by 123. Square it. Add 74½ billion. And you get the same Gini coefficient!
23 Dimensions of utility: Method
Model | RMS Error
GLM | 34.7%
Neural Net | 33.1%
GBM | 31.0%
24 Dimensions of utility: Method [Chart: Response for Observed, Current Model and Proposed Model, by band of Proposed Model / Current Model from < 88% to > 112% in 2% steps]
25 Dimensions of utility Method Loss ratio improvement 3.1%! 25
26 Dimensions of utility Method 26
27 Dimensions of utility Method 27
28 Dimensions of utility Method 28
29 Dimensions of utility: Method
How much do you need to understand?
- How much would you normally understand? (eg vehicle classification)
- Cost of error? (eg marketing)
- Regulatory requirements
- Professional standards
Comfort diagnostics: Model; Comfort Diagnostics; Pre/post adjustments
[Chart: Response for Observed, Current Model and Proposed Model, by band of Proposed Model / Current Model from < 88% to > 112% in 2% steps]
30 Dimensions of utility: Method [Diagram: Next generation rating engine and Policy administration system environments; labels in order: -2.0%, +2.0%, Current Rate, -5.0%, +5.0%, New Business Price]
31 Dimensions of utility Method 31
32 A toolkit: Penalised Regression; GBMs; GLM; "Earth"; Neural Networks; Support Vector Machine; Trees; Random Forests [Diagram: each method shown alongside a table]
33 That is already in use
Application areas: Marketing and Distribution; Customer management; Underwriting and risk management; Pricing; Claims management; Reserving
2017 US market survey: For which business applications do you use or plan to use these methods?
Method | Underwriting/Pricing | Claims | Marketing
Generalized linear models (GLMs) and One-way analyses | 84% 94% 54% 78% 61% 58% (six values for the two rows, in extracted order)
Decision trees | 55% | 54% | 58%
Model combining methods (e.g., stacking, blending) | 41% | 35% | 27%
Gradient boosting machines (GBMs) | 37% | 32% | 24%
Random forest (RF) | 41% | 35% | 36%
Penalized regression methods (e.g., lasso, ridge, elastic net) | 41% | 30% | 27%
Neural networks | 37% | 41% | 24%
Generalized additive models (GAMs) | 37% | 19% | 21%
Support vector machines | 20% | 19% | 12%
Source: Willis Towers Watson Predictive Modeling Survey
34 That spectrum of complexity. Happening now. AI comprehension; Bespoke image recognition; Speech analytics; Machine learning predictive modelling; Full autonomous driving; Object recognition; Topic modelling; Automated GLMs. This end could be interesting.
36 So
- Machine learning is already in use: actuaries are already involved
- It's not just about methods: data beats models
- It's not just about methods: working out what to model matters - data beats factor engineering beats models
- It's not just about predictiveness: a broader set of problems can be analysed - rapid basic insight adds value
- Evolution not revolution: models are complementary to existing methods
37 Issues for the Profession(s)
- Training
- A generation less familiar with stats?
- Role of the actuary
- Domain expertise matters (at least currently)
- Easier for an actuary to pick up machine learning than for a data scientist to understand insurance?
- Siloed teams don't work
- Familiarity and the right vernacular can help
- Scope of involvement? Pricing; Reserving; Claims analytics; Customer management? Marketing???
- CAS, SOA ahead? (eg CSPA)
- GIRO too big now to help?
- IFoA on the case, but fast enough?
- Regulatory issues
- TAS: Judgement - what judgement?
- GDPR
- Government Select Committee (Science and Technology)
38 Machine Learning Duncan Anderson Managing Director, Willis Towers Watson 21 March 2018
More informationGLMSELECT for Model Selection
Winnipeg SAS User Group Meeting May 11, 2012 GLMSELECT for Model Selection Sylvain Tremblay SAS Canada Education Copyright 2010 SAS Institute Inc. All rights reserved. Proc GLM Proc REG Class Statement
More informationEnd-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved.
End-to-End data mining feature integration, transformation and selection with Datameer Fastest time to Insights Rapid Data Integration Zero coding data integration Wizard-led data integration & No ETL
More informationInstance-Based Learning: Nearest neighbor and kernel regression and classificiation
Instance-Based Learning: Nearest neighbor and kernel regression and classificiation Emily Fox University of Washington February 3, 2017 Simplest approach: Nearest neighbor regression 1 Fit locally to each
More informationPractical Machine Learning Agenda
Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data
More informationEvolution of Regression II: From OLS to GPS to MARS Hands-on with SPM
Evolution of Regression II: From OLS to GPS to MARS Hands-on with SPM March 2013 Dan Steinberg Mikhail Golovnya Salford Systems Salford Systems 2013 1 Course Outline Today s Webinar: Hands-on companion
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationPS 6: Regularization. PART A: (Source: HTF page 95) The Ridge regression problem is:
Economics 1660: Big Data PS 6: Regularization Prof. Daniel Björkegren PART A: (Source: HTF page 95) The Ridge regression problem is: : β "#$%& = argmin (y # β 2 x #4 β 4 ) 6 6 + λ β 4 #89 Consider the
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationCPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016
CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Assignment 0: Admin 1 late day to hand it in tonight, 2 late days for Wednesday. Assignment 1 is out: Due Friday of next week.
More informationPeople risk. Capital risk. Technology risk
Decode secure. People risk Capital risk Technology risk Cybersecurity needs a new battle plan. A better plan that deals with the full spectrum of your company s cybersecurity not just your technology.
More informationCertified Data Science with Python Professional VS-1442
Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional Certified Data Science with Python Professional Certification Code VS-1442 Data science has become
More information