Overview and Practical Application of Machine Learning in Pricing
1 Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company) 2017 Willis Towers Watson. All rights reserved.
2 Agenda: Context of machine learning in pricing; Penalized regression methods; Trees, random forests and GBMs; Conclusions; Q&A. Objective: to give you a working knowledge of some machine learning methods that may be used to improve GLM results and/or offer valuable insights in their own right in the field of P&C insurance pricing.
3 Applications of machine learning in the insurance sector: marketing and distribution, customer management, underwriting and risk management, pricing, claims management, asset management. Examples: product development (UBI), retention segmentation and program design, agent/broker performance evaluation, fraud detection, P&C-to-life cross-selling, web scraping and feature design in commercial lines, topic modeling and large loss modeling.
4 Focus of today's talk: Pricing.
5 This is not new. Modeling timeline (1990s to 2017): few factors and simple methods; GLMs; data enrichment; GLMs in auto risk models; GLM refinement and LOB expansion; GLMs in demand models; integrating cost and demand; more data enrichment. Technology timeline: other non-GLM models; distributed big-data storage (Hadoop); data visualisation tools; machine learning; integrated environments and services; hyper-scale parallel computing; NoSQL databases; free software environments and analytics libraries; data stream and real-time processing supporting IoT.
6 What are these methods? Ensemble methods, classification trees, regression trees, "Earth" (multivariate adaptive regression splines), gradient boosting machines, k-nearest neighbors, elastic net, neural networks, naïve Bayes, random forests, k-means clustering, principal components analysis, lasso, support vector machines, ridge regression.
7 Does it work? Candidate validation metrics: AIC, Gini, RMSE, MAE, log loss.
8 How do you measure value? Gini: rank hold-out observations by their fitted values (high to low), then plot cumulative response by cumulative exposure. A better model will explain a higher proportion of the response with a lower proportion of exposure, and will give a higher Gini coefficient (the yellow area between the curve and the diagonal in the chart).
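The rank-and-accumulate recipe on this slide can be written down directly. The following is a minimal sketch (not from the deck; the function name and exposure handling are my own):

```python
import numpy as np

def gini_coefficient(y_true, y_pred, exposure=None):
    """Rank hold-out observations by fitted value (high to low), accumulate
    response against exposure, and return twice the area between the
    cumulative-response curve and the diagonal."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.ones_like(y_true) if exposure is None else np.asarray(exposure, dtype=float)
    order = np.argsort(-y_pred)              # high fitted values first
    y, w = y_true[order], w[order]
    cum_resp = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    cum_expo = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    # trapezoidal area under the cumulative-response curve
    area = np.sum((cum_resp[1:] + cum_resp[:-1]) / 2 * np.diff(cum_expo))
    return 2.0 * area - 1.0
```

A model that puts all the response in the first-ranked exposure scores close to +1; a model that ranks risks backwards scores negatively.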
9 Example results: table of model vs Gini, starting with the GLM (numeric values not reproduced).
10 Example results: Gini for the GLM and the new model.
11 Example results: Gini and Gini improvement for the GLM and the new model.
12 Example results: Gini, Gini improvement and Gini rank; GLM (main factor removed) ranks 4, GLM (minor factor removed) ranks 3, GLM ranks 2, New Model ranks 1.
13 But... take a model's predictions, multiply them by 123, square them, add 74½ billion, and you get the same Gini coefficient! (Gini depends only on the rank order of the predictions, so any monotonically increasing transform leaves it unchanged; a high Gini says nothing about calibration.)
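The slide's transform can be checked in two lines: for positive predictions, multiply-by-123-square-add-a-constant is strictly increasing, so the ranking (and hence the Gini coefficient) is untouched. A minimal sketch with made-up fitted values:

```python
import numpy as np

preds = np.array([0.03, 0.11, 0.07, 0.25, 0.01])   # hypothetical fitted frequencies
transformed = (preds * 123) ** 2 + 74.5e9           # the slide's transform

# Squaring is monotone on positives and adding a constant changes nothing,
# so the rank order of the two prediction sets is identical.
same_ranking = bool((np.argsort(preds) == np.argsort(transformed)).all())
```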
14 Double lift chart: policies are binned by the ratio Proposed Model / Current Model (bands from < 88% up to > 112% in 2% steps), and for each bin the chart compares the observed response against the current model's and the proposed model's average predictions. (The numeric response values on the chart's axis are not reproduced.)
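The double lift construction can be sketched in a few lines of pandas (illustrative only; the function name and binning choice are my own, and the deck bins on fixed 2% bands rather than quantiles):

```python
import numpy as np
import pandas as pd

def double_lift(observed, current, proposed, n_bins=10):
    """Bin policies by the ratio proposed/current, then compare average
    observed cost with each model's average prediction per bin. The model
    that tracks the observed line more closely wins."""
    df = pd.DataFrame({"obs": observed, "cur": current, "prop": proposed})
    df["ratio"] = df["prop"] / df["cur"]
    df["bin"] = pd.qcut(df["ratio"], n_bins, duplicates="drop")
    return df.groupby("bin", observed=True)[["obs", "cur", "prop"]].mean()
```

The bins isolate exactly the policies where the two models disagree, which is why this chart discriminates between models even when both have similar overall fit.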
15 Financial value estimate: errors in insurance pricing are not symmetrical (underpriced risks are won, overpriced risks are lost), so the financial benefit of a more accurate model can be estimated.
16 Example results (redacted).
17 Financial value vs Gini.
18 Is it really all about the method? Two dimensions: the problem dimension and the method dimension. Four quadrants: original problem solved with original methods; original problem solved with new techniques; newly defined problem solved with original methods; newly defined problem solved with new techniques.
19 Where is the value? Across the problem/method grid, value grows from $ (original problem, original methods) through $$ (changing either dimension alone) to $$$ (newly defined problem, new techniques).
20 The problem dimension (highlighted in the same grid): original problem vs newly defined problem, each solvable with original or new methods.
21 What are these methods? (Method list slide repeated as a section divider.)
22 Choosing a method. Dimensions of choice: predictive power; analytical time and effort; interpretation; execution speed; table implementation; stability.
23 Radar chart scoring the GLM against the six dimensions: predictive power, analytical time and effort, interpretation, execution speed, table implementation, stability.
24 What are these methods? (Method list slide repeated as a section divider.)
25 Penalized regression. As in a GLM, f(x) = g⁻¹(X·β), but β is estimated by minimizing deviance(y, f(x)) + λ₁‖β‖₁ + λ₂‖β‖₂². Ridge (λ₁ = 0) heavily penalizes large parameters but does not reduce parameters to zero; lasso (λ₂ = 0) reduces insignificant parameter values to zero, which is useful for variable selection; elastic net is a mix of the two.
26 Penalized regression: parameter selection. Minimize deviance(y, f(x)) + λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²]. The two penalty parameters are re-written so that α controls the mixture between lasso (α = 1) and ridge (α = 0), and λ controls the overall size of the penalty, selected using cross-validation. Factors are automatically selected from the initial set! The optimal (λ, α) combination is found by plotting cross-validation error against log(λ) for each candidate α.
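As a sketch, the (λ, α) search described on this slide maps directly onto scikit-learn's `ElasticNetCV`, where `l1_ratio` plays the role of α and the λ path is cross-validated automatically. The data below is simulated (the deck's data is not shown); five of twenty factors carry real signal:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Simulated design: 20 candidate factors, only the first 5 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
beta = np.zeros(20)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(scale=0.5, size=500)

# l1_ratio = alpha (1 = lasso, 0 = ridge); lambda chosen by 5-fold CV.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)   # factors the penalty kept
```

With a lasso-leaning mixture the noise factors are driven to exactly zero, which is the automatic variable selection the slide refers to.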
27 Penalized regression case study: vehicle classification. Candidate factor groups: physical characteristics (e.g., height, length, weight); mechanical nature (e.g., engine size, fuel type); qualitative descriptors (e.g., body type, model range); performance (e.g., maximum speed, torque, BHP).
28 Penalized regression case study: vehicle classification (results redacted).
29 Deploying penalized regression: same as GLMs! The fitted model converts directly into multiplicative rating tables (e.g., age multiplier, vehicle group multiplier, and gender multiplier with Male at 1.00 as the base; the remaining numeric values are not reproduced).
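The "same as GLMs" point is that with a log link each fitted coefficient exponentiates into a multiplicative relativity. A minimal sketch with hypothetical coefficients (the levels and values below are invented for illustration):

```python
import numpy as np

# Hypothetical fitted log-link coefficients for one factor (base level = 0).
age_betas = {"<25": 0.47, "25-40": 0.10, "40+": 0.0}

# exp(beta) turns each coefficient into a rating-table multiplier,
# so a penalized model deploys exactly like a GLM.
age_multipliers = {level: round(float(np.exp(b)), 3) for level, b in age_betas.items()}
```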
30 Radar chart scoring penalized regression against the six dimensions: predictive power, analytical time and effort, interpretation, execution speed, table implementation, stability.
31 What are these methods? (Method list slide repeated as a section divider.)
32 Decision trees: an example tree splits all data on Group < 5?, then Age < 40?, then Group < 15?, with a fitted value at each terminal node (diagram not reproduced).
33 Random forests: a single tree T(x); a random forest averages many trees fitted to randomized samples of the data, f(x) = (1/B) Σ_b T_b(x) (the example tree diagram is repeated).
34 Gradient boosting machine (GBM): a single tree T(x); a GBM is a shrunken sum of sequentially fitted trees, f(x) = Σ_m λ·T_m(x), where each tree is fitted to the residuals of the model so far.
35 Gradient boosting machine: four main assumptions. Learning rate / shrinkage: the amount by which the old model predictions are varied for the next model iteration (new model = old + prediction × learning rate). Interaction depth: the number of splits allowed on each tree (or the number of terminal nodes − 1). Number of trees (iterations) allowed. Bag fraction: trees are fitted to a randomized subset of the data; additional noise reduction can be achieved by using a random subset of the available factors at each iteration.
36 Gradient boosting machine: the same four assumptions, tuned by comparing fit and test error curves across hyperparameter combinations; the best result in this example (shown on the brown line) is at interaction depth 2 and learning rate 2%.
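The four assumptions map one-to-one onto scikit-learn's `GradientBoostingRegressor` parameters, and `staged_predict` replays the additive model tree by tree so the test-error curve on this slide can be traced directly. A sketch on simulated data (the hyperparameter values are illustrative, not the deck's):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 4))
y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.2, size=2000)
X_fit, X_test, y_fit, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingRegressor(
    learning_rate=0.02,   # shrinkage applied to each tree's prediction
    max_depth=2,          # interaction depth: splits allowed per tree
    n_estimators=500,     # number of trees (iterations)
    subsample=0.5,        # bag fraction: each tree sees half the data
    random_state=0,
).fit(X_fit, y_fit)

# Test error after each successive tree; its minimum picks the stopping point.
test_err = [np.mean((y_test - p) ** 2) for p in gbm.staged_predict(X_test)]
best_n_trees = int(np.argmin(test_err)) + 1
```

Plotting `test_err` against iteration number reproduces the fit/test curves on the slide: the fit error falls monotonically while the test error bottoms out and then rises as the model starts to overfit.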
37 What does a GBM look like?
38 What does a GBM look like?
39 Does it work? How does it work?
40 GBM value add (results redacted).
41 What are these methods? (Method list slide repeated as a section divider.)
42 Adding an ensemble (results redacted).
43 What about a modelled-down GBM? (Results redacted.)
44 What about an automated GLM?
45 What about an automated GLM? (Results redacted.)
46 Does it work? How does it work?
47 Interpreting GBMs: partial dependence plots, etc.
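A one-dimensional partial dependence plot is simple enough to compute by hand, which makes clear what it shows: the average prediction as one factor is swept across a grid with all other factors held at their observed values. A minimal sketch (the function name is my own; scikit-learn also ships this as `sklearn.inspection.partial_dependence`):

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """For each grid value, overwrite the chosen feature for every row and
    average the model's predictions: the marginal effect of that feature."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(model.predict(X_mod).mean())
    return np.array(pd_values)
```

For a GBM this recovers GLM-style factor curves, which is the main bridge between a black-box fit and the relativities an actuary expects to review.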
48 Deploying GBMs. Option 1: model down into multiplicative tables via GLMs (age, vehicle group and gender tables of exposure and burning cost; numeric values not reproduced), with corner correctors and pre-baked interactions, factor reduction, and an established model hierarchy. Option 2: use the GBM's insights to guide the GLM. Option 3: deploy directly.
49 Deploying GBMs directly: a pre/post mapping with comfort diagnostics, using the double lift chart (observed response vs the current and proposed models across Proposed Model / Current Model bands from < 88% to > 112%).
50 Radar chart scoring GBMs against the six dimensions: predictive power, analytical time and effort, interpretation, execution speed, table implementation, stability.
51 Radar charts comparing the methods side by side (GLM, penalized regression, GBMs, "Earth", neural networks, support vector machines, trees, random forests) across the same six dimensions: predictive power, analytical time and effort, interpretation, execution speed, table implementation, stability.
52 How is the North American market doing with machine learning? For which business applications do you use or plan to use these methodologies? (Q.13)
Modeling technique: Loss cost modeling / Claims analytics / Marketing
Generalized linear models (GLMs): 100% / 42% / 37%
One-way analyses: 87% / 40% / 37%
Decision trees: 48% / 31% / 30%
Model combining methods: 35% / 23% / 19%
Gradient boosting machines (GBMs): 30% / 17% / 15%
Penalized regression methods: 30% / 17% / 22%
Random forest (RF): 26% / 21% / 22%
Other ensemble methods: 24% / 17% / 19%
Other machine learning methods and grid search techniques: 11%, 26%, 8%, 33%, 19%, 15% (these two rows are interleaved in the transcription)
Legend: primary / secondary / tertiary. Base: U.S. respondents who use or plan to use the methodology for the application specified (Loss Cost Modeling n = 46, Claims Analytics n = 48, Marketing n = 27).
53 Machine learning in personal lines pricing: evolution or revolution? Conclusions: it's not all about methods; domain expertise in formulating the problem can be more important; new (wider) data generally adds more lift than new methods; there are practical as well as statistical ways to assess model improvement. Predictions: many methods can augment the traditional GLM modeling process, in particular with growing datasets; others (e.g., GBMs) can improve prediction in their own right, but require different approaches to interpretation and to technology/deployment; methods support predictions in new areas (very wide datasets, market rate analysis, cross-selling and other customer behaviors, UW, claims, etc.). But it's not all about predictive power: fast investigation of new or messy data, quick assessment of emerging experience, and operational efficiency also matter. The industry (and the profession) has work to do in developing machine learning skills integrated with domain expertise, bringing data scientists and statisticians together with domain experts.
54 Contacts: Claudine Modlin, Director, Risk Consulting & Software, Willis Towers Watson; Duncan Anderson, Managing Director, Risk Consulting & Software, Willis Towers Watson; Mark Richards, Director, Allstate Personal Lines, D3: Data, Discovery & Decision Science, Mark.Richards@allstate.com.
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationBoosted Optimization for Network Classification. Bioinformatics Center Kyoto University
Boosted Optimization for Network Classification Timothy Hancock Hiroshi Mamitsuka Bioinformatics Center Kyoto University 2 of 22 Motivation We want to construct a classifier that has good performance where
More informationNeural Networks (pp )
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationParallel learning of content recommendations using map- reduce
Parallel learning of content recommendations using map- reduce Michael Percy Stanford University Abstract In this paper, machine learning within the map- reduce paradigm for ranking
More informationParallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade
Parallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 23 Announcements...
More informationProbabilistic Classifiers DWML, /27
Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium
More informationMACHINE LEARNING TOOLBOX. Random forests and wine
MACHINE LEARNING TOOLBOX Random forests and wine Random forests Popular type of machine learning model Good for beginners Robust to overfitting Yield very accurate, non-linear models Random forests Unlike
More informationSubject Summary: Mathematics
Subject Summary: Mathematics Years 7 to 10 The scheme of work a student follows is based on their ability rather than their year group. We refer to each scheme of work as a Stage. Each Stage is numbered,
More informationSAS High-Performance Analytics Products
Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products
More informationMachine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums
Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums José Garrido Department of Mathematics and Statistics Concordia University, Montreal EAJ 2016 Lyon, September
More informationPeople risk. Capital risk. Technology risk
Decode secure. People risk Capital risk Technology risk Cybersecurity needs a new battle plan. A better plan that deals with the full spectrum of your company s cybersecurity not just your technology.
More informationThe Data Science Process. Polong Lin Big Data University Leader & Data Scientist IBM
The Data Science Process Polong Lin Big Data University Leader & Data Scientist IBM polong@ca.ibm.com Every day, we create 2.5 quintillion bytes of data so much that 90% of the data in the world today
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationCOMP 465 Special Topics: Data Mining
COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,
More informationLeveling Up as a Data Scientist. ds/2014/10/level-up-ds.jpg
Model Optimization Leveling Up as a Data Scientist http://shorelinechurch.org/wp-content/uploa ds/2014/10/level-up-ds.jpg Bias and Variance Error = (expected loss of accuracy) 2 + flexibility of model
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationUsing Machine Learning to Identify Security Issues in Open-Source Libraries. Asankhaya Sharma Yaqin Zhou SourceClear
Using Machine Learning to Identify Security Issues in Open-Source Libraries Asankhaya Sharma Yaqin Zhou SourceClear Outline - Overview of problem space Unidentified security issues How Machine Learning
More informationTrade-offs in Explanatory
1 Trade-offs in Explanatory 21 st of February 2012 Model Learning Data Analysis Project Madalina Fiterau DAP Committee Artur Dubrawski Jeff Schneider Geoff Gordon 2 Outline Motivation: need for interpretable
More informationClustering in Ratemaking: Applications in Territories Clustering
Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, PhD FIA ASTIN 13th-16th July 2008 INTRODUCTION Structure of talk Quickly introduce clustering and its application in insurance ratemaking
More informationLecture 20: Bagging, Random Forests, Boosting
Lecture 20: Bagging, Random Forests, Boosting Reading: Chapter 8 STATS 202: Data mining and analysis November 13, 2017 1 / 17 Classification and Regression trees, in a nut shell Grow the tree by recursively
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationtowards advanced HR Analytics by Arie-Jan Baan and Bram Eigenhuis
towards advanced HR Analytics by Arie-Jan Baan and Bram Eigenhuis Content #1 advanced Data Analytics (?) #2 data Science Process #3 a case study #4 your case #5 Q&A Who is who? and what is your expectation?
More informationExpensive to drill wells when all goes to plan. Drilling problems add costs throughout the well s life.
Beau Rollins Map View Cross Section View Expensive to drill wells when all goes to plan. Surface Location Drilling problems add costs throughout the well s life. The best scenario for minimizing costs
More information