Evolution of Regression III:


1 Evolution of Regression III: From OLS to GPS, MARS, CART, TreeNet and RandomForests March 2013 Dan Steinberg Mikhail Golovnya Salford Systems

2 Course Outline
Previous Webinars:
o Regression problem: quick overview
o Classical OLS: the starting point
o RIDGE/LASSO/GPS: regularized regression
o MARS: adaptive non-linear regression
Today's Webinar:
o CART regression tree
o Bagger and Random Forest: CART ensembles
o TreeNet: stochastic gradient boosting
o Hybrid TreeNet/GPS models

3 Boston Housing Data Set
Concerns housing values in the Boston area. Harrison, D. and D. Rubinfeld, "Hedonic Prices and the Demand for Clean Air," Journal of Environmental Economics and Management, v5, 81-102, 1978.
Combined information from 10 separate governmental and educational sources to produce this data set: 506 census tracts in the City of Boston for the year 1970.
o Goal: study the relationship between quality-of-life variables and property values
o MV: median value of owner-occupied homes in tract ($1,000s)
o CRIM: per capita crime rate
o NOX: concentration of nitric oxides (parts per 10 million)
o AGE: percent built before 1940
o DIS: weighted distance to centers of employment
o RM: average number of rooms per house
o LSTAT: % lower status of the population
o RAD: accessibility to radial highways
o CHAS: borders Charles River (0/1)
o INDUS: percent non-retail business
o TAX: property tax rate per $10,000
o PT: pupil-teacher ratio

4 Running Score: Test Sample MSE
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS (regression splines), GPS (Lasso/regularized)

5 Regression Tree
The decision tree is a nonparametric learning machine with the capability for both classification and regression (select methods).
A stepwise procedure in which predictors enter the model one at a time.
In 1975 Jerome H. Friedman labeled the methodology Recursive Partitioning (his first version handled a binary target).
The procedure works by carving a high-dimensional data space into a small-to-moderate set of regions, then produces a prediction for each region.

6 Simple Nonparametric Regression
Consider a set of predictors, which we simplify by characterizing each as having just two values (low, high).
Consider all possible combinations of the predictor values (K predictors allow for 2^K possible patterns).
An attractive nonparametric model: the prediction of the target conditional on the predictor values is the mean of the target in that region of the training data.

7 Simple Nonparametric Regression

    Mean Y   X1  X2  X3
    y1        0   0   0
    y2        0   0   1
    y3        0   1   0
    y4        0   1   1
    y5        1   0   0
    y6        1   0   1
    y7        1   1   0
    y8        1   1   1

3 predictors allow for 8 regions, defined in terms of whether the value of each predictor is LOW (0) or HIGH (1).
Purely mechanical, flexible. Resolves many challenges such as functional form.
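
As a sketch of this idea, the fragment below fits the 2^K-cell model with pandas on synthetic data (the data, column names, and library choice are illustrative assumptions, not the slides' own):

    import numpy as np
    import pandas as pd

    # Nonparametric regression over 2^K cells for K=3 binary (low/high)
    # predictors, on synthetic data.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.integers(0, 2, size=(500, 3)),
                      columns=["X1", "X2", "X3"])
    df["y"] = 10 + 3*df.X1 - 2*df.X2 + df.X3 + rng.normal(0, 1, 500)

    # The model: the prediction for a cell is the mean of y in that cell.
    cell_means = df.groupby(["X1", "X2", "X3"])["y"].mean()

    # Predict for new records by looking up their cell.
    new = pd.DataFrame({"X1": [0, 1], "X2": [1, 0], "X3": [1, 1]})
    preds = cell_means.loc[list(map(tuple, new.values))]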

8 Curse of Dimensionality
Curse of dimensionality: 2^K is a very large number
o K=16 yields 65,536 regions
o K=32 yields 4.3 billion regions
We must often work with hundreds of potential predictors, and current predictive analytics problems now use thousands if not tens of thousands.

9 Tree Resolution of the Curse of Dimensionality
Be selective as to which predictors to use; reduce the number of predictors in the model.
Define regions with varying numbers of high/low conditions.
The end result is a similarly inspired nonparametric regression that makes use of vastly fewer regions and never generates more regions than rows of data.

10 Growing the Tree
The tree is grown by starting with all training data in the root node.
Look for a rule based on the available predictors to split the database into two parts. Rules look like (for continuous predictors): X_k <= c versus X_k > c.
For regression we look for the rule that most reduces learn-sample MSE:
o Search every possible split point (every data value of X_k)
o Search every available predictor
A sketch of the search follows below.
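
A minimal sketch of the exhaustive split search for a single predictor (a hypothetical helper, not Salford's implementation):

    import numpy as np

    def best_split(x, y):
        """Exhaustive search for the split x <= c that most reduces MSE.
        Returns (threshold, mse_after_split)."""
        order = np.argsort(x)
        x_s, y_s = x[order], y[order]
        best = (None, np.var(y))           # no split: MSE is the node variance
        for i in range(1, len(x_s)):
            if x_s[i] == x_s[i - 1]:
                continue                   # only split between distinct values
            left, right = y_s[:i], y_s[i:]
            mse = (len(left)*np.var(left) + len(right)*np.var(right)) / len(y)
            if mse < best[1]:
                best = ((x_s[i - 1] + x_s[i]) / 2, mse)
        return best

Repeating this over every predictor and taking the overall winner gives the root-node split; the same search is then applied recursively to each child node.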

11 Regression Tree
Out-of-the-box results, no tuning of controls. Test MSE=

12 Root Node Split
Start with all training data in the root node.
Find the split that reduces learn-sample MSE the most (or, optionally, MAD).
The split is always of the same form for continuous variables.
We know this is best on the learn sample because the search is exhaustive: we have searched every available value of every available predictor.

13 Next Split
The next split could have been on RM again, but LSTAT performs better.
CART happens to grow trees depth-first (it always looks for the left-most node to split next).
The order of splitting does not matter. But tree growing is myopic.

14 Third Split
The order of splitting nodes does not matter, as our goal is to generate a massive tree.
Splitting will usually continue until we run out of data.

15 Grow/Prune Strategy
Maximal tree (363 nodes), Test MSE= . Overfit, but still better than linear regression.
Grow the tree to the maximum possible size: here the learn-sample R^2 = 1.0.
Then prune back to find the best performing sub-tree on the test sample.
The maximal tree is almost certainly overfit, yet frequently offers decent performance on new data.

16 Pruning/Back Stepping
Back stepping is accomplished via weakest-link pruning.
Analogous to backwards stepwise regression: remove the split that harms learn-sample performance least.
This defines a set of candidate models, each of a different size.
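
scikit-learn exposes the same grow-then-weakest-link-prune idea as minimal cost-complexity pruning; a sketch on synthetic data (CART's own pruning is Salford's proprietary implementation, so this is an analogue, not the same code path):

    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    # Synthetic data stands in for the Boston set here.
    X, y = make_friedman1(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)

    # Enumerate the weakest-link (minimal cost-complexity) pruning sequence.
    path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(
        X_tr, y_tr)

    # One candidate tree per alpha; choose by test-sample MSE.
    candidates = [DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
                  for a in path.ccp_alphas]
    best = min(candidates, key=lambda t: np.mean((t.predict(X_te) - y_te) ** 2))
    print(best.tree_.node_count, "nodes in selected subtree")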

17 Access to every tree in the sequence of pruned trees, from largest to smallest.
Several trees of different sizes are shown. This allows a choice of model combining performance, complexity, and judgment.

18 Test Error Displayed in Lower Blue Curve
Click on any point on the blue test-sample performance curve to display the tree pruned to that size. Zoom in for access to every tree in the back-pruning sequence.
The green beam marks the best performing tree.

19 Single Decision Tree
Key features and strengths of the single CART decision tree:
o Impervious to the scale or coding of predictors: CART uses only rank-order information to partition data
o The CART tree is unaffected by order-preserving transforms of the data
o Any single split involves only one predictor (in standard mode), meaning that collinearity becomes largely irrelevant
o Automatic variable selection methodology built in
o Missing value handling in CART is managed via surrogate splitters (substitute splitters used when the primary splitter is missing): a form of local implicit missing value imputation
o CART is the only learning machine that can handle patterns of missing values not seen in the training data

20 Regression Tree
Generally the single regression tree is not expected to perform outstandingly.
It works by carving a high-dimensional data space into mutually exclusive and collectively exhaustive regions. A single prediction (the mean for the region) is generated for every region.
o Our optimal tree above produces only 9 distinct outputs
o For any input record there are only 9 possible predicted values
Linear regression typically produces a unique response for every input record (for rich enough data).

21 Regression Tree Representation of a Surface
A high-dimensional step function. It should be at a disadvantage relative to other tools: it can never be smooth. But it is always worth checking.

22 Regression Tree Partial Dependency Plot (LSTAT, NOX)
Use the model to simulate the impact of a change in a predictor. Here we simulate separately for every training data record and then average.
For CART trees the result is essentially a step function; you may get only one knot in the graph if the variable appears only once in the tree.
See the appendix to learn how to get these plots.
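
Outside of SPM, a comparable simulate-and-average plot can be produced with scikit-learn's partial_dependence; a sketch under the assumption of a generic fitted tree on synthetic data:

    from sklearn.datasets import make_friedman1
    from sklearn.inspection import partial_dependence
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_friedman1(n_samples=500, random_state=0)
    model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

    # Sweep feature 0 over a grid; for each grid value, overwrite the column
    # in every training record, predict, and average: the simulate-and-average
    # recipe described on the slide.
    pd_res = partial_dependence(model, X, features=[0], kind="average")
    grid = pd_res["grid_values"][0]   # key is "values" in older sklearn
    avg = pd_res["average"][0]        # step-like for a single tree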

23 Distribution of MV Within Node

24 Learn & Test Within Node

25 Repeated Runs: Different 20% Test Samples
100 repetitions, randomly dividing the data into 80% learn, 20% test.
Percentiles: 5th = , 95th = ; MSE = 32.32, median =

26 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Repeated 20% Partitions
Methods so far: Regression, MARS, GPS Lasso, CART

27 Regression Trees and Ensembles of Trees
Shortly after the regression tree began to enjoy success in real-world and scientific applications, the search began for improvements.
One early idea for improvement was the notion of the committee of experts:
o Instead of having just one predictive model, generate several models
o Allow the separate models to vote for a classification
o Majority rule arrives at a final classification
o Regression committees simply average the individual predictions
Challenge: how to generate committees (or ensembles)?
o Needs to be automatic to be practical
o Too expensive to use competing modeling teams or to allow time to develop hand-made alternative models

28 Bagger (Bootstrap Aggregation)
Breiman introduced the influential Bagger in 1996 to solve the automatic model generation problem.
Intuition: imagine that we have an unlimited supply of data.
o Draw a random sample from the master data (of good size); the sample could be stratified to oversample a rare target class
o Grow a regression tree
o Draw a new, different random sample from the master data
o Grow a new regression tree
o The process can be continued indefinitely, as each draw will result in a different sample and typically a different (at least slightly) tree
o Combine the results into a single prediction for any input data record
o For BINARY classification, let the number of YES votes be the model score

29 Bootstrap Sampling Details
Bootstrap resampling is defined as sampling with replacement.
We do not literally extract a record from the original training set but only copy it. We then forget that this record was copied into the new training data set and allow it to be selected again.
It is natural to think that since each record has only a very small chance of being selected, being selected more than once would be a rare event.
However, about 26% of the original data will be selected more than once in the construction of a standard bootstrap sample (much more than everyday intuition would suggest).

30 OOB: Records Not Selected (About 36.8%)
OOB = Out of Bag records.
In a same-size bootstrap sample, more than a third of the original records will not be drawn into the new sample. This is an important artifact of the sample generation process: it ensures that the bootstrap samples are fairly different from each other in terms of which records are included.
The shortfall of records is compensated for by drawing extra copies of the data that is included. Including a record more than once does not add information, but it gives added weight to that record.
OOB records can be used for testing: an extreme form of cross-validation.

31 Bootstrap Sample Reweighting: An Example
The probability of being omitted in a single draw is (1 - 1/n); the probability of being omitted in all n draws is (1 - 1/n)^n. The limit as n goes to infinity is 1/e = 0.368.
o Approximately 36.8% of records excluded: 0.0% of the resample
o 36.8% included once: 36.8% of the resample
o 18.4% included twice: 36.8% of the resample
o 6.1% included three times: 18.4% of the resample
o 1.9% included four or more times: completes the resample
Example: distribution of weights in a 2,000-record random resample (verified in the sketch below).
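
The shares above follow from the Binomial(n, 1/n) count of times a record is drawn, which is approximately Poisson(1) for large n; a quick numeric check:

    import numpy as np
    from math import comb

    n = 2000
    # P(record drawn exactly k times) under Binomial(n, 1/n):
    for k in range(4):
        p = comb(n, k) * (1/n)**k * (1 - 1/n)**(n - k)
        print(k, round(p, 4))        # ~0.368, 0.368, 0.184, 0.061
    # The remaining ~1.9% of records are drawn four or more times.

    # Empirical check: one bootstrap sample of size n.
    rng = np.random.default_rng(0)
    draws = rng.integers(0, n, size=n)
    print("share never drawn:", 1 - len(np.unique(draws)) / n)   # ~0.368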

32 Bagger Mechanism
Generate a reasonable number of bootstrap samples:
o Breiman started with numbers like 50, 100, 200
Grow a standard CART tree on each sample and use the unpruned tree to make predictions:
o Pruned trees yield inferior predictive accuracy for the ensemble
Simple voting for classification:
o Majority rule voting for binary classification
o Plurality rule voting for multi-class classification
o Average the predicted target for regression models
This will result in a much smoother range of predictions:
o A single tree gives the same prediction for all records in a terminal node
o In the bagger, records will have different patterns of terminal node results; each record is likely to have a unique score from the ensemble
A minimal sketch follows below.
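
A minimal bagger sketch (synthetic data; scikit-learn trees stand in for CART):

    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_friedman1(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)

    rng = np.random.default_rng(0)
    preds = []
    for b in range(100):                              # 100 bootstrap samples
        idx = rng.integers(0, len(X_tr), len(X_tr))   # rows with replacement
        tree = DecisionTreeRegressor(random_state=b)  # unpruned, maximal tree
        tree.fit(X_tr[idx], y_tr[idx])
        preds.append(tree.predict(X_te))

    bagged = np.mean(preds, axis=0)                   # average for regression
    print("bagger test MSE:", np.mean((bagged - y_te) ** 2))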

33 Bagger via Bootstrapping: Test MSE = 9.545
Score using the 100-tree ensemble.

34 Predictions From the First 6 Trees
First 10 records of the TEST partition; first 6 Bagger trees, each grown to maximal size, unpruned.
No feedback, learning, or tuning is derived from the test partition.
Predictions vary because each tree is built on a different bootstrap sample.
The Bagger outputs the AVERAGE prediction for a regression model. TEST MSE=

35 CART Partial Dependency Plot (LSTAT, NOX)
Use the model to simulate the impact of a change in a predictor, simulating separately for every training data record and then averaging.
For CART trees the result is essentially a step function; you may get only one knot in the graph if the variable appears only once in the tree.
See the appendix to learn how to get these plots.

36 Bagger Partial Dependency Plot (LSTAT, NOX)
Averaging over many trees allows for a more complex dependency: there is opportunity for many splits of a variable (100 large trees).
Jaggedness may reflect the existence of interactions.

37 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS, GPS Lasso, CART, Bagged CART

38 RandomForests: Bagger on Steroids
Leo Breiman was frustrated by the fact that the Bagger did not perform better; he was convinced there was a better way.
He observed that the trees generated by bagging across different bootstrap samples were surprisingly similar. How to make them more different?
The Bagger induces randomness in how the rows of the data are used for model construction. Why not also introduce randomness in how the columns are used?
Pick a random subset of predictors as candidate predictors, with a new random subset for every node.
Breiman was inspired by earlier research that experimented with variations on these ideas. Breiman perfected the Bagger to make RandomForests.

39 RF Main Innovation Is Counter-Intuitive
Picking the splitter at random appears to be a recipe for poor trees. Indeed, the RF model may not perform so well if the splitter is chosen completely at random (at every node, the splitting variable itself chosen at random); it is better to allow at least some opportunity for optimization by considering several potential splitters.
By limiting access to a different subset of predictors at each node, we guarantee that trees will differ across the different bootstrap samples.
The performance of individual trees on holdout data might be inferior to Bagger trees, but the ensemble will do better.

40 RF Model Using Defaults: Test MSE = 8.286
(RF Results and RF Controls screens shown.)

41 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS, GPS Lasso, CART, Bagged CART, RF Defaults

42 RF Core Parameter: Number of Predictors (PREDS)
A tuning parameter that can have a substantial impact on model predictive performance. With K predictors available, PREDS=1 means all splitters are chosen completely at random.
Breiman suggested SQRT(K) and LOG2(K) as good defaults:
o 1000 predictors suggests PREDS=32 (SQRT rule) or PREDS=10 (log2 rule)
o Typically PREDS is far below K
o Experimentation is needed (see the sketch below)
o To obtain honest performance measures, make judgments on OOB performance alone, then consult TEST performance
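
In scikit-learn terms, PREDS corresponds to max_features; a sketch of the recommended experiment using OOB scores (note that sklearn reports OOB R^2 rather than MSE, and the data here are synthetic):

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_friedman1(n_samples=500, random_state=0)

    # Sweep candidate-predictor counts per node and compare OOB performance.
    for m in (1, "sqrt", "log2", 6, None):   # None = all predictors (bagger)
        rf = RandomForestRegressor(n_estimators=300, max_features=m,
                                   oob_score=True, random_state=0).fit(X, y)
        print(m, "OOB R^2:", round(rf.oob_score_, 3))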

43 Varying PREDS
Try every value from 1 to 13. The purely-at-random RF yields inferior results but is still impressive (Test MSE = 11.15).
Test-sample and OOB results are shown. At the OOB optimum, PREDS=6; Test MSE=

44 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS, GPS Lasso, CART, Bagged CART, RF Defaults, RF PREDS=6

45 RandomForests Strengths
RandomForests can be used for:
o Classification (binary or multi-class)
o Regression (prediction of a continuous target)
o Clustering/density estimation
Robust behavior: relatively insensitive to dirty data. Easy to use, with few control parameters. Graphical displays of key model insights.
RandomForests can be used effectively with wide data sets:
o Possibly many more columns than rows
o High speed even with millions of predictors
Naturally parallelizable, as every tree is grown independently. Retains the advantages offered by most regression trees. Very strong variable (predictor) selection.

46 Stochastic Gradient Boosting (TreeNet)
SGB is a revolutionary data mining methodology first introduced by Jerome H. Friedman in 1999; the seminal paper defining SGB was released in 2001.
o Google Scholar reports more than 1,600 references to this paper and a further 3,300 references to a companion paper
Extended further by Friedman in major papers in 2004 and 2008 (model compression and rule extraction). Ongoing development and refinement by Salford Systems:
o Latest version released in 2013 as part of SPM 7.0
TreeNet/gradient boosting has emerged as one of the most used learning machines and has been successfully applied across many industries. Friedman's proprietary code is in TreeNet.

47 TreeNet Gradient Boosting Process
Begin with one very small tree as the initial model:
o Could be as small as ONE split generating 2 terminal nodes
o The default model will have 4-6 terminal nodes
o Final output is a predicted target (regression) or predicted probability
o The first tree is an intentionally weak model
Compute the residuals from this simple model (the prediction error) for every record in the data (even for a classification model).
Grow a second small tree to predict the residuals from the first tree. The new model is now: Tree1 + Tree2.
Compute the residuals from this new 2-tree model and grow a 3rd tree to predict the revised residuals. A sketch of this loop follows below.
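
A hand-rolled sketch of this loop for least-squares loss (illustrative function names and settings; TreeNet itself is Friedman's proprietary code):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boost(X, y, n_trees=200, lr=0.1, max_leaf_nodes=6, seed=0):
        """Each small tree is fit to the current residuals on a random half
        of the learn data; its shrunken predictions are added to the model."""
        rng = np.random.default_rng(seed)
        pred = np.full(len(y), y.mean())       # initial model: the mean
        trees = []
        for _ in range(n_trees):
            resid = y - pred                   # residuals of current model
            half = rng.choice(len(y), len(y)//2, replace=False)
            t = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
            t.fit(X[half], resid[half])
            pred += lr * t.predict(X)          # small, downweighted update
            trees.append(t)
        return y.mean(), trees

    def boost_predict(base, trees, X, lr=0.1):
        # lr must match the value used in boost().
        return base + lr * sum(t.predict(X) for t in trees)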

48 Trees Incrementally Revise Predictions
Tree 1: grown on the original target; an intentionally weak model.
Tree 2: grown on the residuals from the first; its predictions improve the first tree.
Tree 3: grown on the residuals from the model consisting of the first two trees.
Every tree produces at least one positive and at least one negative node. Red reflects a relatively large positive node and deep blue reflects a relatively negative node.
The total score for a given record is obtained by finding the relevant terminal node in every tree in the model and summing across all trees.

49 Gradient Boosting Methodology: Key Points
Trees are usually kept small (2-6 nodes common):
o However, you should experiment with larger trees (12, 20, 30 nodes); sometimes larger trees are surprisingly good
Updates are small (downweighted). Update factors can be as small as .01, .001, or smaller:
o Do not accept the full learning of a tree (small step size, also GPS style)
o Larger trees should be coupled with slower learn rates
Use random subsets of the training data in each cycle; never train on all the training data in any one cycle:
o Typical is to use a random half of the learn data to grow each tree
A rough open-source analogue of these settings follows below.
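
For orientation only, the scikit-learn equivalent of these knobs (this is not TreeNet, whose implementation is proprietary):

    from sklearn.ensemble import GradientBoostingRegressor

    # Small trees, a small learn rate, and a random half of the data per
    # tree, mirroring the key points above.
    gb = GradientBoostingRegressor(loss="squared_error",  # "ls" in old sklearn
                                   n_estimators=1000,
                                   learning_rate=0.01,
                                   max_leaf_nodes=6,
                                   subsample=0.5,
                                   random_state=0)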

50 Gradient Boosting: Contrast With a Single Tree
Every cycle of learning begins with the entire training data set, equivalent to going back to the root node with respect to the availability of data.
o A single tree drills progressively deeper into the data, working with progressively smaller subsamples
o A single tree cannot share information across nodes (it is local)
o Deep in the tree, sample sizes may not support discovery of important effects that are in fact common across many of the deeper nodes (broader, even global, structure)
Gradient boosting rapidly returns to the root node with the construction of each new tree; the target variable has changed, however:
o The target is the set of residuals from the previous iteration of the model

51 TreeNet Gradient Boosting Model Setup
Differ from the defaults only in selecting Least Squares loss and TREES=1000. The best TreeNet results frequently require thousands of trees.

52 Gradient Boosting Results (LOSS=LS)
Progress of the model as trees are added. The best MSE and best MAD typically occur with different sized models (number of trees).
Essentially default settings yield Test MSE= (a bit better yet when allowing 1,200 trees).

53 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS, GPS Lasso, CART, Bagged CART, RF Defaults, RF PREDS=6, TreeNet Defaults
Using cross-validation on the learn partition to determine the optimal number of trees, and then scoring the test partition with that model: TreeNet MSE=

54 Tuning Gradient Boosting Regression
The Huber loss function uses squared error for the smaller errors; for larger errors the incremental loss is based on absolute error.
A robust form of regression, less sensitive to outliers.
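
In standard notation, with residual r and threshold delta (on the next slide, delta is effectively set so that roughly the largest 5% of absolute residuals fall in the linear region), the Huber loss is:

    L_\delta(r) =
    \begin{cases}
      \tfrac{1}{2}\, r^2 & \text{if } |r| \le \delta \\
      \delta \left( |r| - \tfrac{\delta}{2} \right) & \text{if } |r| > \delta
    \end{cases}

The two pieces join with matching value and slope at |r| = delta, so the loss is quadratic near zero and grows only linearly in the tails, which is what blunts the influence of outliers.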

55 Vary the HUBER Threshold: Best MSE = 6.71
Vary the threshold where we switch from squared errors to absolute errors. The optimum occurs when the 5% largest errors are not squared in the loss computation.
This yields the best MSE on test data. Sometimes LAD yields the best test-sample MSE.

56 Gradient Boosting Partial Dependency Plots (LSTAT, NOX)

57 Running Score
Columns: Method | 20% random partition | Parametric Bootstrap | Battery Partition
Methods so far: Regression, MARS, GPS Lasso, CART, Bagged CART, RF Defaults, RF PREDS=6, TreeNet Defaults, TreeNet Huber, TN Additive
If we had used cross-validation to determine the optimal number of trees and then used those to score the test partition, the TreeNet Default model MSE=

58 Interactions
An interaction exists when the impact of a change in a predictor Xi depends on the value of another predictor Xj.
Classical statistical models usually start by creating a new feature defined as the product Xi*Xj. It is extraordinarily difficult to discover, refine, and select interactions into classical models.
Trees generate interactions naturally (perhaps too many and too readily). Technically, there is an interaction between Xi and Xj if the two predictors appear on the same branch of a tree.
We still need to check, via simulation, whether there really is any material dependency that qualifies as an interaction.

59 Preventing an Interaction
Friedman suggested that one can develop models without interactions by growing only 2-node trees.
If this technique is used, one must compensate for the restricted learning allowed in each tree by growing many more trees.
Preventing interactions in the TreeNet model substantially increases the test sample MSE to

60 Interaction Detection with TreeNet
TreeNet offers diagnostics and formal tests for the existence of interactions and their strengths:
o Build a model without interactions (for example, limiting the depth of each tree to one split only)
o Compare the results with a more flexible model (trees allowed to grow deeper)
TreeNet offers reports based on these comparisons.

61 TreeNet Interaction Strength Measures (LOSS=LS)

    ====================
    TreeNet Interactions
    ====================
    Whole Variable Interactions
    Measure   Predictor
    --------------------
              LSTAT
              DIS
              RM
              NOX
              CRIM
              AGE
              B
              PT
              TAX
              INDUS
              RAD
              CHAS
              ZN

Interaction strength is a measure of the difference between the predictions generated by a flexible and an additive model. For a specific 2-way interaction, the differences are measured over the joint range of the pair of predictors in question. Statistical tests are required to eliminate spurious interactions.

62 Top Ranked 2-Way Interactions

63 Parametric Bootstrap: Evaluate the True Strength of Interactions
Blue bar: mean difference in the strength measure (flexible minus additive).
Red bar: standard deviation of the strength measure (in the additive model).

64 Hybrid Models: ISLE
There are many styles of hybrid model; we focus on just one: Importance Sampled Learning Ensembles.
We start with an ensemble of trees and treat each of them as a new feature:
o Every tree transforms one or more predictors into a single output
o Create a data set in which every tree of the ensemble is a variable
o There could be a very large number of such new features (thousands)
Now use regularized regression to select and reweight the trees:
o This could yield a model with a much smaller set of trees (see the sketch below)
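
A sketch of the ISLE post-processing recipe with scikit-learn pieces (GradientBoostingRegressor and LassoCV stand in for TreeNet and GPS; data are synthetic):

    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LassoCV

    X, y = make_friedman1(n_samples=500, random_state=0)

    # Step 1: grow an ensemble; treat each tree's prediction as a feature.
    gb = GradientBoostingRegressor(n_estimators=300, max_leaf_nodes=6,
                                   learning_rate=0.1,
                                   random_state=0).fit(X, y)
    tree_features = np.column_stack([t[0].predict(X) for t in gb.estimators_])

    # Step 2: regularized regression selects and reweights the trees; trees
    # with zero coefficients drop out, compressing the ensemble.
    lasso = LassoCV(cv=5).fit(tree_features, y)
    kept = int((lasso.coef_ != 0).sum())
    print(f"{kept} of {tree_features.shape[1]} trees kept")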

65 Example of an Important Interaction
Slope reversal due to interaction: note that the dominant pattern is downward sloping, but a key segment defined by the 3rd variable is upward sloping.

66 What's Next
Hands-on session, part IV: live walk-through examples using:
o Regression trees
o Ensemble models (Bagger)
o Random Forests
o TreeNet stochastic gradient boosting
o TreeNet/GPS hybrid models

67 Some Other Variations of the Bagger
Decision Forest: keep the training data set fixed but randomly select which predictors to use (a random half of all predictors for any tree):
o Grow many trees and then combine via voting or averaging
Bagged Decision Forest: randomize both over the training records and the allowed predictors:
o Each data set is a different bootstrap sample
o Each predictor set is a different random half of the available predictors
For binary classification problems, vary the priors systematically over a broad range:
o The data set remains unchanged
o An important tree-growing parameter is changed systematically
Random splitter selection from the top K best predictors (Dietterich).
Wagging: an exponential distribution assigns non-zero weights to all records:
o Webb and Zheng (2004), IEEE Transactions on Knowledge and Data Engineering
o Inferior to bagging, but designed for systems that must include all records in every tree

68 ISLE Ensemble Compression
GPS regularized regression progress is shown as the dashed line; the TreeNet gradient boosting ensemble is the solid line.
In this example: negligible improvement and slight compression.

69 Strength of Interaction Measurement: Friedman's Measure
From the TreeNet model, extract the function (or smooth) Y(xi, xj | Z) (1). We extract the smooth by using our actual training data, setting Xi=a, Xj=b, and averaging the predicted Y over all records; we then vary the preset values of a and b to trace out the model-generated surface.
Now repeat the process for the single dependencies: Y(xi | xj, Z) (2) and Y(xj | xi, Z) (3).
Compare the predictions derived from (1) with those derived from (2) and (3) across an appropriate region of the (xi, xj) space, and construct a measure of the difference between the two 3D surfaces, (1) versus (2)+(3).
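
In spirit this is Friedman and Popescu's H-statistic; one standard form is shown below (that TreeNet computes exactly this quantity is an assumption, not stated on the slide):

    H_{jk}^2 \;=\;
    \frac{\sum_{i=1}^{n} \left[ \hat F_{jk}(x_{ij}, x_{ik})
          - \hat F_j(x_{ij}) - \hat F_k(x_{ik}) \right]^2}
         {\sum_{i=1}^{n} \hat F_{jk}(x_{ij}, x_{ik})^2}

Here \hat F_{jk}, \hat F_j, and \hat F_k are the mean-centered partial-dependence functions corresponding to (1), (2), and (3): if the joint surface is exactly the sum of the two single-variable surfaces, H is zero, and H grows as the interaction accounts for more of the joint variation.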

70 Jerome Friedman
Friedman is one of the most influential researchers in data mining:
o Studied physics at UC Berkeley
o Professor of Statistics at Stanford since 1975
Co-author of the CART decision tree in 1984 (Classification and Regression Trees, with Leo Breiman, Richard Olshen, and Charles Stone).
Creator of MARS regression splines, PRIM rule discovery, ISLE ensemble compression, GPS regularized regression, and Rule Learning Ensembles.
A series of many influential machine learning papers spanning 40 years.

71 Salford Predictive Modeler (SPM)
Download a current version from our website; the version will run without a license key for 10 days.
Request a license key from unlock@salford-systems.com. Request a configuration to meet your needs:
o Data handling capacity
o Data mining engines made available
