Ensemble Methods: Bagging

Instructor: Jessica Wu, Harvey Mudd College


The instructor gratefully acknowledges Eric Eaton (UPenn), Jenna Wiens (UMich), Tommi Jaakkola (MIT), David Kauchak (Pomona), David Sontag (NYU), Piyush Rai (Utah), and the many others who made their course materials freely available online. Robot image credit: Viktoriya Sukhanova, 123RF.com.

Ensemble Methods Basics

Learning Goals
- Describe the goal of ensemble methods

Basic Idea

If one classifier works well, why not use multiple classifiers? An ensemble is all the parts of a thing taken together, so that each part is considered in relation to the whole. An ML ensemble combines multiple classifiers to improve prediction.

[Diagram: at training time, a learning algorithm is run on the data to produce models 1 through m; at testing time, each model makes its own prediction for a new example, and the predictions are combined.]

Based on slide by David Kauchak

Bias vs. Variance

Recall that ML balances
- estimation error (variance): sensitivity to the training data (precision of the match)
- structural error (bias): distance from the true relationship
(the decomposition behind these terms is written out below)

[Plot: training and test error vs. model complexity. Simple models sit in the high-bias, low-variance regime; complex models sit in the high-variance, low-bias regime. Training error falls as complexity grows, while test error eventually rises again.]

Goals:
- Reduce variance without increasing bias
- Reduce bias and variance
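For reference, the standard decomposition behind this picture (not spelled out on the slide), stated for squared loss with learned model \hat{f}, true function f, and noise variance \sigma^2:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```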

Benefits of Ensemble Learning

Setup (for example):
- Assume a binary classification problem.
- Suppose we have trained 3 classifiers h_1(x), h_2(x), h_3(x), each with a misclassification rate of 0.4.
- Assume the classifiers make their decisions independently of one another.
- Take the majority vote of the classifiers.

Each combination of correct (c) / wrong (w) decisions occurs with the probability shown:

model 1   model 2   model 3   probability of this combination
c         c         c         0.6 * 0.6 * 0.6 = 0.216
c         c         w         0.6 * 0.6 * 0.4 = 0.144
c         w         c         0.144
w         c         c         0.144
c         w         w         0.6 * 0.4 * 0.4 = 0.096
w         c         w         0.096
w         w         c         0.096
w         w         w         0.4 * 0.4 * 0.4 = 0.064

What is the probability that we make a mistake? The majority vote is wrong exactly when at least 2 of the 3 classifiers err: 3(0.4)^2(0.6) + (0.4)^3 = 0.352 < 0.4.

Benefits of Ensemble Learning

In general, for m classifiers each making a mistake with probability r, let B be the number of votes for the wrong class. Then B ~ Binomial(m, r), and the majority vote errs with probability P(error) = P(B > m/2) = sum over k > m/2 of C(m, k) r^k (1 - r)^(m - k) [the upper tail of the cumulative binomial distribution]. For fixed r < 1/2, this probability shrinks as m grows, and P(error) -> 0 as m -> infinity. (A small numerical check follows below.)

Based on example by David Kauchak
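To make the limiting behavior concrete, here is a small check (a sketch, not course code; it assumes each classifier errs independently with probability r):

```python
from math import comb

def ensemble_error(m, r):
    """P(majority vote is wrong) = P(more than half of the m classifiers err)."""
    k_min = m // 2 + 1          # smallest number of wrong votes that swings the vote
    return sum(comb(m, k) * r**k * (1 - r)**(m - k) for k in range(k_min, m + 1))

print(ensemble_error(3, 0.4))    # 0.352: better than a single classifier's 0.4
print(ensemble_error(21, 0.4))   # ~0.17: error keeps shrinking as m grows (for r < 1/2)
```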

Bagging and Random Forests

Learning Goals
- Describe bagging. How does bagging improve performance?
- Describe random forests. How is a random forest related to decision trees and to bagging?

Combining Classifiers (a minimal sketch of the first two rules follows below)
- averaging: the final hypothesis is a simple vote of the members
- weighted averaging: coefficients of the individual members are trained using a validation set
- stacking: the predictions of one layer are used as input to the next layer

Averaging reduces variance (without increasing bias) when the predictions are independent. But how do we get independent classifiers?

Based on slides by Eric Eaton and David Sontag
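A minimal sketch of the first two combination rules (not from the slides; it assumes binary labels in {-1, +1} and already-trained models exposing .predict()):

```python
import numpy as np

def simple_vote(models, X):
    """Unweighted vote of the members; use an odd number of models to avoid ties."""
    return np.sign(np.sum([m.predict(X) for m in models], axis=0))

def weighted_average(models, weights, X):
    """Weighted vote; the weights would be tuned on a held-out validation set."""
    return np.sign(np.sum([w * m.predict(X) for w, m in zip(weights, models)], axis=0))
```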

Obtaining Independent Classifiers

Idea: use different learning methods.
- Pros: lots of existing classifiers; can work well for some problems.
- Cons: the classifiers are often not independent (they make the same mistakes!), e.g., many classifiers are linear models, and voting will not help if they make the same mistakes.

Successful ensembles require diversity! The diversification strategy should match the cause of the mistakes (difficult pattern, overfitting, or noisy features).

Based on slides by David Kauchak and Eric Eaton

Obtaining Independent Classifiers

Idea: split up the training data (use the same learning algorithm, but train on different parts of the data).
- Pros: the classifiers learn from different data, so they cannot overfit to the same examples; easy to implement; fast.
- Cons: each classifier is trained on only a small amount (e.g., 10%) of the data, and it is not clear why this would do better than training on the full data with good regularization.

Based on slide by David Kauchak

Obtaining Independent Classifiers

Idea: bagging (bootstrap aggregation).

Bootstrap sampling: use the training data as a proxy for the data-generating distribution. Given S_n with n training examples, create a bootstrap sample by drawing n examples from S_n with replacement (so examples may repeat).

Bagging (see the code sketch below):
- Create m bootstrap samples S_n^(1), ..., S_n^(m).
- Train a distinct classifier on each S_n^(i).
- Classify a new example by majority vote / averaging.

Based on notes and slides by Eric Eaton and David Sontag

Bagging Best Case Scenario

In the best case, averaging m independent models, each with variance sigma^2, cuts the variance to sigma^2 / m. In practice the models are correlated, so the reduction is smaller than 1/m, and each model is trained on fewer distinct training samples, so its variance is usually somewhat larger.

Based on slide by David Sontag
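Here is a minimal sketch of the bagging recipe above, assuming scikit-learn decision trees as the base learners and integer class labels (the helper names bagging_fit / bagging_predict are illustrative, not from the course):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, m=25, seed=0):
    """Train m classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)               # n draws with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classify each example by majority vote over the ensemble (labels 0..C-1)."""
    votes = np.stack([model.predict(X) for model in models])   # shape (m, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T.astype(int)])
```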

Bagging Concerns

Will bootstrap samples all be basically the same? For a data set of size n, what is the probability that a given example will not be selected in a bootstrap sample?
- probability that the example is not chosen on the first draw = 1 - 1/n
- probability that the example is not chosen in any of the n draws = (1 - 1/n)^n
- for large n, (1 - 1/n)^n converges to 1/e ≈ 0.368
- so the probability that the example is chosen in at least one of the n draws = 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 0.632
- on average, a bootstrap sample contains about 63% of the distinct training examples (see the numerical check below)

Based on notes and slides by Jenna Wiens and David Kauchak

When does Bagging work?

Bagging tends to reduce the variance of the classifier: by voting, the ensemble classifier is more robust to noisy examples. Bagging is most useful for classifiers that are
- unstable: small changes in the training set produce very different models (what classifiers have we seen so far with these properties?)
- prone to overfitting: bagging often has an effect similar to regularization

Based on slide by David Kauchak
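A quick numerical check of the bootstrap calculation above (a sketch, assuming NumPy):

```python
import numpy as np

n = 10_000
rng = np.random.default_rng(0)
sample = rng.integers(0, n, size=n)          # one bootstrap sample: n draws with replacement
print(len(np.unique(sample)) / n)            # empirically, ~0.632 of the examples appear
print(1 - (1 - 1 / n) ** n, 1 - np.exp(-1))  # analytic: 1 - (1 - 1/n)^n -> 1 - 1/e ≈ 0.632
```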

Random Forests

A random forest introduces two sources of randomness:
- bagging
- random feature subsets: at each node, the best split is chosen from a random subset of k < d features

Based on notes by Jenna Wiens [Image by Léon Bottou]

Random Forest Procedure

for b = 1, ..., m:
    draw a bootstrap sample S_n^(b) of size n from S_n
    grow a random-forest decision tree DT^(b)
output the ensemble

[Subprocedure for growing DT^(b)]: until the stopping criteria are met, recursively repeat the following steps for each node of the tree:
  i) select k features at random from the d features
  ii) pick the best of these features to split on, using information gain
  iii) split the node into children

Based on notes by Jenna Wiens
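One way to run this procedure in practice (a sketch using scikit-learn rather than the course's own code; here n_estimators plays the role of m, max_features the role of k, and criterion="entropy" corresponds to splitting by information gain):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # m: number of bootstrap samples / trees
    max_features="sqrt",   # k ~ sqrt(d) features considered at each node
    criterion="entropy",   # pick the best of the k features by information gain
    bootstrap=True,        # each tree is grown on a bootstrap sample of size n
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the ensemble's majority vote
```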