BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics


BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics
Lecture 12: Ensemble Learning I
Jie Wang, Department of Computational Medicine & Bioinformatics, University of Michigan

Outline: Bias and Variance; Bagging; Application of Bagging to CART (Classification & Regression Trees)

Bias and Variance
The reducible error of a model can be decomposed into two parts: error due to bias and error due to variance. Good references include Scott Fortmann-Roe's blog (http://scott.fortmann-roe.com/docs/biasvariance.html#fn:1), Lecture 8 of the free online course Introductory Machine Learning (a bit technical; https://work.caltech.edu/telecourse.html), http://www.hlt.utdallas.edu/~vgogate/ml/2015s/lectures/ensemblemethods.pdf, and https://followthedata.wordpress.com/2012/06/02/practical-advice-for-machine-learning-bias-variance/

The Error Due to Bias
The error due to bias refers to the difference between the expected (or average) prediction of the model and the true value that we are trying to predict. Suppose that there are infinitely many marbles in a bin. How can we estimate the fraction of red marbles in the bin? We use the fraction of red marbles in a sample to estimate it. (Figure from http://work.caltech.edu/slides/slides02.pdf)

The Error Due to Bias
Is the fraction of red marbles computed from the sample a good approximation to its real value? A single sample can be unlucky, so it is unfair to say a model is bad if it performs poorly on one sample. Instead, we average the performance over many samples $\mu_1, \dots, \mu_n$:
$$\bar{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mu_i.$$
The average of the sample fractions matches the true fraction, so the error due to bias is 0.
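To make this concrete, here is a minimal Python simulation of the marble experiment (not from the slides; the true fraction and sample sizes are made up). Each sample fraction $\mu_i$ varies, but their average sits very close to the true fraction, so the bias is essentially 0.

```python
# Simulate the marble example: the sample fraction of red marbles is an
# unbiased estimate of the true fraction mu.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.3          # true (unknown) fraction of red marbles in the bin
n_samples = 1000  # number of independent samples we draw
sample_size = 50  # marbles per sample

# mu_i = fraction of red marbles in the i-th sample
mu_i = rng.binomial(sample_size, mu, size=n_samples) / sample_size

print("average of sample fractions:", mu_i.mean())       # close to 0.3
print("error due to bias (approx.):", mu_i.mean() - mu)  # close to 0
```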

The Error Due to Bias
Low bias means the model fits the data well, e.g., linear regression applied to linear data, or an SVM applied to linearly separable data with a large margin. Caution: low bias can come with overfitting. High bias means a poor approximation to the data, e.g., linear regression applied to nonlinear data, or KNN with a very large k.

The Error Due to Variance
The error due to variance refers to the variance of a model's prediction at a given data point. Train the model on many datasets: Data 1 yields $g_{D_1}$, Data 2 yields $g_{D_2}$, ..., Data n yields $g_{D_n}$. For a fixed point $x$, the prediction $g_D(x)$ is a random variable (the randomness comes from the training data $D$), and
$$\text{error due to variance} = \operatorname{Var}\big(g_D(x)\big).$$

The Error Due to Variance
Low variance means the prediction of the model is stable, e.g., linear regression applied to nonlinear data, or a model that is independent of the data. Caution: low variance can come with under-fitting. High variance means the prediction of the model is unstable, e.g., a high-degree polynomial, or KNN with k = 1.

Graphical illustration of bias and variance (figure from Scott Fortmann-Roe's blog).

Bias/Variance Tradeoff
$$E_D\big[(g_D(x) - g(x))^2\big] = \big(E_D[g_D(x)] - g(x)\big)^2 + E_D\Big[\big(g_D(x) - E_D[g_D(x)]\big)^2\Big],$$
where $g(x)$ is the true model (usually unknown); the first term on the right is the squared bias and the second is the variance. We usually end up with either low bias and high variance (the model is too complex) or low variance and high bias (the model is too simple). The tradeoff is bias² vs. variance. (Figure from Duda et al., Pattern Classification.)
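This decomposition can be checked numerically. The sketch below (an assumed setup, not from the lecture) fits degree-3 polynomials to many noisy draws of a sine curve and verifies that, at a fixed point $x_0$, the mean squared error across datasets equals the squared bias plus the variance.

```python
# Empirical check of E_D[(g_D(x) - g(x))^2] = bias^2 + variance at one x.
import numpy as np

rng = np.random.default_rng(0)
g = np.sin                  # true model g(x)
x0, degree, n_train, n_datasets = 1.0, 3, 20, 2000

preds = np.empty(n_datasets)
for d in range(n_datasets):
    X = rng.uniform(0, 2 * np.pi, n_train)
    y = g(X) + rng.normal(0, 0.3, n_train)  # noisy training set D
    coef = np.polyfit(X, y, degree)         # fit g_D
    preds[d] = np.polyval(coef, x0)         # g_D(x0)

bias2 = (preds.mean() - g(x0)) ** 2
variance = preds.var()
mse = ((preds - g(x0)) ** 2).mean()
print(f"bias^2 + variance = {bias2 + variance:.5f}")
print(f"E_D[(g_D - g)^2]  = {mse:.5f}")    # identical up to rounding
```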

Bias/Variance Tradeoff (figure from Hastie, Tibshirani & Friedman, The Elements of Statistical Learning).

Tips to Reduce Errors
High bias: try to get new features. High variance: try to get more training samples, or try a subset of the features.

Reduce Variance without Increasing Bias
Averaging reduces variance: if $X_1, \dots, X_N$ are independent and identically distributed (i.i.d.) random variables, then
$$\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) = \frac{\operatorname{Var}(X)}{N}.$$
The training data is given. How can we find more data?
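A quick numerical check of this identity (an assumed example, not from the slides):

```python
# Var(mean of N i.i.d. X_i) should be Var(X) / N.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 25, 100_000
X = rng.normal(0.0, 2.0, size=(trials, N))  # i.i.d. draws, Var(X) = 4

print("Var(X)        :", X[:, 0].var())         # ~4
print("Var(mean of N):", X.mean(axis=1).var())  # ~4 / 25 = 0.16
```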

Bagging (Breiman, 1996)
Derived from the bootstrap (Efron, 1993). Create classifiers using training sets that are bootstrapped (drawn with replacement), then average the results for each case.

What is Bagging?
Bagging stands for bootstrap aggregating. Basic idea of the bootstrap: suppose we have a model fit to a set of training data $Z = \{z_1, z_2, \dots, z_N\}$, where $z_i = (x_i, y_i)$. We randomly draw datasets with replacement from the training data, each sample set the same size as the original training set. This is done B times, giving B bootstrap datasets. We refit the model to each of the bootstrap datasets and examine the behavior over the B replications. Question: what would happen if you drew the datasets without replacement? (See the sketch below.)
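A minimal sketch of the resampling step (the data here are illustrative; variable names follow the slide notation):

```python
# Draw B bootstrap datasets from a training set of size N.
import numpy as np

rng = np.random.default_rng(0)
N, B = 8, 4
Z = np.arange(1, N + 1)  # training cases z_1, ..., z_N

for b in range(B):
    idx = rng.integers(0, N, size=N)  # N indices drawn WITH replacement
    print(f"Training set {b + 1}:", Z[idx])

# Note: drawn WITHOUT replacement, every size-N "bootstrap" set would be
# a permutation of Z, so all B refits would see exactly the same data
# and there would be nothing to average over.
```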

Review of the Bootstrap
The bootstrap can be used to assess the accuracy of a parameter estimate or a prediction. In bagging, we use it to improve the estimate or prediction itself. For each bootstrap sample set $Z^{*b}$, $b = 1, 2, \dots, B$, we fit our model, giving the prediction $\hat{f}^{*b}(x)$; the bagging estimate is
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}^{*b}(x).$$
Averaging the predictions reduces variance (not bias).
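The sketch below illustrates the bagging estimate end to end. The base model (polynomial least squares) and the data are assumptions for illustration; the slides do not prescribe a particular base learner.

```python
# Bagging estimate: f_bag(x) = (1/B) * sum over b of f*b(x).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 30)
y = np.sin(X) + rng.normal(0, 0.3, 30)  # training set Z
x_new = np.linspace(0, 2 * np.pi, 5)    # points to predict at
B = 50

preds = []
for b in range(B):
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample Z*b
    coef = np.polyfit(X[idx], y[idx], 5)   # refit the model, giving f*b
    preds.append(np.polyval(coef, x_new))

f_bag = np.mean(preds, axis=0)  # average the B predictions
print(f_bag)                    # roughly tracks sin(x_new)
```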

Bagging Example (Opitz, 1999)
Original:       1 2 3 4 5 6 7 8
Training set 1: 2 7 8 3 7 6 3 1
Training set 2: 7 8 5 6 4 2 7 1
Training set 3: 3 6 2 7 5 6 2 2
Training set 4: 4 5 1 4 6 4 3 8

Bagging
(Diagram: T bootstrap training sets are each fed to the learning algorithm ML, producing models $f_1, f_2, \dots, f_T$, which are combined into the final model $f$.)

Examples in Tree-based Methods
Tree-based methods partition the feature space into a set of rectangles, and usually fit a constant in each one. (Figure from Wei-Yin Loh (2011), Classification and regression trees, WIREs Data Mining and Knowledge Discovery 1:14-23.)
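As a concrete illustration (an assumed example, not from the slides), a shallow regression tree on one feature produces exactly this piecewise-constant fit:

```python
# A depth-2 regression tree splits 1-D feature space into at most 4
# intervals and predicts a constant inside each one.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 100)).reshape(-1, 1)
y = np.sin(X.ravel())

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
grid = np.linspace(0, 2 * np.pi, 12).reshape(-1, 1)
print(np.round(tree.predict(grid), 2))  # runs of repeated constants
```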

Why Bagging in CART
CART = Classification and Regression Trees. To construct the tree, MSE or misclassification error is usually minimized over the training sample. Tree-based methods have very high variance; they are unstable because of their hierarchical structure. Bagging can average many trees to reduce the variance.
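A sketch of bagged CART with scikit-learn (assuming scikit-learn is available; the dataset and settings are illustrative). BaggingClassifier implements exactly the fit-on-bootstrap-samples-and-average recipe above.

```python
# Compare one CART with an average of 100 bootstrapped trees.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # a single, high-variance CART
bagged = BaggingClassifier(tree, n_estimators=100, random_state=0)

print("single tree :", cross_val_score(tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```

On noisy data like this, the bagged ensemble typically scores noticeably higher than the single tree, since averaging smooths away much of the tree's variance.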

Example I: Tree-based Regression

Example II: Classification Tree
(Figure panels: the training sample; results from one CART; the bagged-tree decision boundary.)

Results from Breiman 96

Tips for Using Bagging
Bagging is useful when the base learner is unstable. A base learner is unstable if a small perturbation of the training data leads to a large change in the model. Neural networks, KNN with small k, and decision trees are unstable; KNN with larger k and the naive Bayes classifier are stable. Bagging is not intended for reducing bias.
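Instability can be probed directly. The sketch below (an assumed experiment, not from the slides) flips a few training labels and measures how much the fitted model's test predictions change for 1-NN versus 15-NN:

```python
# A small perturbation of the training data changes 1-NN's predictions
# far more than 15-NN's, illustrating unstable vs. stable learners.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_test = rng.normal(size=(1000, 2))

y_pert = y.copy()
flip = rng.choice(len(y), size=5, replace=False)  # perturb 5 labels
y_pert[flip] = 1 - y_pert[flip]

for k in (1, 15):
    before = KNeighborsClassifier(k).fit(X, y).predict(X_test)
    after = KNeighborsClassifier(k).fit(X, y_pert).predict(X_test)
    print(f"k={k:2d}: predictions changed on {(before != after).mean():.1%} of test points")
```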

References
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
LeBlanc, M. and Tibshirani, R. (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91, 1641-1650.
Textbook: Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning.