
1 Deep Boosting. Mehryar Mohri, Courant Institute & Google Research. Joint work with Corinna Cortes (Google Research), Vitaly Kuznetsov (Courant Institute), and Umar Syed (Google Research).

2 Deep Boosting Essence

3 Ensemble Methods in ML. Combining several base classifiers to create a more accurate one: Bagging (Breiman 1996); AdaBoost (Freund and Schapire 1997); Stacking (Smyth and Wolpert 1999); Bayesian averaging (MacKay 1996); other averaging schemes, e.g., (Freund et al. 2004). Often very effective in practice. Benefit of favorable learning guarantees.

4 Convex Combinations. Base classifier set $H$: boosting stumps, decision trees with limited depth or number of leaves. Ensemble combinations: convex hull of the base classifier set, $\mathrm{conv}(H) = \big\{\sum_{t=1}^{T} \alpha_t h_t : \alpha_t \geq 0;\ \sum_{t=1}^{T} \alpha_t \leq 1;\ \forall t, h_t \in H\big\}$.

5 Ensembles - Margin Bound (Koltchinskii and Panchenko, 2002). Theorem: let $H$ be a family of real-valued functions. Fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $f = \sum_{t=1}^{T} \alpha_t h_t \in \mathrm{conv}(H)$: $R(f) \leq \hat{R}_{S,\rho}(f) + \frac{2}{\rho}\,\mathfrak{R}_m(H) + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$, where $\hat{R}_{S,\rho}(f) = \frac{1}{m}\sum_{i=1}^{m} 1_{y_i f(x_i) \leq \rho}$.

6 Questions. Can we use a much richer or deeper base classifier set? Richer families are needed for difficult tasks, but the generalization bound indicates a risk of overfitting.

7 AdaBoost (Freund and Schapire, 1997). Description: coordinate descent applied to $F(\alpha) = \sum_{i=1}^{m} e^{-y_i f(x_i)} = \sum_{i=1}^{m} \exp\big(-y_i \sum_{t=1}^{T} \alpha_t h_t(x_i)\big)$. Guarantees: ensemble margin bound, but AdaBoost does not maximize the margin! Some margin-maximizing algorithms such as arc-gv are outperformed by AdaBoost! (Reyzin and Schapire, 2006)
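
For concreteness, here is a minimal Python sketch of this coordinate-descent view of AdaBoost, restricted to a finite pool of base classifiers; the function and argument names (`adaboost`, `base_classifiers`) are illustrative and not taken from the slides.

```python
import numpy as np

def adaboost(X, y, base_classifiers, T=100):
    """AdaBoost viewed as coordinate descent on F(alpha) = sum_i exp(-y_i f(x_i)).

    base_classifiers: finite pool of functions h with h(X) in {-1, +1}^m.
    y: labels in {-1, +1}.
    """
    m = len(y)
    D = np.ones(m) / m                                  # distribution over the sample
    alpha = np.zeros(len(base_classifiers))
    preds = np.array([h(X) for h in base_classifiers])  # (N, m) cached predictions

    for _ in range(T):
        # Best coordinate: the base classifier with smallest weighted error.
        errors = np.array([D[p != y].sum() for p in preds])
        j = int(np.argmin(errors))
        eps = errors[j]
        if eps == 0.0 or eps >= 0.5:                    # perfect or useless direction
            break
        # Closed-form step size for the exponential loss.
        step = 0.5 * np.log((1.0 - eps) / eps)
        alpha[j] += step
        # Multiplicative reweighting of the distribution, then renormalization.
        D *= np.exp(-step * y * preds[j])
        D /= D.sum()

    def predict(X_new):
        scores = np.zeros(X_new.shape[0])
        for a, h in zip(alpha, base_classifiers):
            if a != 0.0:
                scores += a * h(X_new)
        return np.sign(scores)

    return predict, alpha
```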

8 Suspicions. Complexity of the hypotheses used: arc-gv tends to use deeper decision trees to achieve a larger margin. Notion of margin: the minimal margin is perhaps not the appropriate notion; the margin distribution is key. Can we shed more light on these questions?

9 Question. Main question: how can we design ensemble algorithms that can succeed even with very deep decision trees or other complex sets? Theory, algorithms, experimental results, model selection.

10 Theory

11 Base Classifier Set H. Decomposition in terms of sub-families $H_1, H_2, \ldots, H_p$ or their unions $H_1,\ H_1 \cup H_2,\ \ldots,\ H_1 \cup \cdots \cup H_p$.

12 Ensemble Family. Non-negative linear ensembles $F = \mathrm{conv}\big(\cup_{k=1}^{p} H_k\big)$: $f = \sum_{t=1}^{T} \alpha_t h_t$ with $\alpha_t \geq 0$, $\sum_{t=1}^{T} \alpha_t \leq 1$, $h_t \in H_{k_t}$.

13 Ideas. Use hypotheses drawn from $H_k$'s with larger $k$'s, but allocate more weight to hypotheses drawn from smaller $k$'s. How can we determine quantitatively the amount of mixture weight apportioned to different families? Can we provide learning guarantees guiding these choices?

14 Learning Guarantee (Cortes, MM, and Syed, 2014). Theorem: Fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $f = \sum_{t=1}^{T} \alpha_t h_t \in F$: $R(f) \leq \hat{R}_{S,\rho}(f) + \frac{4}{\rho}\sum_{t=1}^{T} \alpha_t\, \mathfrak{R}_m(H_{k_t}) + O\Big(\sqrt{\frac{\log p}{\rho^2 m}}\Big)$.

15 Consequences. Complexity term with an explicit dependency on the mixture weights: a quantitative guide for controlling the weights assigned to more complex sub-families. The bound can be used to inspire, or directly define, an ensemble algorithm.

16 Algorithms

17 Set-Up. $H_1, \ldots, H_p$: disjoint sub-families of functions taking values in $[-1, +1]$. Further assumption (not necessary): symmetric sub-families, i.e., $h \in H_k \Rightarrow -h \in H_k$. Notation: $r_j = \mathfrak{R}_m(H_{k_j})$, with $h_j \in H_{k_j}$.

18 Derivation. The learning bound suggests seeking to minimize $\frac{1}{m}\sum_{i=1}^{m} 1_{y_i \sum_{t=1}^{T}\alpha_t h_t(x_i) \leq \rho} + \frac{4}{\rho}\sum_{t=1}^{T} \alpha_t r_t$, with $\alpha_t \geq 0$ and $\sum_{t=1}^{T}\alpha_t \leq 1$.

19 Convex Surrogates. Let $\Phi$ be a decreasing convex function upper bounding $u \mapsto 1_{u \leq 0}$, with $\Phi$ differentiable. Two principal choices: Exponential loss: $\Phi(-u) = \exp(-u)$. Logistic loss: $\Phi(-u) = \log_2(1 + \exp(-u))$.
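
As a small illustration (not from the slides), the two surrogates can be written as follows; the assertions simply check that both dominate the 0-1 indicator of the margin $u = y\,f(x)$.

```python
import numpy as np

# Convex, decreasing surrogates Phi(-u) upper bounding the 0-1 indicator 1_{u <= 0}.
def exponential_surrogate(u):
    """Phi(-u) = exp(-u)."""
    return np.exp(-u)

def logistic_surrogate(u):
    """Phi(-u) = log_2(1 + exp(-u))."""
    return np.log2(1.0 + np.exp(-u))

u = np.linspace(-2.0, 2.0, 9)
indicator = (u <= 0).astype(float)
assert np.all(exponential_surrogate(u) >= indicator)
assert np.all(logistic_surrogate(u) >= indicator)
```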

20 Optimization Problem (Cortes, MM, and Syed, 2014). Moving the constraint to the objective and using the fact that the sub-families are symmetric leads to: $\min_{\alpha \geq 0} \frac{1}{m}\sum_{i=1}^{m}\Phi\Big(1 - y_i \sum_{j=1}^{N}\alpha_j h_j(x_i)\Big) + \sum_{j=1}^{N}(\lambda r_j + \beta)\,\alpha_j$, where $\lambda, \beta \geq 0$, and for each hypothesis, we keep either $h$ or $-h$.

21 DeepBoost Algorithm. Coordinate descent applied to the convex objective; since the objective is non-differentiable, a notion of maximum coordinate descent direction must be defined.

22 Direction & Step Maximum direction: definition based on the error t,j = E [y i h j (x i )], i D t where D t is the distribution over sample at iteration t. Step: closed-form expressions for exponential and logistic losses. general case: line search. page 22

23 Pseudocode. DeepBoost pseudocode, with per-hypothesis penalty $\Lambda_j = \lambda r_j + \beta$.
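
The full pseudocode appears on the original slide and in the paper; below is a minimal, illustrative Python sketch of a DeepBoost-style coordinate-descent loop for the exponential loss. The selection rule and the bounded line search are simplifications of the paper's exact update, and the names (`deepboost`, `lam`, `beta`) are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def deepboost(X, y, hypotheses, r, lam=1e-4, beta=1e-4, T=100):
    """Illustrative DeepBoost-style coordinate descent for the exponential loss.

    hypotheses: pool of base functions h with h(X) in {-1, +1}^m, drawn from
                sub-families of increasing complexity.
    r:          r[j] = complexity estimate (e.g. Rademacher bound) of the
                sub-family containing hypotheses[j].
    """
    m, N = len(y), len(hypotheses)
    preds = np.array([h(X) for h in hypotheses])         # (N, m) cached predictions
    penalties = lam * np.asarray(r, dtype=float) + beta  # Lambda_j = lam * r_j + beta
    alpha = np.zeros(N)

    def objective(a):
        margins = y * (a @ preds)
        return np.mean(np.exp(-margins)) + np.sum(penalties * np.abs(a))

    for _ in range(T):
        # Distribution over examples induced by the current ensemble.
        w = np.exp(-y * (alpha @ preds))
        D = w / w.sum()
        # Pick the coordinate whose penalized correlation with D is largest.
        corr = preds @ (D * y)                            # ~ epsilon_{t,j}
        scores = np.abs(corr) - penalties
        j = int(np.argmax(scores))
        if scores[j] <= 0.0:                              # no coordinate looks useful
            break
        # General-case step: one-dimensional line search along coordinate j.
        def obj_along_j(eta):
            a = alpha.copy()
            a[j] += eta
            return objective(a)
        step = minimize_scalar(obj_along_j, bounds=(-2.0, 2.0), method="bounded").x
        alpha[j] += step

    return alpha
```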

24 Connections with Previous Work. For $\lambda = \beta = 0$, DeepBoost coincides with: AdaBoost (Freund and Schapire 1997), run with the union of sub-families, for the exponential loss; additive Logistic Regression (Friedman et al., 1998), run with the union of sub-families, for the logistic loss. For $\lambda = 0$ and $\beta \neq 0$, DeepBoost coincides with: L1-regularized AdaBoost (Raetsch, Mika, and Warmuth 2001), for the exponential loss; L1-regularized Logistic Regression (Duchi and Singer 2009), for the logistic loss.

25 Experiments

26 Rad. Complexity Estimates. Benefit of the data-dependent analysis: empirical estimates of each $\mathfrak{R}_m(H_k)$. Example: for a kernel function $K_k$, $\hat{\mathfrak{R}}_S(H_k) \leq \frac{\sqrt{\mathrm{Tr}[K_k]}}{m}$. Alternatively, upper bounds in terms of growth functions: $\mathfrak{R}_m(H_k) \leq \sqrt{\frac{2\log \Pi_{H_k}(m)}{m}}$.

27 Experiments (1). Family of base classifiers defined by boosting stumps. $H_1^{\mathrm{stumps}}$: boosting stumps (threshold functions); in dimension $d$, $\Pi_{H_1^{\mathrm{stumps}}}(m) \leq 2md$, thus $\mathfrak{R}_m(H_1^{\mathrm{stumps}}) \leq \sqrt{\frac{2\log(2md)}{m}}$. $H_2^{\mathrm{stumps}}$: decision trees of depth 2 with the same question at the internal nodes of depth 1; in dimension $d$, $\Pi_{H_2^{\mathrm{stumps}}}(m) \leq (2m)^2 \frac{d(d-1)}{2}$, thus $\mathfrak{R}_m(H_2^{\mathrm{stumps}}) \leq \sqrt{\frac{2\log(2m^2 d(d-1))}{m}}$.
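
A minimal sketch (illustrative values, not from the slides) computing these growth-function-based complexity estimates via the generic bound $\mathfrak{R}_m(H) \leq \sqrt{2\log\Pi_H(m)/m}$; the function names are assumptions for illustration.

```python
import numpy as np

def rademacher_from_growth(growth_value, m):
    """Generic bound: R_m(H) <= sqrt(2 * log(Pi_H(m)) / m)."""
    return np.sqrt(2.0 * np.log(growth_value) / m)

def stump_family_bounds(m, d):
    """Growth-function bounds for the two stump-based families above."""
    growth_h1 = 2 * m * d                        # threshold stumps in dimension d
    growth_h2 = (2 * m) ** 2 * d * (d - 1) / 2   # depth-2 trees, same question at depth 1
    return (rademacher_from_growth(growth_h1, m),
            rademacher_from_growth(growth_h2, m))

# Example: the richer family H_2^stumps receives a larger complexity estimate.
r1, r2 = stump_family_bounds(m=1000, d=20)
print(r1, r2)   # r1 < r2
```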

28 Experiments (1). Base classifier set: $H_1^{\mathrm{stumps}} \cup H_2^{\mathrm{stumps}}$. Data sets: same UCI Irvine data sets as (Breiman 1999) and (Reyzin and Schapire 2006); OCR data sets used by (Reyzin and Schapire 2006): ocr17, ocr49; MNIST data sets: ocr17-mnist, ocr49-mnist. Experiments with the exponential loss. Comparison with AdaBoost and AdaBoost-L1.

29 Experiments - Stumps Exp Loss (Cortes, MM, and Syed, 2014). Table: results for boosted decision stumps and the exponential loss function, reporting error (with standard deviation), average tree size, and average number of trees for AdaBoost with $H_1^{\mathrm{stumps}}$, AdaBoost with $H_2^{\mathrm{stumps}}$, AdaBoost-L1, and DeepBoost on breastcancer, ionosphere, german, diabetes, ocr17, ocr49, ocr17-mnist, and ocr49-mnist.

30 Experiments (2). Family of base classifiers defined by decision trees of depth $k$. For trees with at most $n$ nodes: $\mathfrak{R}_m(T_n) \leq \sqrt{\frac{(4n+2)\log_2(d+2)\log(m+1)}{m}}$. Base classifier set: $\cup_{k=1}^{K} H_k^{\mathrm{trees}}$. Same data sets as in Experiments (1). Both exponential and logistic losses. Comparison with AdaBoost and AdaBoost-L1, Logistic Regression and L1-Logistic Regression.
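
A small illustrative sketch of this complexity estimate for the depth-$k$ tree families; the mapping from depth to node count ($2^k - 1$ internal nodes of a full binary tree) is an assumption used only for illustration.

```python
import numpy as np

def tree_rademacher_bound(n_nodes, d, m):
    """Bound from the slide: R_m(T_n) <= sqrt((4n + 2) * log2(d + 2) * log(m + 1) / m)."""
    return np.sqrt((4 * n_nodes + 2) * np.log2(d + 2) * np.log(m + 1) / m)

# Deeper sub-families H_k^trees get larger complexity estimates r_k.
m, d = 1000, 20
for k in range(1, 6):
    n_nodes = 2 ** k - 1          # assumed node count for a full binary tree of depth k
    print(k, tree_rademacher_bound(n_nodes, d, m))
```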

31 Experiments - Trees Exp Loss (Cortes, MM, and Syed, 2014). Table: error (with standard deviation), average tree size, and average number of trees for AdaBoost, AdaBoost-L1, and DeepBoost with the exponential loss on breastcancer, ionosphere, german, diabetes, ocr17, ocr49, ocr17-mnist, and ocr49-mnist.

32 Experiments - Trees Log Loss (Cortes, MM, and Syed, 2014). Table: error (with standard deviation), average tree size, and average number of trees for LogReg, LogReg-L1, and DeepBoost with the logistic loss on breastcancer, ionosphere, german, diabetes, ocr17, ocr49, ocr17-mnist, and ocr49-mnist.

33 Multi-Class Learning Guarantee (Kuznetsov, MM, and Syed, 2014), with $c$ the number of classes. Theorem: Fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $f = \sum_{t=1}^{T}\alpha_t h_t \in F$: $R(f) \leq \hat{R}_{S,\rho}(f) + \frac{8c}{\rho}\sum_{t=1}^{T}\alpha_t\, \mathfrak{R}_m(\Pi_1(H_{k_t})) + \tilde{O}\Big(\sqrt{\frac{\log p}{\rho^2 m}}\Big)$, where $\Pi_1(H_k) = \{x \mapsto h(x, y) : y \in \mathcal{Y},\ h \in H_k\}$.

34 Extension to Multi-Class. Similar data-dependent learning guarantee proven for the multi-class setting: bound depending on the mixture weights and the complexity of the sub-families. Deep Boosting algorithm for multi-class: similar extension taking into account the complexities of the sub-families; several variants depending on the number of classes; different possible loss functions for each variant.

35 Experiments - Multi-Class. Table: empirical results for MDeepBoostSum, $\Phi = \exp$ (AB stands for AdaBoost), reporting error (with standard deviation), average tree size, and average number of trees for AB.MR, AB.MR-L1, and MDeepBoost on abalone, handwritten, letters, pageblocks, pendigits, satimage, statlog, and yeast.

36 Experiments - Multi-Class. Table: empirical results for MDeepBoostCompSum, compared with multinomial logistic regression, reporting error (with standard deviation), average tree size, and average number of trees for LogReg, LogReg-L1, and MDeepBoost on abalone, handwritten, letters, pageblocks, pendigits, satimage, statlog, and yeast.

37 Other Related Algorithms. Structural Maxent models (Cortes, Kuznetsov, MM, and Syed, ICML 2015): feature functions chosen from a union of very complex families. Deep Cascades (DeSalvo, MM, and Syed, ALT 2015): cascade of predictors with leaf predictors and node questions selected from very rich families.

38 Model Selection

39 Model Selection. Problem: how to select the hypothesis set $H$? $H$ too complex: no generalization bound, overfitting. $H$ too simple: generalization bound, but underfitting. Balance between estimation and approximation errors. Figure: hypothesis set $H$ with the best-in-class hypothesis $h^*$ and the Bayes classifier $h_{\mathrm{Bayes}}$.

40 Structural Risk Minimization (Vapnik and Chervonenkis, 1974; Vapnik, 1995). SRM: $H = \cup_{k=1}^{\infty} H_k$ with $H_1 \subset H_2 \subset \cdots$. Solution: $f^* = \mathrm{argmin}_{h \in H_k,\, k \geq 1}\ \hat{R}_S(h) + \mathrm{pen}(k, m)$, i.e., training error + penalty. Figure: error as a function of complexity, showing the training error, the penalty, and their sum.

41 Voted Risk Minimization. Ideas: no selection of a specific $H_k$; instead, use all $H_k$'s: $h = \sum_{k=1}^{p} \alpha_k h_k$, $h_k \in H_k$. Hypothesis-dependent penalty: $\sum_{k=1}^{p} \alpha_k\, \mathfrak{R}_m(H_k)$. Deep ensembles.

42 Conclusion. Deep Boosting: ensemble learning with increasingly complex families; data-dependent theoretical analysis; algorithm based on the learning bound; extension to multi-class; ranking and other losses; enhancement of many existing algorithms; compares favorably to AdaBoost and Logistic Regression or their L1-regularized variants in experiments.
