CS6716 Pattern Recognition. Ensembles and Boosting (1)

1 CS6716 Pattern Recognition Ensembles and Boosting (1) Aaron Bobick School of Interactive Computing

2 Administrivia Chapter 10 of the Hastie book. Slides brought to you by Aarti Singh, Peter Orbanz, and friends. Slides posted, sorry that took so long. Final project discussion.

3 ENSEMBLES A randomly chosen hyperplane classifier has an expected error of 0.5 (i.e. 50%).

4 ENSEMBLES A randomly chosen hyperplane classifier has an expected error of 0.5 (i.e. 50%).

5 Ensembles Many random hyperplanes combined by majority vote: still 0.5. A single classifier slightly better than random: error 0.5 - ε. What if we use m such classifiers and take a majority vote?

6 Voting Decision by majority vote: m individuals (or classifiers) take a vote; m is an odd number. They decide between two choices; one is correct, one is wrong. After everyone has voted, a decision is made by simple majority. Note: for two-class classifiers f_1, ..., f_m (with output ±1), majority vote = sgn( Σ_{j=1}^{m} f_j ).

7 Voting likelihoods We make some simplifying assumptions: each individual makes the right choice with probability p ∈ (0, 1), and the votes are independent, i.e. stochastically independent when regarded as random outcomes. Given m voters, the probability that the majority makes the right choice is Pr(majority correct) = Σ_{j=(m+1)/2}^{m} [ m! / (j! (m - j)!) ] p^j (1 - p)^{m-j}. This formula is known as Condorcet's jury theorem.

8 Power of weak classifiers Pr(majority correct) = Σ_{j=(m+1)/2}^{m} [ m! / (j! (m - j)!) ] p^j (1 - p)^{m-j}
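
To make the jury theorem concrete, here is a minimal Python sketch (the function name is mine) that evaluates Pr(majority correct) for odd m; running it with p only slightly above 0.5 shows the majority becoming nearly perfect as m grows.

    from math import comb

    def p_majority_correct(m, p):
        """Probability that a majority of m independent voters,
        each correct with probability p, makes the right choice."""
        # sum_{j=(m+1)/2}^{m} C(m, j) p^j (1-p)^(m-j)
        return sum(comb(m, j) * p**j * (1 - p)**(m - j)
                   for j in range((m + 1) // 2, m + 1))

    for m in (1, 11, 101, 1001):
        print(m, round(p_majority_correct(m, 0.55), 4))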

9 ENSEMBLE METHODS An ensemble method makes a prediction by combining the predictions of many classifiers into a single vote. The individual classifiers are usually required to perform only slightly better than random. For two classes, this means slightly more than 50% of the data are classified correctly. Such a classifier is called a weak learner.

10 ENSEMBLE METHODS From before: if the weak learners are random and independent, the prediction accuracy of the majority vote will increase with the number of weak learners. But, since the weak learners are all typically trained on the same training data, producing random, independent weak learners is difficult. (See later for random forests.) Different ensemble methods (e.g. boosting, bagging, etc.) use different strategies to train and combine weak learners that behave relatively independently.

11 Making ensembles work Boosting (today): after training each weak learner, the data is modified using weights. Deterministic algorithm. Bagging (bootstrap aggregation, from earlier): each weak learner is trained on a random subset of the data. Random forests (later): bagging with tree classifiers as weak learners; uses an additional step to remove dimensions that carry little information.

12 Why boost weak learners? Goal: automatically categorize the type of call requested (collect, calling card, person-to-person, etc.). Easy to find rules of thumb that are often correct, e.g. if "card" occurs in the utterance, then predict "calling card". Hard to find a single highly accurate prediction rule.

13 Fighting the bias-variance tradeoff Simple (a.k.a. weak) learners, e.g. naïve Bayes, logistic regression, decision stumps (or shallow decision trees). Are good - low variance, don't usually overfit. Are bad - high bias, can't solve hard learning problems. Can we make weak learners always good??? No!!! But often yes.

14 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the input space. Output class: (weighted) vote of each classifier. Classifiers that are most sure will vote with more conviction, and classifiers will be most sure about a particular part of the space. On average, do better than a single classifier! With H: X → {-1, +1}, e.g. H(x) = sign( h_1(x) + h_2(x) ), or more generally H(x) = sign( Σ_t α_t h_t(x) ) with weights α_t.

15 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the input space. Output class: (weighted) vote of each classifier. Classifiers that are most sure will vote with more conviction, and classifiers will be most sure about a particular part of the space. On average, do better than a single classifier! But how do you force classifiers h_t to learn about different parts of the input space? How do you weight the votes of the different classifiers (the α_t)?

16 Boosting (Schapire, 1989) Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote. On each iteration t: weight each training example by how incorrectly it was classified, learn a weak hypothesis h_t, and a strength for this hypothesis α_t. Final classifier: H(x) = sign( Σ_t α_t h_t(x) ). Practically useful AND theoretically interesting.

17 Learning from weighted data Consider a weighted dataset: D(i) is the weight of the i-th training example (x_i, y_i). Interpretations: the i-th training example counts as D(i) examples; if you were to resample the data, you would get more samples of heavier data points. Now, in all calculations, whenever used, the i-th training example counts as D(i) examples, e.g. in MLE redefine Count(Y=y) to be a weighted count. Unweighted data: Count(Y=y) = Σ_{i=1}^{m} 1(Y_i = y). With weights D(i): Count(Y=y) = Σ_{i=1}^{m} D(i) 1(Y_i = y).
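
A tiny Python illustration of the weighted count (names are mine, not from the slides): example i simply contributes D(i) instead of 1.

    import numpy as np

    def weighted_count(y, D, label):
        """Count(Y = label) where the i-th example counts as D(i) examples."""
        return float(np.sum(D * (y == label)))

    y = np.array([+1, +1, -1, +1, -1])
    D = np.array([0.1, 0.1, 0.4, 0.1, 0.3])   # weights from a boosting round
    print(weighted_count(y, D, +1), weighted_count(y, D, -1))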

18 Boosting weak learners

19 AdaBoost.M1 (1)

20 AdaBoost.M1 (2)
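
The two AdaBoost.M1 slides above carried the algorithm box itself; as a stand-in, here is a minimal Python/scikit-learn sketch of the standard AdaBoost.M1 recipe from Hastie Ch. 10 (depth-1 trees as weak learners, labels in {-1, +1}; variable names are mine, not from the slides).

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_m1(X, y, M=50):
        """AdaBoost.M1 (Hastie Ch. 10, Algorithm 10.1) with decision stumps.
        Labels y must be in {-1, +1}."""
        N = len(y)
        w = np.full(N, 1.0 / N)                    # uniform example weights to start
        stumps, alphas = [], []
        for _ in range(M):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            miss = stump.predict(X) != y
            err = np.dot(w, miss) / w.sum()        # weighted training error
            if err <= 0 or err >= 0.5:             # perfect, or no better than chance
                break
            alpha = np.log((1.0 - err) / err)
            w = w * np.exp(alpha * miss)           # up-weight the misclassified examples
            w = w / w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
        return np.sign(score)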

21 Example sequence

22 Boosting Iteratively reweight training samples: higher weights go to previously misclassified samples. [Figure: boosting rounds, from the ICCV09 tutorial by Tae-Kyun Kim, University of Cambridge]

23 Not mysterious????

24 Minimizing a loss function

25 Forward stage additive model

26 Exponential loss (1)

27 Exponential loss (2) At each node/level:

28

29 Analyzing training error Training error of the final classifier is bounded by the exponential loss: (1/N) Σ_i 1(y_i ≠ H(x_i)) ≤ (1/N) Σ_i exp(-y_i f(x_i)), where H(x) = sign(f(x)). The exp loss is a convex upper bound on the 0/1 loss. If boosting can make this upper bound 0, then the training error goes to 0 as well.
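
The bound comes from a pointwise observation (a worked step, not on the slide): whenever H(x_i) = sign(f(x_i)) is wrong, the margin y_i f(x_i) is at most 0, so the exponential term is at least 1. In LaTeX:

    \mathbf{1}\{\, y_i \neq H(x_i) \,\} \;\le\; e^{-y_i f(x_i)} ,

and averaging this inequality over the N training points gives the bound above.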

30 Why zero training error isn't the end?!?!

31 Boosting results Digit recognition [Schapire, 1989]. [Figure: test error and training error vs. boosting rounds.] Boosting is often, but not always, robust to overfitting: the test set error keeps decreasing even after the training error is zero.

32 Boosting and Logistic Regression Logistic regression is equivalent to minimizing log loss; boosting minimizes a similar loss function over a weighted average of weak learners! [Figure: log loss, 0/1 loss, and exp loss as functions of the margin.] Both are smooth approximations of the 0/1 loss!

33 Boosting and Logistic Regression Logistic regression: minimize log loss, with f(x) built from predefined features x_j (a linear classifier); jointly optimize over all weights w_0, w_1, w_2, ... Boosting: minimize exp loss, with f(x) = Σ_t α_t h_t(x), where the h_t(x) are defined dynamically to fit the data (not a linear classifier); the weights α_t are learned incrementally, one per iteration t.
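
A quick numerical illustration of the three losses as functions of the margin y·f(x) (my own sketch; the log loss here is scaled by 1/log 2 so that, like the exp loss, it passes through 1 at margin 0):

    import numpy as np

    margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    zero_one = (margins <= 0).astype(float)
    exp_loss = np.exp(-margins)
    log_loss = np.log(1 + np.exp(-margins)) / np.log(2)   # scaled logistic (log) loss

    for name, vals in [("0/1", zero_one), ("exp", exp_loss), ("log", log_loss)]:
        print(name, np.round(vals, 3))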

34 Effect of Outliers Good: can identify outliers, since boosting focuses on examples that are hard to categorize. Bad: too many outliers can degrade classification performance dramatically and increase the time to convergence.

35 Bagging [Breiman, 1996] Related approach to combining classifiers: 1. Run independent weak learners on bootstrap replicates (sample with replacement) of the training set. 2. Average/vote over the weak hypotheses. Bagging: resamples data points; the weight of each classifier is the same; only variance reduction. Boosting: reweights data points (modifies their distribution); each classifier's weight depends on its accuracy; both bias and variance are reduced, and the learning rule becomes more complex with each iteration.
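
A minimal Python/scikit-learn sketch of bagging as described above (bootstrap replicates plus a plain, equally-weighted vote; names are mine):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_learners=25, seed=0):
        """Train each learner on a bootstrap replicate (sample with replacement)."""
        rng = np.random.default_rng(seed)
        learners = []
        for _ in range(n_learners):
            idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample indices
            tree = DecisionTreeClassifier()
            tree.fit(X[idx], y[idx])
            learners.append(tree)
        return learners

    def bagging_predict(learners, X):
        votes = sum(t.predict(X) for t in learners)      # assumes labels in {-1, +1}
        return np.sign(votes)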

36 Example AdaBoost test error (simulated data). Weak learners used are decision stumps. Combining many trees of depth 1 yields much better results than a single large tree.

37 SPAM DATA Tree classifier: 9.3% overall error rate Boosting with decision stumps: 4.5% Figure shows feature selection results of Boosting.

38 Best known boosting application Face detection, Viola/Jones. But it's easy to forget that two things make this work: boosting (they used real AdaBoost) and the cascade architecture. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.

39 Face detection Searching for faces in images. Two problems: Face detection - find the locations of all faces in an image; two classes. Face recognition - identify a person depicted in an image by recognizing the face; one class per person to be identified + a background class (all other people). "Face detection can be regarded as a solved problem." "Face recognition is not solved." Face detection as a classification problem: divide the image into patches and classify each patch as "face" or "not face".

40 Face detection Basic idea: slide a window across image and evaluate a face model at every location

41 Viola Jones Technique Overview Three major contributions/phases of the algorithm: feature extraction; learning using cascaded boosting and decision stumps; a multi-scale detection algorithm. Feature extraction and feature evaluation: rectangular features are used, and with a new image representation (the integral image) their calculation is very fast. The (first) classifier was actual AdaBoost. Maybe the first demonstration to computer vision that a combination of simple classifiers is very effective.

42 Feature Extraction Features are extracted from sub-windows of a sample image. The base size for a sub-window is 24 by 24 pixels. Basic features are differences of sums of rectangles (white minus black below). Each of the four feature types is scaled and shifted across all possible combinations.

43 Example [Figure: source image and resulting feature response]

44 Key to feature computation: Integral images The integral image is a new image S(x,y) computed from I(x,y) such that S(x, y) = Σ_{i=1}^{x} Σ_{j=1}^{y} I(i, j).

45 Fast Computation of Pixel Sums MATLAB: ii = cumsum(cumsum(double(i)), 2);
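
The same computation in Python/NumPy, plus the four-lookup rectangle sum that makes the Haar-like features cheap to evaluate (a sketch; the indexing convention and names are mine):

    import numpy as np

    def integral_image(I):
        """S(x, y) = sum of I over rows 1..x and columns 1..y."""
        return I.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(S_padded, top, left, bottom, right):
        """Sum of I[top:bottom+1, left:right+1] using 4 lookups into a
        zero-padded integral image (one extra leading row and column of zeros)."""
        return (S_padded[bottom + 1, right + 1] - S_padded[top, right + 1]
                - S_padded[bottom + 1, left] + S_padded[top, left])

    I = np.arange(16, dtype=float).reshape(4, 4)
    S = np.pad(integral_image(I), ((1, 0), (1, 0)))      # leading zero row/column
    print(rect_sum(S, 1, 1, 2, 2), I[1:3, 1:3].sum())    # the two numbers agree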

46 Feature selection For a 24x24 detection region, the number of possible rectangle features is ~160,000!

47 Feature selection For a 24x24 detection region, the number of possible rectangle features is ~160,000! At test time, it is impractical to evaluate the entire feature set Can we create a good classifier using just a small subset of all possible features? How to select such a subset? No surprise: Boosting!

48 Paul's slide: Boosting Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier. A weak learner need only do better than chance. Training consists of multiple boosting rounds. During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners. Hardness is captured by weights attached to the training examples. Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5), September 1999.

49 Paul's Slide: Boosting vs. SVM Advantages of boosting: integrates classification with feature selection; complexity of training is linear instead of quadratic in the number of training examples; flexibility in the choice of weak learners and boosting scheme; testing is fast; easy to implement. Disadvantages: needs many training examples; often doesn't work as well as SVMs (especially for many-class problems).

50 Boosting for face detection Define weak learners based on rectangle features For each round of boosting: Evaluate each rectangle filter on each example Select best threshold for each filter Select best filter/threshold combination Reweight examples Computational complexity of learning: O(MNK) M rounds, N examples, K features
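
"Select best threshold for each filter" is just a weighted 1-D decision-stump search over that filter's responses; a rough Python sketch (names are mine; real implementations sort the responses once and sweep the threshold rather than using this brute-force double loop):

    import numpy as np

    def best_stump_for_feature(values, y, w):
        """Best threshold/polarity for one rectangle feature under weights w.
        values: feature responses, y: labels in {-1, +1}, w: example weights."""
        best_err, best_thr, best_pol = np.inf, None, None
        for thr in np.unique(values):
            for pol in (+1, -1):
                pred = np.where(pol * values >= pol * thr, 1, -1)
                err = np.dot(w, pred != y)            # weighted error of this stump
                if err < best_err:
                    best_err, best_thr, best_pol = err, thr, pol
        return best_err, best_thr, best_pol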

51 Boosting for face detection First two features selected by boosting: This feature combination can yield 100% detection rate and 50% false positive rate

52 Boosting for face detection A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14084. Not good enough! Receiver operating characteristic (ROC) curve.

53 Challenges of face detection A sliding window detector must evaluate tens of thousands of location/scale combinations. Faces are rare: 0-10 per image. For computational efficiency, we should try to spend as little time as possible on the non-face windows. A megapixel image has ~10^6 pixels and a comparable number of candidate face locations; to avoid having a false positive in every image, our false positive rate has to be less than about 10^-6. This is an unbalanced problem with a very small positive class: a standard training algorithm can achieve a good error rate by classifying all data as negative, and the error rate will be precisely the proportion of points in the positive class.

54 Attentional cascade We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows. A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window. [Diagram: IMAGE SUB-WINDOW → Classifier 1 (T) → Classifier 2 (T) → Classifier 3 (T) → FACE; an F at any classifier → NON-FACE]

55 Why does a cascade work? We have to consider two rates at each stage j: the false positive rate FPR(f_j) = (# negative points classified as "+1") / (# negative training points at stage j), and the detection rate DR(f_j) = (# correctly classified positive points) / (# positive training points at stage j). We want a low value of FPR(f) and a high value of DR(f). Class imbalance: the number of faces classified as background is (size of face class) × (1 - DR(f)); we would like a decently high detection rate, say 90%. The number of background patches classified as faces is (size of background class) × FPR(f); since the background class is huge, FPR(f) has to be very small to yield roughly the same amount of errors in both classes. How small?

56 Why does a cascade work? Cascade detection rate The rates of the overall cascade classifier f are products over the stages: DR(f) = Π_{j=1}^{k} DR(f_j) and FPR(f) = Π_{j=1}^{k} FPR(f_j). Suppose we use a 10-stage cascade (k = 10), each DR(f_j) is 99%, and we permit an FPR(f_j) of 30%. We obtain DR(f) = 0.99^10 ≈ 0.90 and FPR(f) = 0.3^10 ≈ 5.9 × 10^-6. Since the exponent k works strongly in our favor on false positives, we can set each stage to have a very high DR and live with a fairly high per-stage FPR.

57 Training the cascade Set target detection and false positive rates for each stage Keep adding features to the current stage until its target rates have been met Need to lower AdaBoost threshold to maximize detection (as opposed to minimizing total classification error) Test on a validation set If the overall false positive rate is not low enough, then add another stage Use false positives from current stage as the negative training examples for the next stage

58 Training the cascade Training procedure 1. The user selects acceptable rates (FPR and DR) for each level of the cascade. 2. At each level of the cascade: train a boosting classifier with the final AdaBoost threshold lowered to maximize detection (as opposed to minimizing total classification error); gradually increase the number of selected features until the target rates are achieved; test on a validation set. 3. If the overall false positive rate is not low enough, then add another stage, using false positives from the current stage as the negative training examples for the next stage. Use of training data: each training step uses all positive examples (= faces) and the negative examples (= non-faces) that are misclassified at the previous cascade layer (plus more?).

59 Classifier cascades Training a cascade: use an imbalanced loss (very low false negative rate for each f_j). 1. Train classifier f_1 on the entire training data set. 2. Remove from the training set all x_i in the negative class which f_1 classifies correctly ("get some more negatives"). 3. On the smaller training set, train f_2. 4. Continue. 5. On the remaining data at the final stage, train f_k. Rapid classification with a cascade: if any f_j classifies x as negative, f(x) = -1. Only if all f_j classify x as positive is f(x) = +1.
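
The test-time logic of the cascade is just early rejection; a minimal Python sketch (names are mine), together with the rate arithmetic from the previous slide:

    def cascade_classify(x, stages):
        """stages: list of stage classifiers f_j, each returning +1 or -1.
        Any negative response rejects the sub-window immediately."""
        for f_j in stages:
            if f_j(x) < 0:
                return -1            # rejected early: most non-faces stop here
        return +1                    # survived every stage: report a face

    # Overall rates are products over stages, e.g. 10 stages with
    # per-stage DR = 0.99 and FPR = 0.30:
    print(0.99 ** 10, 0.30 ** 10)    # ~0.904 detection, ~5.9e-6 false positives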

60 The implemented system Training data: 5000 faces, all frontal, rescaled to 24x24 pixels; 300 million non-face sub-windows drawn from 9500 non-face images. Faces are normalized for scale and translation. Many variations: across individuals, illumination, pose.

61 System performance Training time: weeks on a 466 MHz Sun workstation. 38 layers, total of 6061 features. Average of 10 features evaluated per window on the test set. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds (roughly 15 Hz), 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998).

62 Output of Face Detector on Test Images

63 Other detection tasks Facial Feature Localization Profile Detection Male vs. female

64 Profile Detection

65 Profile Features

66 Beyond AdaBoost

67 Why AdaBoost Works: (1) Minimizing a loss function

68 (2) Forward stage additive model

69 (3) Exponential Loss

70 (3) Exponential loss an upper bound on classification error Training error of the final classifier is bounded by the exponential loss: (1/N) Σ_i 1(y_i ≠ H(x_i)) ≤ (1/N) Σ_i exp(-y_i f(x_i)). The exp loss is a convex upper bound on the 0/1 loss. If boosting can make this upper bound 0, then the training error goes to 0 as well.

71 Why Boosting Works (Cont'd) [Source: CS7616 Pattern Recognition, A. Bobick]

72 Loss Function In Hastie it is easy to show that the population minimizer of the exponential loss is f*(x) = arg min_f E[ e^{-Y f(x)} | x ] = (1/2) log( Pr(Y = 1 | x) / Pr(Y = -1 | x) ). So AdaBoost is estimating one-half the log-odds of Pr(Y = 1 | x), and classifying by whether f is greater than zero makes sense. The above also implies Pr(Y = 1 | x) = 1 / (1 + e^{-2 f*(x)}).
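
A short derivation of that claim (a worked step following Hastie Ch. 10, not shown on the slide): minimize the conditional expected exponential loss pointwise in f(x),

    E\!\left[ e^{-Y f(x)} \mid x \right] = \Pr(Y{=}1 \mid x)\, e^{-f(x)} + \Pr(Y{=}{-}1 \mid x)\, e^{f(x)} ,

and setting the derivative with respect to f(x) to zero gives

    f^*(x) = \tfrac{1}{2} \log \frac{\Pr(Y{=}1 \mid x)}{\Pr(Y{=}{-}1 \mid x)},
    \qquad
    \Pr(Y{=}1 \mid x) = \frac{1}{1 + e^{-2 f^*(x)}} .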

73 A better loss function A different loss function can be derived from a different assumed form of the probability (a logit in f): the binomial deviance, -l(y, f(x)) = log(1 + e^{-2 y f(x)}). It is minimized by the same f(x), but it is not the same function of the margin.

74 More on Loss Functions and Classification The quantity y·f(x) is called the margin. The classification rule implies that observations with positive margin, y_i f(x_i) > 0, were classified correctly, while those with negative margin were not. The decision boundary is given by f(x) = 0. The loss criterion should penalize negative margins more heavily than positive ones. But how much more?

75 Loss Functions for Two-Class Classification Exponential loss grows very rapidly as the margin becomes more negative. This makes AdaBoost less robust to mislabeled data or too many outliers.

76 How to fix boosting? AdaBoost analytically minimizes exponential loss: clean equations, good performance in good cases. But exponential loss is sensitive to outliers and misclassified points; the binomial deviance loss function is better behaved. We should be able to boost any weak learner, like trees. Can we boost trees for binomial deviance? Not analytically, but we can numerically: gradient boosting. And it can be improved by other tricks: stochastic sampling of training points at each stage; regularization, or diminishing the effect of each stage.

77 Loss Function (Cont'd) [Source: CS7616 Pattern Recognition, A. Bobick]

78 Trees Reviewed (in Hastie notation) Trees partition the space of feature vectors (joint predictor values) into disjoint regions R_j, j = 1, ..., J, represented by the terminal nodes. A constant γ_j is assigned to each region, whether regression or classification. The predictive/classification rule: x ∈ R_j ⇒ f(x) = γ_j. The tree is T(x; Θ) = Σ_j γ_j I(x ∈ R_j), where Θ = {R_j, γ_j} are the parameters: the splits defining the R_j and the values γ_j. We want to minimize the loss: Θ̂ = arg min_Θ Σ_{j=1}^{J} Σ_{x_i ∈ R_j} L(y_i, γ_j).

79 Boosting Trees Finding γ_j given R_j: this is easy. Finding R_j: this is difficult, so we typically approximate; we described the greedy top-down recursive partitioning algorithm. A boosted tree is a sum of such trees, f_M(x) = Σ_{m=1}^{M} T(x; Θ_m), where at each stage m we minimize Θ̂_m = arg min_{Θ_m} Σ_{i=1}^{N} L(y_i, f_{m-1}(x_i) + T(x_i; Θ_m)).

80 Boosting Trees (cont'd) Given the regions R_{jm}, the correct γ_{jm} is whatever minimizes the loss function: γ̂_{jm} = arg min_γ Σ_{x_i ∈ R_{jm}} L(y_i, f_{m-1}(x_i) + γ). For exponential loss, if we restrict our trees to be weak learners outputting {-1, +1}, then this is exactly AdaBoost and we train the same way. But if we want a different loss function, we need a numerical method.

81 Numerical Optimization The loss in using prediction f(x) for y, summed over the training data, is L(f) = Σ_{i=1}^{N} L(y_i, f(x_i)). The goal is to minimize L(f) with respect to f, where f is the sum of the trees; but ignore the sum-of-trees constraint for now and just think about the minimization. The parameters are the values of the approximating function at each of the N data points: f = ( f(x_1), ..., f(x_N) ). Numerical optimization successively approximates the minimizer as a sum of component vectors, f_M = Σ_{m=0}^{M} h_m with h_m ∈ R^N, where f_0 = h_0 is an initial guess and each successive f_m is induced based on the current parameter vector f_{m-1}.

82 Steepest Descent 1. Choose h_m = -ρ_m g_m, where ρ_m is a scalar and g_m is the gradient of L(f) evaluated at f = f_{m-1}. 2. The step length ρ_m is the (line search) solution to ρ_m = arg min_ρ L(f_{m-1} - ρ g_m). 3. The current solution is then updated: f_m = f_{m-1} - ρ_m g_m.

83 Gradient Boosting Forward stagewise boosting is also a very greedy algorithm. The tree predictions can be thought of as being like negative gradients. The only difficulty is that the tree components are not arbitrary: they are constrained to be the predictions of a J_m-terminal-node decision tree, whereas the negative gradient is an unconstrained steepest-descent direction. Unfortunately, the gradient is only defined at the training data points, so it does not by itself generalize f_m(x) to new data.

84 Gradient boosting (cont'd) For classification, the negative gradient is computed from the chosen loss (e.g. the binomial deviance). We approximate the gradient step by finding a decision tree whose predictions are as close as possible to the negative gradient, i.e. fit a J-terminal-node tree to the negative gradient values by least squares.

85 Gradient boosting
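
Slide 85 carried the algorithm itself; as a stand-in, here is a minimal Python/scikit-learn sketch of least-squares gradient boosting: each small regression tree is fit to the negative gradient of the loss, which for squared loss is simply the current residual (for binomial deviance the same recipe applies with the deviance gradient and a per-leaf line search). The shrinkage factor ν from the "more tweaks" slide below appears as nu; names are mine.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_ls(X, y, M=100, nu=0.1, max_depth=3):
        """Least-squares gradient boosting with small regression trees."""
        f0 = y.mean()                              # f_0: best constant prediction
        f = np.full(len(y), f0)
        trees = []
        for _ in range(M):
            residual = y - f                       # negative gradient of 1/2 (y - f)^2
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residual)                  # tree approximates the gradient step
            f = f + nu * tree.predict(X)           # shrunken forward stagewise update
            trees.append(tree)
        return f0, trees

    def gradient_boost_predict(f0, trees, X, nu=0.1):
        return f0 + nu * sum(t.predict(X) for t in trees)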

86 Gradient boosting results

87 More tweaks If we fit the best tree possible, it will yield a tree that is deep, as if it were the last tree of the boosting ensemble. Large trees permit interaction between elements. Prevent this by either penalizing based upon tree size or, believe it or not, just letting J = 6. There is the question of how many stages: if we go too far we can overfit (worse than AdaBoost?). We need shrinkage: f_m(x) = f_{m-1}(x) + ν Σ_j γ_{jm} I(x ∈ R_{jm}), where 0 < ν < 1. The text actually describes ν = 0.01.

88

89 Interpretation Single decision trees are often very interpretable; a linear combination of trees loses this important feature. We often want to learn the relative importance or contribution of each input variable in predicting the response. Define a measure of relevance for each predictor X_l and sum over the J-1 internal nodes of the tree.

90 Interpretation (Cont'd) Relevance of a predictor X_l in a single tree T (where v(t) is the feature selected at internal node t): I_l^2(T) = Σ_{t=1}^{J-1} î_t^2 1(v(t) = l), where î_t^2 is the measure of improvement when the split is applied to node t. In a boosted tree ensemble, the squared relevance is averaged over the trees: I_l^2 = (1/M) Σ_{m=1}^{M} I_l^2(T_m). Since these are relative measures, scale them so the maximum is 100.

91 Relevance

92 Illustration (California Housing)

93 Boosting Summary Combine weak classifiers to obtain a very strong classifier: a weak classifier is slightly better than random on the training data, and the resulting strong classifier can eventually provide zero training error (the AdaBoost algorithm). Boosting vs. logistic regression: similar loss functions; LR is a single joint optimization, vs. incrementally improving the classification in boosting. Boosting is very popular for applications: boosted decision stumps make it easy to build a training system, and are very simple to implement and efficient.
