CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri


Announcements Midterm on Monday, May 21 (decision trees, kernels, perceptron, and comparison to kNNs). Review session on Friday (enter time on Piazza).

Ensemble Learning How to combine multiple classifiers into a single one. Works well if the classifiers are complementary. This class: two types of ensemble methods: bagging and boosting.

Boosting Goal: Determine whether an email is spam or not based on the text in it. Example 1 (Not Spam): From: Yuncong Chen; Text: "151 homeworks are all graded..." Example 2 (Spam): From: Work from home solutions; Text: "Earn money without working!" Sometimes it is: * Easy to come up with simple rule-of-thumb classifiers, * Hard to come up with a single high-accuracy rule

Boosting Weak Learner: A simple rule-of-thumb classifier that doesn't necessarily work very well. Strong Learner: A good classifier. Boosting: How to combine many weak learners into a strong learner?

Boosting 1. How to get a good rule of thumb? Depends on the application; e.g., single-node decision trees. 2. How to choose examples on each round? Focus on the hardest examples so far, namely the examples misclassified most often by the previous rules of thumb. 3. How to combine the rules of thumb into a prediction rule? Take a weighted majority of the rules.

Some Notation Let D be a distribution over examples, and h be a classifier. The error of h with respect to D is: err_D(h) = Pr_{(X,Y)~D}(h(X) ≠ Y). h is called a weak learner if err_D(h) < 0.5. If you guess completely randomly, the error is 0.5.
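As an illustration (not from the slides), here is a minimal NumPy sketch of this weighted error, assuming h maps an array of examples to predictions in {−1, +1} and D is a weight vector summing to 1:

```python
import numpy as np

def weighted_error(h, X, y, D):
    """err_D(h): total weight D(i) on the examples that h misclassifies."""
    return np.sum(D * (h(X) != y))
```

h is a weak learner for D exactly when this value is strictly below 0.5.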

Boosting Given training set S = {(x1, y1), ..., (xn, yn)}, yi in {−1, +1}. For t = 1, ..., T: construct a distribution Dt on the examples; find a weak learner ht which has small error err_Dt(ht) with respect to Dt. Output the final classifier. Initially, D1(i) = 1/n for all i (uniform). Given Dt and ht: D_{t+1}(i) = (Dt(i) / Zt) · exp(−αt yi ht(xi)), where αt = (1/2) ln((1 − err_Dt(ht)) / err_Dt(ht)) and Zt is a normalization constant. Final classifier: sign(Σ_{t=1}^{T} αt ht(x)).
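As a concrete illustration (not from the slides), a minimal NumPy sketch of the procedure above; it assumes a helper find_weak_learner(X, y, D) that returns a classifier ht with small weighted error under D, for example the decision-stump search described later in the lecture.

```python
import numpy as np

def adaboost(X, y, find_weak_learner, T):
    """Sketch of the boosting loop on the slide; labels y must be in {-1, +1}.

    find_weak_learner(X, y, D) -> h, where h maps an array of examples
    to predictions in {-1, +1}.
    """
    n = len(y)
    D = np.full(n, 1.0 / n)                      # D1(i) = 1/n (uniform)
    alphas, hs = [], []
    for t in range(T):
        h = find_weak_learner(X, y, D)
        err = np.sum(D * (h(X) != y))            # err_Dt(ht)
        alpha = 0.5 * np.log((1 - err) / err)    # alpha_t = (1/2) ln((1 - err)/err)
        D = D * np.exp(-alpha * y * h(X))        # up-weight misclassified examples
        D /= D.sum()                             # divide by Zt so D_{t+1} sums to 1
        alphas.append(alpha)
        hs.append(h)

    def final_classifier(X_new):
        # sign of the weighted vote over all weak learners
        return np.sign(sum(a * h(X_new) for a, h in zip(alphas, hs)))

    return final_classifier, alphas, hs
```

The only free choices are the weak-learner family and the number of rounds T.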

Boosting: Example (Schapire, 2011). Weak classifiers: horizontal or vertical half-planes.


The Final Classifier (Schapire, 2011). Weak classifiers: horizontal or vertical half-planes.

How to Find Stopping Time Given training set S = {(x1, y1), ..., (xn, yn)}, yi in {−1, +1}. For t = 1, ..., T: construct a distribution Dt on the examples; find a weak learner ht which has small error err_Dt(ht) with respect to Dt. Output the final classifier. To find the stopping time, use a validation dataset: stop when the error on the validation dataset stops getting better, or when you can't find a good rule of thumb. [Plot: validation error vs. iterations]
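A hedged sketch of this stopping rule, assuming the alphas and weak learners come from a boosting run like the sketch above; it traces the validation error curve round by round and picks its minimum.

```python
import numpy as np

def choose_stopping_time(alphas, hs, X_val, y_val):
    """Return the number of rounds T at which validation error is lowest."""
    scores = np.zeros(len(y_val))
    errors = []
    for alpha, h in zip(alphas, hs):
        scores += alpha * h(X_val)                         # add one round at a time
        errors.append(np.mean(np.sign(scores) != y_val))   # validation error after this round
    best_T = int(np.argmin(errors)) + 1
    return best_T, errors
```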

Boosting and Overfitting Overfitting can happen with boosting, but often doesn't. [Plot: boosting single-node decision trees on a heart-disease dataset]

Typical Boosting Run The reason is that the margin of classification often increases with boosting. [Plot: boosting decision trees on the letter dataset]

Margin Intuitively, the margin of classification measures how confidently an example is classified, i.e., how far it is from the decision boundary. For boosting, the margin of example x is Σ_{t=1}^{T} αt ht(x). [Figures: a small-margin classifier vs. a large-margin classifier] Large-margin classifiers generalize more easily than small-margin ones. (This is also the reason why kernels sometimes work!)
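A small sketch (names illustrative, not from the slides) of computing the margin Σt αt ht(x) from a boosted classifier; a commonly used variant multiplies by the true label and normalizes by Σt |αt|, so that positive values mean correct classification and larger values mean more confidence.

```python
import numpy as np

def margin(alphas, hs, X):
    """The slide's margin of x: sum over t of alpha_t * h_t(x)."""
    return sum(a * h(X) for a, h in zip(alphas, hs))

def signed_normalized_margin(alphas, hs, X, y):
    """y * margin(x) / sum_t |alpha_t|, which lies in [-1, +1]."""
    return y * margin(alphas, hs, X) / np.sum(np.abs(alphas))
```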

Advantages and Disadvantages 1. Fast 2. Simple (easy to program) 3. No parameters to tune (other than the stopping time) 4. Provably effective, provided we can find good weak learners

Application: Face Detection (Viola-Jones '01) Given a rectangular window of pixels, is there a face in it? Properties: * Easy to come up with simple rule-of-thumb classifiers, * Hard to come up with a single high-accuracy rule

Viola-Jones Weak Learners A weak learner h_{f,t,s} is described by: * a feature f, * a threshold t, * a sign s (+1 or −1). For an example x, h_{f,t,s}(x) = +1 if s·f(x) ≥ t, and −1 otherwise.

Viola-Jones: 3 Types of Features (1) Feature value = sum of pixel colors in the black rectangle − sum of pixel colors in the white rectangle. (2) Feature value = sum of pixel colors in the black rectangles − sum of pixel colors in the white rectangle. (3) Feature value = sum of pixel colors in the black rectangles − sum of pixel colors in the white rectangles.

Viola-Jones Weak Learners A weak learner h_{f,t,s} is described by: * a feature f (one of the 3 types of rectangular features), * a threshold t, * a sign s (+1 or −1). For an example x, h_{f,t,s}(x) = +1 if s·f(x) ≥ t, and −1 otherwise.
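A minimal sketch of this weak learner, assuming f is a function mapping an image window to a scalar feature value (for instance, one of the rectangle features above); whether the comparison is ≥ or > is immaterial here.

```python
def make_weak_learner(f, t, s):
    """h_{f,t,s}(x) = +1 if s * f(x) >= t, and -1 otherwise (s is +1 or -1)."""
    def h(x):
        return 1 if s * f(x) >= t else -1
    return h
```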

Viola-Jones: Computing the Features Precompute and store the value s(x, y) for each pixel (x, y), where s(x, y) = sum of pixel colors in the rectangle with corners at the origin and (x, y) (the black rectangle in the figure). Now each feature can be computed by adding/subtracting a constant number of s(x, y) values.
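A sketch of this precomputation (the "integral image"), assuming img is a 2-D array of pixel values; any rectangle sum then needs at most four lookups, so each feature costs a constant number of additions and subtractions.

```python
import numpy as np

def integral_image(img):
    """s[x, y] = sum of img over the rectangle with corners (0, 0) and (x, y)."""
    return np.cumsum(np.cumsum(np.asarray(img, dtype=np.int64), axis=0), axis=1)

def rect_sum(s, x0, y0, x1, y1):
    """Sum of pixel values in the rectangle with corners (x0, y0) and (x1, y1),
    inclusive, using four entries of the integral image s."""
    total = s[x1, y1]
    if x0 > 0:
        total -= s[x0 - 1, y1]
    if y0 > 0:
        total -= s[x1, y0 - 1]
    if x0 > 0 and y0 > 0:
        total += s[x0 - 1, y0 - 1]
    return total
```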

Viola-Jones: Procedure Given training set S = {(x1, y1), ..., (xn, yn)}, yi in {−1, +1}. For t = 1, ..., T: construct a distribution Dt on the examples; find a weak learner ht which has small error err_Dt(ht) with respect to Dt. Output the final classifier. Weak learning procedure: find the feature f, sign s, and threshold t for which the error of h_{f,t,s} on Dt is minimum.
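A brute-force sketch of this weak-learning step, assuming a precomputed matrix F with F[i, j] = value of feature j on training example i (the names are illustrative, not from the slides):

```python
import numpy as np

def best_weak_learner(F, y, D):
    """Find the feature j, sign s, and threshold t minimizing the weighted error
    of h_{f,t,s}(x) = +1 if s * f(x) >= t else -1, under distribution D."""
    n, m = F.shape
    best = (None, None, None, np.inf)           # (feature, sign, threshold, error)
    for j in range(m):
        for s in (1, -1):
            sv = s * F[:, j]
            for t in np.unique(sv):             # candidate thresholds: observed values
                pred = np.where(sv >= t, 1, -1)
                err = np.sum(D * (pred != y))
                if err < best[3]:
                    best = (j, s, t, err)
    return best
```

In practice the threshold search is done in a single pass over sorted feature values; the exhaustive version above only shows the idea.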

Viola and Jones: Procedure Given training set S = {(x1, y1), ..., (xn, yn)}, yi in {−1, +1}. For t = 1, ..., T: construct a distribution Dt on the examples; find a weak learner ht which has small error err_Dt(ht) with respect to Dt. Output the final classifier. Initially, D1(i) = 1/n for all i (uniform). Given Dt and ht: D_{t+1}(i) = (Dt(i) / Zt) · exp(−αt yi ht(xi)), where αt = (1/2) ln((1 − err_Dt(ht)) / err_Dt(ht)) and Zt is a normalization constant. Final classifier: sign(Σ_{t=1}^{T} αt ht(x)).

Viola and Jones: Some Results

Cascades for Fast Classification [Diagram: a cascade of stages; each stage either passes an example on to the next stage (Yes) or rejects it (No), and only examples that pass every stage are accepted.] Choose thresholds for low false-negative rates. Put fast classifiers earlier in the cascade, slower classifiers later. Most examples don't get to the later stages, so the system is fast on average.
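A sketch of how such a cascade is evaluated at test time, assuming stages is a list of (score function, threshold) pairs ordered from cheapest to most expensive (these names are illustrative):

```python
def cascade_classify(window, stages):
    """Return True ("face") only if the window passes every stage.

    stages: list of (score_fn, threshold) pairs, cheap stages first.
    Each threshold is set for a very low false-negative rate, so true faces
    are rarely rejected early, while most non-face windows exit quickly.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False        # rejected at this stage
    return True                 # passed all stages
```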

Boosted Decision Trees Given training set S = {(x1, y1), ..., (xn, yn)}, yi in {−1, +1}. For t = 1, ..., T: construct a distribution Dt on the examples; find a weak learner ht which has small error err_Dt(ht) with respect to Dt. Output the final classifier. Weak learners: single-node decision trees. Weak learning process: find the single-node decision tree that has the lowest error on Dt. Works extremely well in practical applications.
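For reference, boosted single-node trees (decision stumps) are available off the shelf; below is a sketch using scikit-learn. Note that the base-learner argument is named estimator in recent scikit-learn releases and base_estimator in older ones.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Boosted single-node decision trees (decision stumps), T = 100 rounds.
# The keyword is `estimator` in scikit-learn >= 1.2; older versions use `base_estimator`.
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100)
# Usage: clf.fit(X_train, y_train); clf.predict(X_test)
```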

Ensemble Learning How to combine multiple classifiers into a single one. Works well if the classifiers are complementary. This class: two types of ensemble methods: bagging and boosting.

In-Class Problem [Figure: a labeled training set on the number line with points at 0, 1, 2, 3.] Weak learners: threshold classifiers h0, h1, h2, h3 of two types: hi(x) = +1 if x > i, −1 otherwise; or hi(x) = +1 if x < i, −1 otherwise. (1) What is the best classifier in this set? What is its error? (2) What is the error of sign(h0(x) + h2(x) + h1(x))?