Active learning for visual object recognition


A paper by Yotam Abramson and Yoav Freund, presented by Ben Laxton.

Outline
- Motivation and procedure
- How this works: AdaBoost and feature details
- Why this works: boosting the margin for discriminative classifiers
- How well this works: results

Motivation

Observations:
- Machine learning methods are preferred for visual object detection.
- Most, if not all, require large amounts of hand-labeled training data.
- It is hard for people to identify hard instances for training.

How much data is required? Most methods require on the order of thousands of labeled positive examples. Viola-Jones used 4,916 labeled faces for training; at roughly 10 seconds per label, that is 4916 * 10 sec ≈ 14 hours of labeling effort.

Motivation

Hard examples provide more information than easy ones. How do we identify hard examples? Through interaction between the classifier and a human operator. We will see that the notion of margin provides a nice framework for selecting these hard examples: examples far from the decision boundary are easy to classify, while examples near it provide the most new information.

Procedure

SEVILLE (Semi-automatic Visual Learning) provides an intuitive framework to achieve these goals. It combines AdaBoost detectors (similar to Viola-Jones) with human interaction in an iterative fashion, and can also serve as a good labeling framework in general.

First, play the input image sequence and manually collect a small number of positive and negative examples, then train a classifier on these examples.

Procedure

Upon running the classifier, SEVILLE displays the examples that fall within the margins. Label all of these examples and retrain the classifier. Continue this process on new frames; the resulting classifier should converge toward the optimal one. A sketch of this loop appears below.
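A minimal sketch of this active-learning loop, assuming hypothetical helpers train_adaboost, scan_frame, and ask_human_label (placeholders, not the authors' actual API) and an illustrative margin threshold:

```python
# Hypothetical sketch of the SEVILLE-style loop; helper names are placeholders.
def seville_loop(frames, seed_pos, seed_neg, margin_threshold=0.1):
    labeled = [(x, +1) for x in seed_pos] + [(x, -1) for x in seed_neg]
    classifier = train_adaboost(labeled)           # initial detector
    for frame in frames:
        for window in scan_frame(frame):
            score = classifier.margin(window)      # normalized score in [-1, 1]
            if abs(score) < margin_threshold:      # "hard": near the boundary
                label = ask_human_label(window)    # operator decides
                labeled.append((window, label))
        classifier = train_adaboost(labeled)       # retrain on the enlarged set
    return classifier
```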

How this works: AdaBoost and feature details

SEVILLE Implementation
- Structured after the detector developed by Viola and Jones.
- Uses AdaBoost to boost weak classifiers.
- Adds the idea of intensity-invariant, fast, single-pixel-based features.

AdaBoost

Boosting methods produce accurate ("strong") classifiers by combining many weak classifiers in a principled framework. AdaBoost is one algorithm for doing this:
- Training error approaches 0 exponentially in the number of rounds.
- There is a probabilistic upper bound on the generalization error.
- It tends not to overfit.

The weak requirement is only that each weak learner perform slightly better than chance. Starting from uniform weights (x_1, y_1, 1/n), ..., (x_n, y_n, 1/n), each round feeds the reweighted examples (x_1, y_1, w_1), ..., (x_n, y_n, w_n) to the weak learner to obtain h_1, h_2, ..., h_T, and the strong rule is the weighted vote sign(α_1 h_1 + α_2 h_2 + ... + α_T h_T). (Figure borrowed from Yoav Freund's introduction-to-boosting presentation.)

AdaBoost

Intuitively, AdaBoost homes in on the best simple features. This is important because the number of candidate features is huge: for the Haar-like features used by Viola and Jones, with a window size of 24x24, there are 45,396 possibilities.

The algorithm: given example images (x_1, y_1), ..., (x_n, y_n), where y_i = 0, 1 for negative and positive examples respectively, initialize the weights as w_{1,i} = 1/(2m) for negatives and 1/(2l) for positives, where m and l are the numbers of negative and positive examples. For t = 1, ..., T:
1) Normalize the weights so that w_t is a distribution.
2) For each feature j, train a classifier h_j and evaluate its error ε_j with respect to w_t.
3) Choose the classifier h_t with the lowest error ε_t.
4) Update the weights: w_{t+1,i} = w_{t,i} β_t^(1-e_i), where e_i = 0 if x_i is classified correctly and 1 otherwise, and β_t = ε_t / (1 - ε_t).

The final strong classifier is h(x) = 1 if Σ_{t=1..T} α_t h_t(x) ≥ (1/2) Σ_{t=1..T} α_t, and 0 otherwise, where α_t = log(1/β_t).
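A compact sketch of this loop using decision stumps over precomputed feature values as the weak classifiers; illustrative only (an exhaustive stump search, O(d * n^2) per round, not the paper's optimized implementation):

```python
import numpy as np

def adaboost_train(X, y, T):
    """X: (n, d) matrix of feature values; y: labels in {0, 1}; T: rounds."""
    n, d = X.shape
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))    # initial weights
    stumps, alphas = [], []
    for _ in range(T):
        w = w / w.sum()                                   # 1) normalize
        best = None
        for j in range(d):                                # 2) search stumps
            for thresh in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = (polarity * (X[:, j] - thresh) > 0).astype(int)
                    err = np.sum(w * (pred != y))
                    if best is None or err < best[0]:
                        best = (err, j, thresh, polarity, pred)
        eps, j, thresh, polarity, pred = best             # 3) lowest error
        beta = eps / (1 - eps)
        w = w * beta ** (pred == y)                       # 4) shrink correct ones
        stumps.append((j, thresh, polarity))
        alphas.append(np.log(1 / beta))
    return stumps, alphas

def adaboost_predict(x, stumps, alphas):
    """Strong classifier: weighted vote against half the total alpha mass."""
    votes = sum(a * int(p * (x[j] - t) > 0)
                for (j, t, p), a in zip(stumps, alphas))
    return int(votes >= 0.5 * sum(alphas))
```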

Viola-Jones

The work by Viola and Jones extends the basic AdaBoost classifier in the following ways:
- Rediscovery of the integral image representation for fast feature calculations.
- Use of a Haar-like feature basis as weak classifiers.
- Introduction of a cascaded classifier.

Viola-Jones: Integral image

Definition: the integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive. Using the following two recurrences, where i(x, y) is the pixel value of the original image at the given location and s(x, y) is the cumulative column sum, we can calculate the integral image representation in a single pass over the image:

  s(x, y) = s(x, y-1) + i(x, y)
  ii(x, y) = ii(x-1, y) + s(x, y)

(Taken from Gyozo Gidofalvi's presentation on Viola-Jones.)
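A short sketch of that single pass, assuming a 2-D NumPy array indexed as img[y, x] (in practice the same result is img.cumsum(axis=0).cumsum(axis=1)):

```python
import numpy as np

def integral_image(img):
    """One pass over the image using the two recurrences above."""
    h, w = img.shape
    s = np.zeros((h, w))     # s(x, y): cumulative column sums
    ii = np.zeros((h, w))    # ii(x, y): the integral image
    for y in range(h):
        for x in range(w):
            s[y, x] = (s[y - 1, x] if y > 0 else 0) + img[y, x]
            ii[y, x] = (ii[y, x - 1] if x > 0 else 0) + s[y, x]
    return ii
```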

Viola-Jones: Simple features

Using the integral image representation, one can compute the value of any rectangular sum in constant time. For example, the sum inside a rectangle D whose surrounding corner points are 1 (top-left), 2 (top-right), 3 (bottom-left), and 4 (bottom-right) is ii(4) + ii(1) - ii(2) - ii(3). As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively. (Taken from Gyozo Gidofalvi's presentation on Viola-Jones.)

The resulting simple features offer some insight into what the classifier is looking at: for faces, the first discriminative feature selected responds to the eye region being darker than the cheeks below it, and the second compares the eyes to the bridge of the nose.
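A sketch of that constant-time rectangle sum, assuming the ii array from the function above and inclusive (row, col) bounds:

```python
def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] via four lookups:
    ii(4) + ii(1) - ii(2) - ii(3)."""
    total = ii[bottom, right]                  # ii(4)
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]         # ii(1)
    if top > 0:
        total -= ii[top - 1, right]            # ii(2)
    if left > 0:
        total -= ii[bottom, left - 1]          # ii(3)
    return total
```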

Viola-Jones: Cascaded classifier

Using a cascaded classifier can improve the running time substantially: easier examples are discarded early on, focusing attention on the harder ones. The cascade takes the form of a degenerate decision tree: all examples enter the first stage, each stage either rejects a window or passes it on, and only windows accepted by every stage are reported as detections.

Detection and false-positive rates can now be broken down on a per-level basis. For example, suppose we want a 10-stage cascaded classifier with a detection rate of 90% and a false-positive rate of 1/1,000,000.

Each stage must then have a detection rate of about 0.99, since 0.99^10 ≈ 0.9. But notice that the false-positive rate of each stage can be as high as roughly 30%, since 0.3^10 ≈ 6 * 10^-6.
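A quick numeric check of this per-stage arithmetic (the stage count and targets are the slide's example):

```python
stages = 10
print(0.9 ** (1 / stages))    # required per-stage detection rate: ~0.9895
print(1e-6 ** (1 / stages))   # allowed per-stage false-positive rate: ~0.251
print(0.99 ** 10, 0.3 ** 10)  # ~0.904 overall detection, ~5.9e-6 overall FP
```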

SEVILLE: YEF features

SEVILLE substitutes the integral image and Haar-like features with YEF ("yet even faster") features. Observe that Haar-like features require image intensity normalization and roughly 8 array accesses per feature on the integral image.

Rather than comparing sums of pixels, YEF features are based on the statistical examination of single pixels. Each feature consists of two subsets of points: if every point in one subset has lower intensity than every point in the other, the feature outputs 1, otherwise 0. This results in roughly 4 image accesses per feature on average.
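A minimal sketch of evaluating one such feature, with point coordinates given relative to the detection window; this simple version reads every point, whereas early termination in a real implementation is what keeps the average access count low:

```python
def yef_feature(img, darker_points, lighter_points):
    """Return 1 if every point in darker_points has strictly lower
    intensity than every point in lighter_points, else 0."""
    max_dark = max(img[y, x] for y, x in darker_points)
    min_light = min(img[y, x] for y, x in lighter_points)
    return 1 if max_dark < min_light else 0
```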

SEVILLE: YEF features

Note that there are ~10^32 possible features, so a genetic algorithm is used to evolve good ones (sketched below):
- Start with a first generation of 100 features.
- Discard the 90 features with the highest error.
- Create 50 new features by slightly mutating the remaining 10.
- Add an additional 40 random features.
- Repeat this process until no performance improvement is seen for 40 generations.

Notice that performance depends on K, the number of points in the feature.
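A sketch of that evolutionary loop; random_feature, mutate, and feature_error are hypothetical stand-ins for the paper's feature generator, mutation operator, and weighted-error evaluation:

```python
# Hypothetical sketch; helper functions are placeholders, not the authors' code.
def evolve_feature(examples, weights, patience=40):
    population = [random_feature() for _ in range(100)]
    best_err, stale = float("inf"), 0
    while stale < patience:
        population.sort(key=lambda f: feature_error(f, examples, weights))
        err = feature_error(population[0], examples, weights)
        if err < best_err:
            best_err, stale = err, 0       # improvement: reset the counter
        else:
            stale += 1
        survivors = population[:10]        # discard the 90 worst
        mutants = [mutate(f) for f in survivors for _ in range(5)]   # 50 mutants
        fresh = [random_feature() for _ in range(40)]                # 40 random
        population = survivors + mutants + fresh                     # 100 again
    return min(population, key=lambda f: feature_error(f, examples, weights))
```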

Why this works: boosting the margin for discriminative classifiers

Theory: AdaBoost

Bounding the training error: consider binary decision problems (so chance performance is 1/2), and given a weak classifier h_t, write its error as ε_t = 1/2 - γ_t. Freund and Schapire prove that the training error of the final strong classifier satisfies:

  Error(H) ≤ exp(-2 Σ_t γ_t²)
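As a worked instance of this bound (the edge value is illustrative): if every weak classifier has edge γ_t = 0.1, then after T = 100 rounds the bound gives exp(-2 * 100 * 0.01) = exp(-2) ≈ 0.135, and after T = 500 rounds exp(-10) ≈ 4.5 * 10^-5, making the exponential decay in the number of rounds concrete.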

Theory: AdaBoost

Bounding the generalization error, first shot (here d is the VC dimension of the weak-classifier space and m the number of training examples):

  generalization error ≤ Pr_train[H(x) ≠ y] + Õ(√(Td / m))

Note that this suggests the method will overfit as the number of rounds T becomes large. However, empirical findings show that over-fitting does not often happen in practice. This suggests an improved upper bound based on the margins.

Theory: AdaBoost

The (normalized) margin of an example (x, y) is defined as:

  margin(x, y) = y Σ_t α_t h_t(x) / Σ_t α_t

Schapire et al. proved that larger margins on the training set yield better generalization: for any θ > 0,

  generalization error ≤ Pr_train[margin(x, y) ≤ θ] + Õ(√(d / (m θ²)))

Theory: SEVILLE

SEVILLE makes the leap of mainly choosing training examples from within the margin. In terms of the generalization bound above, this amounts to simultaneously increasing m and decreasing Pr_train[margin(x, y) ≤ θ]. Abramson argues that the effective m is much larger than the number of hand-labeled examples, since all examples far from the margin are implicitly added to the training set.
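A sketch of computing this normalized margin and selecting the "hard" windows SEVILLE shows the operator, using the ±1 label convention of the margin analysis (the threshold θ is illustrative):

```python
def normalized_margin(x, y, weak_classifiers, alphas):
    """margin(x, y) = y * sum_t alpha_t * h_t(x) / sum_t alpha_t,
    with y and each h_t(x) in {-1, +1}; the result lies in [-1, 1]."""
    vote = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return y * vote / sum(alphas)

def select_hard_examples(windows, weak_classifiers, alphas, theta=0.1):
    """Unlabeled windows whose unsigned normalized score falls inside
    the margin band: these are the ones worth hand-labeling."""
    total = sum(alphas)
    hard = []
    for x in windows:
        score = sum(a * h(x) for h, a in zip(weak_classifiers, alphas)) / total
        if abs(score) < theta:
            hard.append(x)
    return hard
```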

How well this works: results

SEVILLE: Results

The SEVILLE system was tested on a pedestrian-detection task. 215 pedestrian and 37,064,791 non-pedestrian image windows were used for validation. The videos used for training were segments clipped from 6 hours of footage taken from a car driving around Paris.

SEVILLE: Results — [two slides of result figures, not reproduced in the transcript]

SEVILLE: Results

Some observations:
- The system does not achieve the target false-positive rate, possibly because the features are too simple.
- It does not use a cascaded classifier in practice.
- It could be extended to allow selection of arbitrary image regions.
- Note: this is like co-training with a human as the second classifier.

References

[1] Y. Abramson and Y. Freund. Active learning for visual object recognition. In CVPR, 2005.
[2] P. Viola and M. Jones. Robust real-time object detection. In Proc. of the IEEE Workshop on Statistical and Computational Theories of Vision, pages 1-25, 2001.
[3] Y. Abramson and B. Steux. Illumination-independent visual features.
[4] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, October 1998.
[5] Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.