Unsupervised Improvement of Visual Detectors using Co-Training


Unsupervised Improvement of Visual Detectors using Co-Training. Anat Levin, Paul Viola, Yoav Freund. ICCV 2003. UCSD CSE252c, presented by William Beaver, October 25, 2005.

Task: a traffic camera is taking video and you want to identify the cars. Simple: pick a detector, gather training data, train.

Training data needs. Use a [Viola, Jones IJCV 2002] approach. Image size 320x240 means ~50,000 locations to scan per frame, so a false positive rate of 1 in 10,000 gives ~5 false positives per frame! We need a very low false positive rate, and that takes many examples: O(10^9) negative examples and several thousand labeled positive examples. Viola/Jones used ~4,900 positives and ~10,000 negatives for one classifier. The real problem: the high cost of generating labeled examples.
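The per-frame false-positive arithmetic above can be checked directly (numbers taken from the slide; the scan count is approximate):

```python
# ~50,000 scanned windows per 320x240 frame, at a false positive
# rate of 1 in 10,000, yields about 5 false positives per frame.
locations_per_frame = 50_000
false_positive_rate = 1 / 10_000

fp_per_frame = locations_per_frame * false_positive_rate
print(fp_per_frame)  # -> 5.0
```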

Unlabeled Training Data. Unlabeled examples are typically easy to come by, and they have been shown useful in text classification [Nigam, et al]. Assumption: unlabeled examples respect the class boundaries of the underlying distribution (typically Gaussian, two-class mixtures), so the density of unlabeled examples is low near the class boundary.

Margin concept. [Viola, Jones] use AdaBoost: discriminative, maximizes the margin. Unambiguous examples far from the margin are not informative; we need confidently labeled examples with small (or negative) margin. How do we get these examples? (Figure: an SVM-ish margin picture with hyperplanes H+, H, H-.) * Thanks to Tony Jebara for the inspiration: http://www1.cs.columbia.edu/~jebara/6772/index.html

Adaboost [Freund/Schapire 97]
Given: (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X, y_i ∈ Y = {−1, +1}
Initialize: D_1(i) = 1/m (D_t(i): distribution weight on example i at time t)
For t = 1 : T
- Train base learner using distribution D_t
- Get base (weak) hypothesis h_t : X → R
- Choose α_t ∈ R: α_t = (1/2) ln((1 − ε_t)/ε_t)
- Update: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
Output: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
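The update rules above can be sketched as a minimal AdaBoost implementation. One-feature threshold stumps stand in for the weak learners here (an illustrative choice; the detector in the paper uses rectangle features instead):

```python
import numpy as np

def adaboost(X, y, T):
    """Minimal AdaBoost sketch with single-feature threshold stumps.

    X: (m, d) feature matrix; y: labels in {-1, +1}; T: boosting rounds.
    Returns a list of (feature, threshold, sign, alpha) weak learners.
    """
    m, d = X.shape
    D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    learners = []
    for _ in range(T):
        # Train base learner: pick the stump with lowest weighted error.
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] <= thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, s)
        err, j, thr, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # alpha_t
        pred = s * np.where(X[:, j] <= thr, 1, -1)
        D = D * np.exp(-alpha * y * pred)       # reweight examples
        D /= D.sum()                            # normalize by Z_t
        learners.append((j, thr, s, alpha))
    return learners

def predict(learners, X):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    score = np.zeros(len(X))
    for j, thr, s, alpha in learners:
        score += alpha * s * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(score)
```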

Adaboost - margin. Typically, expect AdaBoost generalization error at most P̂r[H(x) ≠ y] + Õ(√(Td/m)). Define margin_f(x, y) = (y Σ_t α_t h_t(x)) / (Σ_t α_t) ∈ [−1, +1]: positive iff H classifies (x, y) correctly, and the margin magnitude is a measure of confidence. [Freund, Schapire 99]

Adaboost - margin. A larger margin on training data gives a superior upper bound on generalization (test) error. For any θ > 0,
Pr[y f(x) ≤ 0] ≤ (1/m) Σ_{i=1}^m 1[y_i f(x_i) ≤ θ] + O( (1/√m) (d log²(m/d)/θ² + log(1/δ))^{1/2} )
i.e. the proportion of testing examples in error is at most the fraction of training samples with margin less than θ, plus a complexity term, often abbreviated P̂r[margin(x, y) ≤ θ] + Õ(√(d/(mθ²))). [Schapire, Freund, Bartlett, Lee 98]

Co-Training [Blum, Mitchell COLT 98]: take two views of the data. There may exist a set of examples with high margin using one view's features and small (or negative) margin on the other. (There is a formal framework, if people are interested.)

Co-Training. [Blum, Mitchell 98] proves co-training finds a very accurate rule from a very small quantity of labeled data. Co-Training assumptions: 1. a reasonable learning algorithm; 2. an underlying distribution which satisfies conditional independence between the views; 3. an initial weak classification rule.

Co-Training, practically: train a pair of detectors; examples confidently labeled by one are used to correct a mislabeling by the other. The margin magnitude gives confidence, so each detector is trained with informative examples. The conditional independence assumption is hard to meet in practice.

Approach. Using this idea, focus on margin thresholds: 1. train two classifiers with a small dataset; 2. estimate margin thresholds above/below which all training data is correctly labeled; 3. use confidence to assign labels to unlabeled data; 4. add these examples and retrain; 5. repeat...
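The five steps can be sketched as a toy co-training loop. The one-dimensional "views" and the mean-midpoint scorer below are stand-ins for the two boosted cascades; only the loop structure mirrors the approach:

```python
def fit_view(examples, labels):
    """Toy per-view scorer: signed distance from the midpoint of the
    class means (stands in for a boosted cascade's margin score)."""
    pos = [x for x, y in zip(examples, labels) if y > 0]
    neg = [x for x, y in zip(examples, labels) if y < 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    sign = 1 if sum(pos) / len(pos) > mid else -1
    return lambda x: sign * (x - mid)   # signed confidence score

def co_train(view_a, view_b, labels, unlab_a, unlab_b, rounds, theta):
    """Each round: score unlabeled data under both views; examples
    confidently labeled by either view are added to BOTH training sets,
    then both classifiers are retrained."""
    a_ex, b_ex, ys = list(view_a), list(view_b), list(labels)
    for _ in range(rounds):
        clf_a, clf_b = fit_view(a_ex, ys), fit_view(b_ex, ys)
        remaining = []
        for xa, xb in zip(unlab_a, unlab_b):
            s = max(clf_a(xa), clf_b(xb), key=abs)  # most confident view
            if abs(s) > theta:                      # outside the margin band
                a_ex.append(xa); b_ex.append(xb)
                ys.append(1 if s > 0 else -1)
            else:
                remaining.append((xa, xb))          # still ambiguous
        unlab_a = [p[0] for p in remaining]
        unlab_b = [p[1] for p in remaining]
    return fit_view(a_ex, ys), fit_view(b_ex, ys)
```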

Details - data. Data from the Washington State Department of Transportation website: 15-sec video clips, 8 cameras, three weeks, 1 clip every 5 mins.

Classifiers. Two classifiers (two views of the data): image = what you'd expect; background difference = image minus the average background from the clip.

Details - training data. 50 cars identified in total (that's all!), in images from 3 of the cameras. A box is drawn around each car (minimizing background), then each is cropped and scaled to 20 x 28 (aspect ratio 1.4); smaller cars are discarded. 22,000 additional unlabeled images are available.

Details - validation data. 6 separate images, with every car labeled. Practical note: the images were chosen to have as many positive examples as possible, to be sure that true positives are not labeled as negative.

Initial training. Each classifier is trained identically and independently. Input to construction: target detection rate (100%), target false positive rate (0.2%), training data (+: 50 cars; -: ? images), validation dataset. Features are selected via LogAdaBoost [Collins et al]; a feature is a simple linear function of rectangular sums and a threshold. Each round: select the feature with lowest weighted error; the selected feature is assigned a weight based on its performance.

Initial training - details. 320 x 240 images mean >50,000 locations to check. For efficiency, use a cascade of classifiers: early stages are constrained to use few features but have a high detection rate; later stages have more features and a lower detection rate, and are only trained on the true and false positives of earlier stages. Construction runs until the target rate is met. Use the validation images to adjust each stage's threshold; run and tune until the required detection rate is reached.
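The early-rejection behaviour of the cascade can be sketched as follows (the stage classifiers and thresholds here are hypothetical stand-ins):

```python
def cascade_detect(stages, window):
    """Sketch of cascade evaluation. Each stage is a (classifier,
    threshold) pair; a window is rejected as soon as any stage's score
    falls below its threshold, so cheap early stages discard most of
    the ~50,000 candidate windows without running the later stages."""
    score = 0.0
    for classify, threshold in stages:
        score = classify(window)
        if score < threshold:
            return False, score   # rejected early; later stages never run
    return True, score            # survived every stage: a detection
```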

Co-Training. Initial classifiers: two 5-stage cascades, each stage with 4 features (20 features total). First, retrain the final stage of each cascade to increase its number of features to 30; this increases the range of confidence values. Scan the 22k unlabeled examples (1.1 billion locations!) and sample from these scans, using the classification scores (a signed measure of confidence) to label.

Sampling issues - how to sample negatives: there are many to choose from. Uniform sampling? That ignores the margin! We need to know which examples are important. Importance sampling: randomly select using a distribution proportional to each example's weight w_i. Scan the data once; if Nαw_i < 1, select with Pr = Nαw_i and assign weight = 1; else (Nαw_i ≥ 1), select with Pr = 1 and weight = Nαw_i.
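The selection rule can be sketched as below, assuming α is chosen as 1/Σw_i so that the expected number of kept examples is about N (the slide does not pin down α; that choice is an assumption here):

```python
import random

def sample_negatives(weights, N):
    """Weight-proportional importance sampling as described above.

    Returns (index, weight) pairs. Low-weight examples are kept with
    probability N*alpha*w_i and unit weight; high-weight examples are
    always kept and carry weight N*alpha*w_i, preserving expectations.
    """
    alpha = 1.0 / sum(weights)            # assumed normalization
    kept = []
    for i, w in enumerate(weights):
        p = N * alpha * w
        if p < 1.0:
            if random.random() < p:       # select with Pr = N*alpha*w_i
                kept.append((i, 1.0))     # ...and assign weight 1
        else:                             # N*alpha*w_i >= 1
            kept.append((i, p))           # Pr = 1, weight = N*alpha*w_i
    return kept
```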

Sampling issues - how to sample positives: the challenge is alignment. Only select peaks (local maxima) of the scoring function; the weight assigned to each peak is the sum of the weights above the selection threshold and nearby the peak (not well defined).
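A one-dimensional sketch of that peak selection; since the slide leaves "nearby" loosely defined, the immediate neighbours stand in for the nearby window here:

```python
def select_peaks(scores, threshold):
    """Keep only local maxima of the scoring function that exceed the
    positive threshold. Each peak's weight sums the above-threshold
    scores among the peak and its immediate neighbours (a simplified
    reading of the slide's 'nearby' rule)."""
    peaks = []
    for i in range(1, len(scores) - 1):
        s = scores[i]
        if s > threshold and s >= scores[i - 1] and s >= scores[i + 1]:
            weight = sum(v for v in scores[i - 1:i + 2] if v > threshold)
            peaks.append((i, weight))
    return peaks
```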

Margin plot: determine the thresholds by performance on the validation set. θ_p (positive threshold) is the max score achieved by any negative example: scores above it are likely to be positive. θ_n (negative threshold) is the min score achieved by any positive example: scores below it are likely to be negative.
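The threshold rule reads off directly from validation scores (the score and label arrays below are hypothetical):

```python
def estimate_thresholds(val_scores, val_labels):
    """Thresholds from validation performance, as on the slide:
    theta_p = max score of any negative (above it: safely positive);
    theta_n = min score of any positive (below it: safely negative)."""
    theta_p = max(s for s, y in zip(val_scores, val_labels) if y < 0)
    theta_n = min(s for s, y in zip(val_scores, val_labels) if y > 0)
    return theta_p, theta_n
```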

Co-training: new examples are sampled and labeled via the thresholds, then added, with their labels, to the training set. Now boost, adding three new features to the classifier, and resample the unlabeled data again. 12 rounds gives 66 features (30 original + 3 x 12).

How the cascade helps with two problems. Asymmetry: there are few positive examples in an image while there are thousands of negative examples; the cascade increases the ratio. Efficiency: we do not need every example, only those which pass the early cascade stages.

Testing. A set of hand-labeled examples, not used anywhere in training: 90 images, 980 positive examples. Test those examples that make it through the additional cascade on the co-trained layer.

[ROC curves: correct detection rate vs. false positives, for the grey-image detector and the background-subtracted (backsub) detector; green = original, black = co-trained.]

[Figure 5: Detection results, panels (a)-(d), comparing the original and co-trained detectors; backsub = background subtraction.]

Algorithm/Process Summary

Initial setup:
- Train two detectors: Detector One operates on grey images; Detector Two operates on difference images.
- Each detector has 5 stages, each containing 4 features.
- Detection rate on the training set is 100%; the false positive rate is 0.2%.
- The final stage is retrained to contain 30 features, which produces more reliable classification scores. Note: the scores assigned by this final stage are used to select additional examples for training.
- Two thresholds are computed, θ_p and θ_n:
  - θ_p = max score achieved by a negative patch from the validation set
  - θ_n = min score achieved by a positive patch from the validation set

Co-training process (run for 12 rounds):
- 22,000 unlabeled images are scanned.
- In each round of co-training, patches are sampled from the 22,000 unlabeled images.
- Image patches labeled positive by the 5-stage cascade are examined:
  - Positive patches are extracted if their score is greater than θ_p; pick local maxima in the score function (to improve alignment).
  - Negative patches have score less than θ_n; since there are so many negative examples, examples are selected at random based on AdaBoost weight.
- The final stage of the cascade is augmented with 3 additional features using this new training data.