Image Analysis. Window-based face detection: The Viola-Jones algorithm. iphoto decides that this is a face. It can be trained to recognize pets!

Similar documents
Window based detectors

Generic Object-Face detection

Previously. Window-based models for generic object detection 4/11/2011

Object detection as supervised classification

Face Detection and Alignment. Prof. Xin Yang HUST

Classifier Case Study: Viola-Jones Face Detector

CS4495/6495 Introduction to Computer Vision. 8C-L1 Classification: Discriminative models

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Detection Recognition Sally

Discriminative classifiers for image recognition

Recap Image Classification with Bags of Local Features

Object recognition (part 1)

Object detection. Asbjørn Berge. INF 5300 Advanced Topic: Video Content Analysis. Technology for a better society

Bayes Risk. Classifiers for Recognition Reading: Chapter 22 (skip 22.3) Discriminative vs Generative Models. Loss functions in classifiers

Classifiers for Recognition Reading: Chapter 22 (skip 22.3)

Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser

Skin and Face Detection

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Face/Flesh Detection and Face Recognition

Face and Nose Detection in Digital Images using Local Binary Patterns

Active learning for visual object recognition

Object Category Detection: Sliding Windows

Recognition problems. Face Recognition and Detection. Readings. What is recognition?

Face detection. Bill Freeman, MIT April 5, 2005

Object Category Detection: Sliding Windows

Face Detection on OpenCV using Raspberry Pi

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Study of Viola-Jones Real Time Face Detector

Supervised Sementation: Pixel Classification

Object Detection Design challenges


Detecting Pedestrians Using Patterns of Motion and Appearance (Viola & Jones) - Aditya Pabbaraju

Detection of a Single Hand Shape in the Foreground of Still Images

Ensemble Methods, Decision Trees

Overview of section. Lecture 10 Discriminative models. Discriminative methods. Discriminative vs. generative. Discriminative methods.

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Face Detection using Hierarchical SVM

Categorization by Learning and Combining Object Parts

Mobile Face Recognization

Detecting and Reading Text in Natural Scenes

Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

Detecting People in Images: An Edge Density Approach

2D Image Processing Feature Descriptors

High Level Computer Vision. Sliding Window Detection: Viola-Jones-Detector & Histogram of Oriented Gradients (HOG)

Component-based Face Recognition with 3D Morphable Models

Learning to Recognize Faces in Realistic Conditions

Viola Jones Face Detection. Shahid Nabi Hiader Raiz Muhammad Murtaz

Face Tracking in Video

Object recognition. Methods for classification and image representation

Detecting Pedestrians Using Patterns of Motion and Appearance

RGBD Face Detection with Kinect Sensor. ZhongJie Bi

Selection of Scale-Invariant Parts for Object Class Recognition

Verification: is that a lamp? What do we mean by recognition? Recognition. Recognition

An Efficient Face Detection and Recognition System

Machine Learning for Signal Processing Detecting faces (& other objects) in images

Understanding Faces. Detection, Recognition, and. Transformation of Faces 12/5/17

Automatic Initialization of the TLD Object Tracker: Milestone Update

Beyond Bags of features Spatial information & Shape models

Object and Class Recognition I:

Image Analysis. Edge Detection

A robust method for automatic player detection in sport videos

What do we mean by recognition?

Discriminative Feature Co-occurrence Selection for Object Detection

Criminal Identification System Using Face Detection and Recognition

A Survey of Various Face Detection Methods

FACE DETECTION BY HAAR CASCADE CLASSIFIER WITH SIMPLE AND COMPLEX BACKGROUNDS IMAGES USING OPENCV IMPLEMENTATION

Local invariant features

Boosting Sex Identification Performance

Fast Learning for Statistical Face Detection

Detecting Pedestrians Using Patterns of Motion and Appearance

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods

Three Embedded Methods

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri

Parameter Sensitive Detectors

A Cascade of Feed-Forward Classifiers for Fast Pedestrian Detection

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

Part-based Face Recognition Using Near Infrared Images

Short Paper Boosting Sex Identification Performance

An Associate-Predict Model for Face Recognition FIPA Seminar WS 2011/2012

Part-based Face Recognition Using Near Infrared Images

Human-Robot Interaction

Learning a Rare Event Detection Cascade by Direct Feature Selection

Introduction. How? Rapid Object Detection using a Boosted Cascade of Simple Features. Features. By Paul Viola & Michael Jones

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Detecting Pedestrians Using Patterns of Motion and Appearance

Subject-Oriented Image Classification based on Face Detection and Recognition

Project Report for EE7700

Image Analysis. Edge Detection

Unsupervised Improvement of Visual Detectors using Co-Training

Hybrid Face Detection System using Combination of Viola - Jones Method and Skin Detection

Human detections using Beagle board-xm

Applications Video Surveillance (On-line or off-line)

Designing Applications that See Lecture 7: Object Recognition

Postprint.

CS 559: Machine Learning Fundamentals and Applications 10 th Set of Notes

Lecture 4 Face Detection and Classification. Lin ZHANG, PhD School of Software Engineering Tongji University Spring 2018

EE795: Computer Vision and Intelligent Systems

Synergistic Face Detection and Pose Estimation with Energy-Based Models

Color Model Based Real-Time Face Detection with AdaBoost in Color Image

Transcription:

Image Analysis 2 Face detection and recognition Window-based face detection: The Viola-Jones algorithm Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Kristen Grauman, University of Texas at Austin. Computer Vision course by Svetlana Lazebnik, University of North Carolina at Chapel Hill. University of Ioannina - Department of Computer Science 3 Face detection and recognition 4 Consumer application: iphoto 2009 Detection Recognition Sally http://www.apple.com/ilife/iphoto/ 5 Consumer application: iphoto 2009 It can be trained to recognize pets! 6 Consumer application: iphoto 2009 iphoto decides that this is a face http://www.maclife.com/article/news/iphotos_faces_recognizes_cats 1

7 Outline Face recognition Eigenfaces Face detection The Viola and Jones algorithm. M. Turk and A. Pentland. "Eigenfaces for recognition". Journal of Cognitive Neuroscience, 3(1):71 86, 1991. Also in CVPR 1991. P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2), 2004. Also in CVPR 2001. 8 Recognition by finding patterns We have seen very simple template matching methods (using filters). Some objects behave like quite simple templates Frontal face images. Strategy: Build/train model (supervised classifier). Find image windows. Correct lighting. Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces with respect to the model. 9 Basic ideas in classifiers Loss: some errors may be more expensive than others e.g. a fatal disease that is easily cured by a cheap medicine with no side-effects false positives in diagnosis are better than false negatives We discuss two class classification: L(1 2) is the loss caused by calling 1 a 2. Total risk of strategy s is the total expected loss: Rs () = Pr{1 2 }(1 sl 2) + Pr{2 1 }(2 sl 1) The optimal classifier minimizes the total risk. 10 Basic ideas in classifiers (cont.) Generally, we should classify as 1 if the expected loss of classifying as 2 is larger than calling a 2 a 1: p(1/ x) L(1 2) > p(2 / x) L(2 1) Equivalently, we should classify as 2 if p(1/ x) L(1 2) < p(2 / x) L(2 1) Using Bayes rule: px ( /1) p(1) L(1 2) > px ( /2) p(2) L(2 1) px ( /1) p(1) L(1 2) < px ( / 2) p(2) L(2 1) 11 Basic ideas in classifiers (cont.) Crucial notion: Decision boundary points where the average loss is the same for either case. 12 Basic ideas in classifiers (cont.) Some loss may be inevitable: the total risk (shaded area) is called the Bayes risk. L(1 2) = L(2 1) = 1, L(1 1) = L(2 2) = 0 L(1 2) = L(2 1) = 1, L(1 1) = L(2 2) = 0 2

13 Basic ideas in classifiers (cont.) Example: Gaussian class densities, p- dimensional measurements with common covariance Σ and different means μ 1, μ 2 and class priors π k : p 2 1 1 1 2 T 1 pxk ( / ) = Σ exp ( x μk) Σ ( x μk) 2π 2 14 Basic ideas in classifiers (cont.) Ignoring a common factor in the posteriors: p 2 1 1 1 k k k 2 T 1 pk ( / x) π Σ exp ( x μ ) Σ ( x μ ) 2π 2 The classifier boils down to choose the class minimizing T 1 ( x ) ( x ) μ Σ μ 2log π k k k The common covariance simplifies the resulting expression. 15 Basic ideas in classifiers (cont.) Cross validation to estimate the total risk. Split the data set and average over all possible splits. Leave-one-out, 10-fold, Bootstrapping to estimate better the decision boundary. Retrain the classifier with the false positives (FP) and false negatives (FN). They are the most significant in the configuration of the decision boundary. 16 Histogram based classifiers Use a histogram to represent the classconditional densities p(x 1), p(x 2), Advantage: estimates become quite good with enough data. Disadvantage: Histogram becomes big with high dimension. we may assume feature independence. 17 Example: finding skin pixels Skin has a very small range of (intensity independent) colours and little texture. Get class conditional densities p(x skin) and p(x not skin) from examples. Get priors p(skin) (ki) and p(not ( skin) from examples count the number of face/non face pixels. The classifier becomes: 18 Example: finding skin pixels (cont.) We can represent a class-conditional density using a histogram (a non-parametric distribution) P(x skin) P(x not skin) Feature x = Hue Feature x = Hue Kristen Grauman 3

19 Example: finding skin pixels (cont.) 20 Example: finding skin pixels (cont.) We can represent a class-conditional density using a histogram (a non-parametric distribution) P(x skin) posterior P ( skin x) = likelihood prior P( x skin) P( skin) P( x) Now we get a new image, and want to label each pixel as skin or non-skin. Feature x = Hue P(x not skin) P( skin x) α P( x skin) P( skin) Where does the prior come from? Feature x = Hue Kristen Grauman Why use a prior? 21 Example: finding skin pixels (cont.) For every pixel in a new image, we can estimate the probability that it is generated by skin. 22 Example: finding skin pixels (cont.) Brighter pixels higher probability of being skin Classify pixels based on these probabilities Using skin color-based face detection and pose estimation as a video-based interface. Open CV, Gary Bradski, 1998. Kristen Grauman Kristen Grauman 23 Example: finding skin pixels (cont.) Parameter θ encapsulates the relative loss. There is a family of classifiers for each value of θ. Each one has different FP and FN rates described by the Receiver Operating Characteristic (ROC) curve. 24 Window-based generic object detection framework Build/train object model Choose a representation. Learn or fit parameters of model / classifier. Generate candidates in new image. Score the candidates. 4

25 building a model 26 building a model (cont.) Pixel-based representations sensitive to small shifts. Simple holistic descriptions of image content grayscale / color histogram vector of pixel intensities Color or grayscale-based appearance description can be sensitive to illumination and intra-class appearance variation. 27 building a model (cont.) Consider edges, contours, and (oriented) intensity gradients. 28 building a model (cont.) Consider edges, contours, and (oriented) intensity gradients. Summarize local distribution of gradients with histogram Locally orderless: offers invariance to small shifts and rotations Contrast-normalization: try to correct for variable illumination 29 building a model (cont.) Given the representation, train a binary classifier. 30 Nearest neighbor Discriminative classifiers Neural networks Car/non-car Classifier 10 6 examples Shakhnarovich, Viola, Darrell 2003 Berg, Berg, Malik 2005... Support Vector Machines Boosting LeCun, Bottou, Bengio, Haffner 1998 Rowley, Baluja, Kanade 1998 Conditional Random Fields No, Yes, not car. a car. Guyon, Vapnik Heisele, Serre, Poggio, 2001, Viola, Jones 2001, Torralba et al. 2004, Opelt et al. 2006, McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003 5

31 Window-based generic object detection framework 32 generating and scoring candidates Build/train object model Choose a representation Learn or fit parameters of model / classifier Generate candidates in new image Car/non-car Classifier Score the candidates 33 Training: 1. Obtain training data 2. Define features 3. Define classifier Given new image: 1. Slide window 2. Score by classifier Summary Training examples 34 Nearest neighbor 10 6 examples Shakhnarovich, Viola, Darrell 2003 Berg, Berg, Malik 2005... Support Vector Machines Discriminative classifiers Boosting Neural networks LeCun, Bottou, Bengio, Haffner 1998 Rowley, Baluja, Kanade 1998 Conditional Random Fields Feature extraction Car/non-car Classifier Guyon, Vapnik Heisele, Serre, Poggio, 2001, Viola, Jones 2001, Torralba et al. 2004, Opelt et al. 2006, McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003 35 Boosting Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier A weak learner need only do better than chance. Training consists of multiple boosting rounds During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners. Hardness is captured by weights attached to training examples. 36 Boosting intuition Weak Classifier 1 Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999. Slide credit: Paul Viola 6

37 Boosting illustration 38 Boosting illustration Weights Increased Weak Classifier 2 39 Boosting illustration 40 Boosting illustration Weights Increased Weak Classifier 3 41 Boosting illustration 42 Boosting: training Final classifier is a combination of weak classifiers Initially, weight each training example equally. In each boosting round: Find the weak learner that achieves the lowest weighted training error. Raise weights of training examples misclassified by current weak learner. Compute final classifier as linear combination of all weak learners (weight of each learner is directly proportional to its accuracy). Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g. AdaBoost). Slide credit: Lana Lazebnik 7

43 Boosting pros and cons Advantages of boosting Integrates classification with feature selection. Complexity of training is linear instead of quadratic in the number of training examples. Flexibility in the choice of weak learners, boosting scheme. Testing is fast. Easy to implement. Disadvantages Needs many training examples. Often doesn t work as well as SVM (especially for many-class problems). 44 Challenges of face detection Sliding window detector must evaluate tens of thousands of location/scale combinations. PCA is slow for real time face detection. Faces in images are rare: 0 10 per image For computational ti efficiency, i we should try to spend as little time as possible on the non-face windows. A megapixel image has ~10 6 pixels and a comparable number of candidate face locations. To avoid having a false positive in every image, the false positive rate has to be less than 10-6. 45 The Viola/Jones Face Detector A seminal approach to real-time object detection. Training is slow but detection is very fast. Key ideas Represent local texture with efficiently computed rectangular features (integral images) within the window of interest. Select discriminative features as weak classifiers. Boosting for classification. Attentional cascade for fast rejection of non-face windows. P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2), 2004. Also in CVPR 2001. 46 Image features Rectangular filters Output = (pixels in white area) (pixels in black area) 47 Image features (cont.) 48 Example For real problems, the result depends strongly on the features used. Do not employ pixel values. Use a very large set of simple functions sensitive to edges and other critical features multiple scales. 8

49 Fast computation with integral images The integral image computes a value at each pixel (x,y) that is the sum of the pixel values above and to the left of (x,y) inclusive. This can be quickly computed in one pass. (x,y) 50 Computing the integral image 51 Computing the integral image 52 Computing sum within a rectangle ii(x, y-1) s(x-1, 1 y) i(x, y) Cumulative row sum: s(x, y) = s(x 1, y) + i(x, y) Integral image: ii(x, y) = ii(x, y 1) + s(x, y) MATLAB: ii = cumsum(cumsum(double(i)), 2); Let A,B,C,D be the values of the integral image at the corners of a rectangle. The sum of original image values within the rectangle can be computed as: sum = A B C + D Only 3 additions are required for any size of rectangle! D C B A 53 Example 54 Feature selection Integral Image Feature output is different between adjacent regions. The integral image significantly reduces the number of computations. However, for a base resolution of 24x24 detection region, the number of possible rectangle features is ~160,000! 000! (scale, size, type, position, ) It is impractical to evaluate the entire feature set for training. Can we create a good classifier using just a small subset of all possible features? How to select such a subset? Use AdaBoost both to select the informative features and to form the classifier. 9

55 Boosting for face detection We seek to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error. Resulting weak classifier: 56 Start with uniform weights on training examples For T rounds AdaBoost Algorithm Evaluate weighted error for each feature, pick best. {x 1, x n } Outputs of a possible rectangle feature on faces and non-faces. For next round, reweight the examples according to errors, choose another filter/threshold combo. Kristen Grauman Re-weight the examples: Incorrectly classified -> more weight Correctly classified -> less weight Final classifier is combination of the weak ones, weighted according to error they had. Freund & Schapire 1995 57 Boosting for face detection (cont.) The first two features selected by boosting. 58 Boosting for face detection A 200-feature classifier can yield 95% detection rate and a false positive rate of 1 over 14084. Not good enough! This feature combination can yield 100% detection rate and 50% false positive rate. 59 Attentional cascade Even if the filters are fast to compute, each new image has a lot of possible windows to search. How to make the detection more efficient? There is no need to compute the linear combination of all of the features in a window that is clear to be negative from the first features. Define a computational risk hierarchy. The goal of the training process is different instead of minimizing classification errors minimize the false positive rate. 60 Attentional cascade (cont.) Form a cascade with low false negative rates early on. Apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative 10

61 Viola-Jones detector: summary 62 Output of face detector on test images Train cascade of classifiers with AdaBoost Faces New image Non-faces Selected features, thresholds, and weights Train with 5K positives, 350M negatives (10k non face images). Real-time detector using 38 layer cascade. Implemented in OpenCV. 63 Other detection tasks 64 Profile Detection Facial Feature Localization Profile Detection Male vs. female 65 Profile Features 66 Application: Naming characters in TV video M. Everingham, J. Sivic, and A. Zisserman. "Hello! My name is... Buffy" - Automatic naming of characters in TV video. Proceedings of the 17th British Machine Vision Conference (BMVC 2006). 11

67 Application of sliding windows: pedestrian detection 68 Application of sliding windows: pedestrian detection (cont.) SVM on Haar wavelets. Space-time rectangle features. C. Papageorgiou and T. Poggio. A Trainable System for Object Detection. International Journal of Cmputer Vision, Vol. 38, No 1, pp. 15-22, 2000. P. Viola, M. Jones and D. Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, Vol. 63, No 2, pp. 153-161, 2005. 69 Application of sliding windows: pedestrian detection (cont.) 70 Summary: Viola-Jones detector Rectangle features. Integral images for fast computation. Boosting for feature selection. Attentional cascade for fast rejection of negative windows. SVM on histogram of oriented gradients. N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005). 12