Detector of Facial Landmarks Learned by the Structured Output SVM

Michal Uřičář, Vojtěch Franc and Václav Hlaváč
Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Center for Machine Perception
{uricamic, xfrancv,
May 15, 2012
Presented at the International Conference on Computer Vision Theory and Applications (VISAPP 12)

Outline
  Motivation
  State of the art methods
  Structured output classifier for facial landmark detection
  Learning the detector's parameters
  Evaluation procedure
  Results & Demo
  Conclusions

Motivation
Face registration
  Essential part of face recognition (identity, gender, ...).
  Quality of recognition and further processing depends on the quality of registration.
[Figure (a): Example of detected landmarks]

State of the art methods
Active Appearance Models (AAM) [Cootes et al., 2001]
  Joint statistical model of appearance and shape.
  + Provides a dense set of facial features.
  + Allows extraction of whole contours of facial features (eyes, mouth, etc.).
  - Requires high-dimensional images for both the training and the testing stage.
  - Detection leads to a non-convex optimization task susceptible to local optima.
  - Requires good initialization.
Deformable Parts Models (DPM)
  Given by a set of parts along with the connections between certain parts, arranged in a deformable configuration.
  + Independent learning of the appearance model and the deformation cost leads to a simpler task.
  - It may not be optimal in terms of the detector's accuracy.
Detector of Everingham et al.: state of the art publicly available DPM-based detector [Everingham et al., 2008].
  The appearance model is learned by a multiple-instance variant of the AdaBoost algorithm with Haar-like features used as weak classifiers.
  The deformation cost is expressed as a mixture of Gaussian trees.

Structured output classifier for facial landmark detection
Input: image $I$ (normalized image frame), $M$ components $s = (s_0, \dots, s_{M-1})$ and search space $S = (S_0 \times \dots \times S_{M-1})$.
Output: estimate of the landmark positions $\hat{s}$.
[Figure (b): Landmarks & Components: $s_0$ face center, $s_1$ canthus-rl, $s_2$ canthus-lr, $s_3$ mouth-corner-r, $s_4$ mouth-corner-l, $s_5$ canthus-rr, $s_6$ canthus-ll, $s_7$ nose. Figure (c): Search spaces. Figure (d): Graph constraints.]
The configuration of the $M$ landmarks is described by a graph $G = (V, E)$. The landmark positions are estimated from image $I$ by maximizing the cost function:
$$f(I, s) = \sum_{i \in V} q_i(I, s_i) + \sum_{(i,j) \in E} g_{ij}(s_i, s_j).$$
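The cost function above can be evaluated for any candidate configuration once the per-landmark and per-edge scores are available. The following is a minimal sketch under stated assumptions: the three-node graph, the callables in `q` and `g`, and all coordinates are hypothetical toy values, and the dependence on the image I is folded into the unary callables rather than computed from real features.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]
Unary = Callable[[Point], float]            # q_i(I, .) with the image I already fixed
Pairwise = Callable[[Point, Point], float]  # g_ij(., .)


def cost(s: List[Point], q: List[Unary], g: Dict[Tuple[int, int], Pairwise]) -> float:
    """f(I, s) = sum over nodes of q_i(I, s_i) + sum over edges of g_ij(s_i, s_j)."""
    appearance = sum(q_i(s_i) for q_i, s_i in zip(q, s))
    deformation = sum(g_ij(s[i], s[j]) for (i, j), g_ij in g.items())
    return appearance + deformation


# Hypothetical toy model (not the detector's learned scores):
# node 0 is connected to nodes 1 and 2 by the edges (0, 1) and (0, 2).
q = [
    lambda p: -float(abs(p[0] - 20) + abs(p[1] - 20)),
    lambda p: -float(abs(p[0] - 12) + abs(p[1] - 14)),
    lambda p: -float(abs(p[0] - 28) + abs(p[1] - 14)),
]
g = {
    (0, 1): lambda a, b: -0.1 * ((b[0] - a[0] + 8) ** 2 + (b[1] - a[1] + 6) ** 2),
    (0, 2): lambda a, b: -0.1 * ((b[0] - a[0] - 8) ** 2 + (b[1] - a[1] + 6) ** 2),
}

ideal = [(20, 20), (12, 14), (28, 14)]
shifted = [(20, 20), (13, 14), (28, 14)]
cost_ideal = cost(ideal, q, g)      # every term is zero for this configuration
cost_shifted = cost(shifted, q, g)  # moving one landmark by a pixel lowers the cost
```

In this toy setting the shifted configuration is penalized twice: once by the appearance term of the moved landmark and once by the deformation term of the edge it belongs to.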

Structured output classifier for facial landmark detection
The cost function consists of two terms:
  appearance model: $q_i(I, s_i) = \langle w_i^q, \Psi_i^q(I, s_i) \rangle$, where $\Psi_i^q(I, s_i)$ are LBP features;
  deformation cost: $g_{ij}(s_i, s_j) = \langle w_{ij}^g, \Psi_{ij}^g(s_i, s_j) \rangle$, where $\Psi_{ij}^g(s_i, s_j)$ is the displacement vector.
Displacement vector [Felzenszwalb et al., 2009]:
$$\Psi_{ij}^g(s_i, s_j) = (dx, dy, dx^2, dy^2), \qquad (dx, dy) = (x_j, y_j) - (x_i, y_i).$$
The configuration of landmarks is estimated from image $I$ by maximizing the cost function:
$$\hat{s} \in \arg\max_{s \in S} f(I, s) = \arg\max_{s \in S} \Big[ \sum_{i=0}^{M-1} q_i(I, s_i; w) + \sum_{(i,j) \in E} g_{ij}(s_i, s_j; w) \Big] = \arg\max_{s \in S} \langle w, \Psi(I, s) \rangle.$$
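To make the displacement vector and the maximization concrete, here is a small sketch. It assumes the appearance scores are already computed per candidate position (the numbers below are hypothetical stand-ins for the LBP-based scores), and it finds the maximizer by exhaustive search over the product of the search spaces. Exhaustive search is used only for illustration and is exponential in the number of components; the slides do not state the inference algorithm at this point, and for a tree-structured graph dynamic programming would be the usual choice.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]


def displacement_features(s_i: Point, s_j: Point) -> Tuple[int, int, int, int]:
    """Psi^g_ij(s_i, s_j) = (dx, dy, dx^2, dy^2) with (dx, dy) = s_j - s_i."""
    dx, dy = s_j[0] - s_i[0], s_j[1] - s_i[1]
    return (dx, dy, dx * dx, dy * dy)


def dot(w: Sequence[float], psi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, psi))


def detect(q: List[Dict[Point, float]],
           w_g: Dict[Tuple[int, int], Sequence[float]]) -> Tuple[Point, ...]:
    """Exhaustive argmax of the cost over S_0 x ... x S_{M-1}.

    q[i] maps each candidate position in S_i to its appearance score;
    w_g[(i, j)] is the deformation weight vector of edge (i, j).
    """
    def score(s: Tuple[Point, ...]) -> float:
        appearance = sum(q[i][p] for i, p in enumerate(s))
        deformation = sum(dot(w, displacement_features(s[i], s[j]))
                          for (i, j), w in w_g.items())
        return appearance + deformation

    return max(product(*(list(q_i) for q_i in q)), key=score)


# Hypothetical toy problem: two components, two candidate positions each.
q = [
    {(0, 0): 1.0, (1, 0): 0.8},
    {(4, 0): 0.5, (5, 0): 0.55},
]
# 0.8*dx - 0.1*dx^2 peaks at dx = 4, so the edge prefers a horizontal gap of 4.
w_g = {(0, 1): (0.8, 0.0, -0.1, -0.1)}

features = displacement_features((1, 2), (4, 6))
best = detect(q, w_g)
```

The quadratic terms in the displacement vector are what let a single linear weight vector express a preferred offset between two landmarks together with a penalty for deviating from it.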

Problem formulation
How to learn the weights $w$ of the classifier?
Let $p : \mathcal{I} \times S \to \mathbb{R}$ be a probability density function defined over a set of images $\mathcal{I}$ and a set of hidden labels $S$.
Let $l : S \times S \to \mathbb{R}_0^+$ be a non-negative loss function such that $l(s, s) = 0$, $\forall s \in S$.
Our goal is to find the parameters of the linear classifier that minimize the expected risk:
$$R_{\mathrm{exp}}(w) := \int_{I \in \mathcal{I}} \sum_{s \in S} p(I, s)\, l\Big(s, \arg\max_{s' \in S} \langle w, \Psi(I, s') \rangle\Big)\, \mathrm{d}I.$$
$p(I, s)$ is unknown, but we are given a fully annotated training set $T = \{(I_j, s_j)\}_{j=1}^{m}$.
We approximate the expected risk minimization by the empirical risk minimization
$$R_{\mathrm{emp}}(w) = \frac{1}{m} \sum_{i=1}^{m} l\Big(s_i, \arg\max_{s \in S} \langle w, \Psi(I_i, s) \rangle\Big)$$
and by limiting the space of classifiers with the hypersphere $\frac{\lambda}{2} \|w\|^2$.
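The empirical risk is simply the average loss of the classifier's predictions on the training set. The sketch below assumes a tiny hypothetical problem in which each example carries the joint feature vector of every candidate label and the loss is 0/1; none of these values come from the slides. It also illustrates that scaling w by a positive constant leaves the predictions, and hence the empirical risk, unchanged, so the risk is piecewise constant in w and awkward to minimize directly.

```python
from typing import Callable, List, Sequence, Tuple

# One example: (list of joint feature vectors Psi(I, s), one per label s; true label)
Example = Tuple[List[Sequence[float]], int]


def predict(w: Sequence[float], features: List[Sequence[float]]) -> int:
    """Return the label s maximizing <w, Psi(I, s)>."""
    return max(range(len(features)),
               key=lambda s: sum(a * b for a, b in zip(w, features[s])))


def empirical_risk(w: Sequence[float], data: List[Example],
                   loss: Callable[[int, int], float]) -> float:
    """R_emp(w) = (1/m) * sum_i loss(s_i, argmax_s <w, Psi(I_i, s)>)."""
    return sum(loss(s_true, predict(w, feats)) for feats, s_true in data) / len(data)


def zero_one(s_true: int, s_pred: int) -> float:
    return 0.0 if s_true == s_pred else 1.0


# Hypothetical toy training set with two labels and 2-D joint features.
data: List[Example] = [
    ([(1.0, 0.0), (0.0, 1.0)], 0),
    ([(1.0, 0.0), (0.0, 1.0)], 1),
    ([(2.0, 1.0), (1.0, 2.0)], 1),
]

risk_a = empirical_risk((1.0, 0.5), data, zero_one)
risk_a_scaled = empirical_risk((3.0, 1.5), data, zero_one)  # same predictions
risk_b = empirical_risk((0.5, 1.0), data, zero_one)
```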

Large margin approach to structured output learning [Tsochantaridis et al., 2005]
The joint parameter vector $w$ is given by solving the convex minimization problem
$$w^* = \arg\min_{w \in \mathbb{R}^n} \Big[ \frac{\lambda}{2} \|w\|^2 + R(w) \Big],$$
where
$$R(w) = \frac{1}{m} \sum_{i=1}^{m} \max_{s \in S} \Big( l(s_i, s) + \langle w, \Psi(I_i, s) \rangle \Big) - \frac{1}{m} \sum_{i=1}^{m} \langle w, \Psi(I_i, s_i) \rangle.$$
  $R(w)$ ... convex upper bound on the empirical risk.
  $\lambda$ ... regularization constant; the regularization term prevents over-fitting (set experimentally during validation).
Loss function (arbitrary):
$$l(s, s') = \kappa(s')\, \frac{1}{M} \sum_{j=0}^{M-1} \| s_j - s'_j \|$$
For solving the optimization task, we use our extended version of the BMRM solver [Teo et al., 2010, Lemaréchal et al., 1995, Do et al., 2009]. Standard BMRM converges in $O(1/\epsilon)$ steps to $\epsilon$ precision.
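The claim that R(w) upper-bounds the empirical risk can be checked numerically on a toy problem. The sketch below is an illustration under assumptions, not the BMRM solver or the authors' code: the joint feature map `psi`, the three-label data set and the absolute-difference loss are hypothetical. It only evaluates the regularized objective and compares R(w) with the empirical risk for two weight vectors.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (input x, true label); hypothetical toy data


def psi(x: float, s: int, n_labels: int) -> List[float]:
    """Toy joint feature map: the input x placed in the slot of label s."""
    v = [0.0] * n_labels
    v[s] = x
    return v


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def predict(w: Sequence[float], x: float) -> int:
    n = len(w)
    return max(range(n), key=lambda s: dot(w, psi(x, s, n)))


def empirical_risk(w, data: List[Example], loss: Callable[[int, int], float]) -> float:
    return sum(loss(s_i, predict(w, x)) for x, s_i in data) / len(data)


def hinge_risk(w, data: List[Example], loss: Callable[[int, int], float]) -> float:
    """R(w): mean of max_s [loss(s_i, s) + <w, Psi(x_i, s)>] - <w, Psi(x_i, s_i)>."""
    n = len(w)
    total = 0.0
    for x, s_i in data:
        augmented = max(loss(s_i, s) + dot(w, psi(x, s, n)) for s in range(n))
        total += augmented - dot(w, psi(x, s_i, n))
    return total / len(data)


def objective(w, data, loss, lam: float) -> float:
    """(lam / 2) * ||w||^2 + R(w), the quantity minimized over w."""
    return 0.5 * lam * dot(w, w) + hinge_risk(w, data, loss)


def abs_loss(s_true: int, s_pred: int) -> float:
    return float(abs(s_true - s_pred))


data: List[Example] = [(1.0, 0), (2.0, 1), (1.5, 2)]
w_zero = [0.0, 0.0, 0.0]
w_other = [1.0, -0.5, 2.0]

bound_zero = (hinge_risk(w_zero, data, abs_loss), empirical_risk(w_zero, data, abs_loss))
bound_other = (hinge_risk(w_other, data, abs_loss), empirical_risk(w_other, data, abs_loss))
objective_zero = objective(w_zero, data, abs_loss, lam=0.1)
```

The bound holds for every w because the maximization over s includes the predicted label, whose score is at least that of the true label, so each summand of R(w) is at least the loss of the prediction.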

LFW face database [Huang et al., 2007]
  Annotated face images.
  Dimension of images px.
  7 landmarks (4 canthi, 2 mouth corners and the nose; the face center is computed from the rest).
  Contains people of various ethnicities.
[Figure: Example images from the LFW face database]

Active Appearance Models
Competing method: detector built on AAM [Cootes et al., 2001].
  Publicly available MATLAB code [Kroon, 2010].
  Needs a different training database.
Training database for AAM: IMM Face database [Nordstrøm et al., 2004]
  240 faces (6 images per person).
  Dimension of images px.
  58 manually annotated points on each face; poor variance in ethnicity.
  Expensive data.
[Figure: Examples of faces from the IMM database with annotation]

Evaluation procedure
Partitioning of the LFW database:

  Data set        Training   Validation   Testing
  Percentage      60%        20%          20%
  # of examples   6,919      2,307        2,316

Evaluation procedure:
  Find $w$ for each $\lambda \in \Lambda = \{10, 1, 10^{-1}, 10^{-2}, 10^{-3}\}$ on the TRN set.
  Select the optimal $\lambda^* = \arg\min_{\lambda \in \Lambda} R_{\mathrm{VAL}}(w(\lambda))$ according to the validation risk
$$R_{\mathrm{VAL}}(w(\lambda)) = \frac{1}{p} \sum_{i=1}^{p} l(s_i, \hat{s}_i), \quad \text{where} \quad \hat{s}_i = \arg\max_{s \in S} \langle w(\lambda), \Psi(I_i, s) \rangle.$$
  Compute the test risk on the TST set.
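The partition sizes and the model selection loop can be sketched as follows. The example counts are taken from the table above; everything else is an assumption made for illustration: `train` and `validation_risk` are hypothetical stand-ins (the toy "model" is just the value of lambda, and the toy risk is a synthetic function, not a measured result).

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Model = TypeVar("Model")

# Example counts from the partition table.
counts: Dict[str, int] = {"training": 6919, "validation": 2307, "testing": 2316}
total = sum(counts.values())
shares = {name: 100.0 * n / total for name, n in counts.items()}  # in percent

LAMBDAS: List[float] = [10, 1, 1e-1, 1e-2, 1e-3]


def select_lambda(lambdas: List[float],
                  train: Callable[[float], Model],
                  validation_risk: Callable[[Model], float]) -> Tuple[float, Model]:
    """Train one model per lambda on TRN and keep the one with the lowest VAL risk."""
    models = {lam: train(lam) for lam in lambdas}
    best = min(lambdas, key=lambda lam: validation_risk(models[lam]))
    return best, models[best]


# Hypothetical stand-ins so the sketch runs end to end.
def toy_train(lam: float) -> float:
    return lam  # a real implementation would return the learned weight vector w(lam)


def toy_validation_risk(model: float) -> float:
    return abs(model - 0.1)  # synthetic, smallest at lambda = 0.1 by construction


best_lambda, best_model = select_lambda(LAMBDAS, toy_train, toy_validation_risk)
```

The test set is touched only once, after the lambda has been fixed, so the reported test risk is not biased by the selection step.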

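Performance measurements
[Figure (e): Illustration of accuracy statistics, showing the per-landmark deviations $\epsilon_0, \dots, \epsilon_7$ and $l_{\mathrm{face}}$. Figure (f): Example detection with mean normalized deviation equal to 10%.]
$$l(s, \hat{s}) = \frac{1}{l_{\mathrm{face}}} \cdot \frac{\epsilon_0 + \dots + \epsilon_7}{8}, \qquad l^{\max}(s, \hat{s}) = \frac{1}{l_{\mathrm{face}}} \max\{\epsilon_0, \dots, \epsilon_7\}$$
Results [% of distance relative to $l_{\mathrm{face}}$]: $R_{\mathrm{TST}}$ and $R^{\max}_{\mathrm{TST}}$ for AAM, independent SVMs, Everingham et al. and the proposed detector.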
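Both accuracy statistics are straightforward to compute from annotated and detected landmark positions. In this minimal sketch the normalization length `l_face` is passed in as a parameter, because its exact definition is not given on the slide, and the coordinates are hypothetical toy values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def deviations(truth: Sequence[Point], estimate: Sequence[Point]) -> List[float]:
    """Euclidean distance between each annotated and each estimated landmark."""
    return [math.hypot(t[0] - e[0], t[1] - e[1]) for t, e in zip(truth, estimate)]


def mean_normalized_deviation(truth, estimate, l_face: float) -> float:
    eps = deviations(truth, estimate)
    return sum(eps) / (len(eps) * l_face)


def max_normalized_deviation(truth, estimate, l_face: float) -> float:
    return max(deviations(truth, estimate)) / l_face


# Hypothetical example with 8 landmarks; two of them are detected with an error.
truth: List[Point] = [(10.0 * j, 0.0) for j in range(8)]
estimate = list(truth)
estimate[0] = (3.0, 4.0)    # deviation 5
estimate[5] = (50.0, 2.0)   # deviation 2

mean_dev = mean_normalized_deviation(truth, estimate, l_face=50.0)
max_dev = max_normalized_deviation(truth, estimate, l_face=50.0)
```

Multiplying either value by 100 gives the percentage of `l_face` used in the results; here the maximal deviation corresponds to 10% while the mean deviation is 1.75%, which shows how a single poorly localized landmark dominates the maximum statistic.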

Results
Detail around 10%:
  AAM: 8.98 %
  Independently trained SVMs: %
  Everingham et al.: %
  proposed detector: %

  AAM: 0.62 %
  Independently trained SVMs: %
  Everingham et al.: %
  proposed detector: %

Demo 1
CNN Anchorwoman (resolution px).

Demo 2
Movie In Bruges (resolution px).

Demo 3
Video captured by the head camera of the humanoid robot NAO (resolution px).

Demo 4
Live demo

Conclusions
Main contribution
  DPM-based facial landmark detector.
  Structured output classification with an arbitrary loss function.
  One-stage learning of the appearance model & deformation cost.
  Performance evaluation and comparison with competing methods.
flandmark
  Open-source library implementing the proposed detector.
  Written in C with an interface to MATLAB.
  Learning written solely in MATLAB (also part of flandmark).
  Real-time detection on a standard PC.
  Demo applications with OpenCV.
  Fully annotated LFW database.
  Homepage:

Thank you for your attention
Questions?

References
[Cootes et al., 2001] Cootes, T., Edwards, G. J., and Taylor, C. J. (2001). Active appearance models. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(6):
[Do et al., 2009] Do, C. B., Le, Q. V., and Foo, C.-S. (2009). Proximal regularization for online and batch learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 09, pages , New York, NY, USA. ACM.
[Everingham et al., 2008] Everingham, M., Sivic, J., and Zisserman, A. (2008). Willow project, automatic naming of characters in TV video. MATLAB implementation, www:
[Felzenszwalb et al., 2009] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2009). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(1).
[Huang et al., 2007] Huang, G. B., Ramesh, M., Berg, T., and Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
[Kroon, 2010] Kroon, D.-J. (2010). Active shape model (ASM) and active appearance model (AAM). MATLAB Central, www: -model-aam.
[Lemaréchal et al., 1995] Lemaréchal, C., Nemirovskii, A., and Nesterov, Y. (1995). New variants of bundle methods. Mathematical Programming, 69:
[Nordstrøm et al., 2004] Nordstrøm, M. M., Larsen, M., Sierakowski, J., and Stegmann, M. B. (2004). The IMM face database - an annotated dataset of 240 face images. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU.
[Teo et al., 2010] Teo, C. H., Vishwanathan, S. V. N., Smola, A. J., and Le, Q. V. (2010). Bundle methods for regularized risk minimization. J. Mach. Learn. Res., 11:
[Tsochantaridis et al., 2005] Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y., and Singer, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:

