The Effect of Facial Components Decomposition on Face Tracking


Vikas Choudhary, Sergey Tulyakov and Venu Govindaraju
Center for Unified Biometrics and Sensors, University at Buffalo, State University of New York
(vikascho, tulyakov, govind)@buffalo.edu

Abstract

In this paper we investigate tracking of facial motions under free head movement and changing expressions. We address this problem by examining the geometrical relationships among the facial components in order to improve their tracking. The likelihood scores of the components are combined to obtain the overall likelihood. CONDENSATION has been used as the tracking algorithm. Histogram of Oriented Gradients (HOG) features have been introduced for representing the facial features of the eyes, eye brows and lips, which are individually classified using a Support Vector Machine (SVM). We experimentally show the effect of the number of tracked components on the overall accuracy of facial tracking.

1. Introduction

Face tracking is essentially motion estimation and acts as an essential pre-processing step for higher-level applications in surveillance, biometrics (face gesture recognition), multimedia systems (video games, video conferencing) and face recognition in videos. Face tracking can be classified into two approaches: global and local. Face tracking using global features has been extensively studied in [1][2][3][4][5][6][7], using features based on color, contour, shape, texture and probabilistic PCA. Local approaches [8][9][10][11] have used facial components, represented by features such as color, contrast histograms and Active Appearance Models.

Generally, face tracking algorithms improve tracking by using different features (local or global) and different tracking algorithms. We instead propose to investigate the geometrical positions of facial components (eyes, eye brows) and to highlight their relative significance in order to enhance tracking. There are two main contributions in this paper. First, the importance of the geometrical positions of the facial components is highlighted. Second, we introduce HOG features for representing facial components, along with SVMs, for tracking. Depending on the expression of a person, the relative positions of the components vary, which affects the overall performance of tracking. Facial components can be effectively represented by the distribution of local intensity gradients or edge directions; HOG features have been used in [12][13] for face recognition and human detection. We split the face into components (Fig 1) and train an SVM classifier on each component.

Fig 1: Facial components considered: eyes, eye brows and lips. (a) contains 3 components; (b) contains 5 components.

2. Component Representation

The HOG (Histogram of Oriented Gradients) feature represents the distribution of the gradient magnitudes and orientations of the pixels in an image. The descriptor is implemented by dividing the image into overlapping cells, each contributing the edge orientations of the pixels within the cell (Fig 2). In this paper we take the centre of the image patch as the landmark point; hence, our HOG descriptor corresponds to the final (descriptor) stage of the SIFT algorithm, since we do not need to detect a keypoint. The descriptor is scale invariant as it operates on localized cells.
For each pixel I(x, y), the gradient magnitude G(x, y) and orientation θ(x, y) are computed using the following equations:

px(x, y) = I(x+1, y) − I(x−1, y)   (1)
py(x, y) = I(x, y+1) − I(x, y−1)   (2)
G(x, y) = √(px(x, y)² + py(x, y)²)   (3)
θ(x, y) = arctan(py(x, y) / px(x, y))   (4)
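As a concrete illustration of Eqs. 1-4, here is a minimal NumPy sketch (not the authors' code; np.gradient uses central differences scaled by 1/2, which leaves the orientation unchanged and only rescales the magnitude):

```python
import numpy as np

def gradient_field(img):
    """Per-pixel gradient magnitude and orientation (Eqs. 1-4).

    img: 2D float array (grayscale patch). Returns (G, theta),
    with theta in [-pi, pi].
    """
    py, px = np.gradient(img)       # axis 0 is y (rows), axis 1 is x (cols)
    G = np.sqrt(px ** 2 + py ** 2)  # Eq. 3
    theta = np.arctan2(py, px)      # Eq. 4; atan2 keeps the full [-pi, pi] range
    return G, theta
```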

We divide the gradients into b = 9 bins of size 2π/9 over the range [−π, π]. Each HOG descriptor of an image is a histogram of gradients binned according to their respective orientations, and the combination of these histograms represents the descriptor. In our case we have 4 overlapping cells, each contributing 9 orientations; hence the dimensionality is 4 × 9 = 36.

Fig 2: The spatial lattice of a HOG descriptor centred on the lip (4 overlapping cells).

A Gaussian window is centred on the image, with standard deviation equal to half the extent of the spatial bin range. The contribution of each pixel's gradient to the histogram is weighted by the gradient magnitude and by its distance from the centre of the Gaussian window. The contribution is tri-linearly interpolated with the surrounding bins: tri-linear interpolation distributes the weight w of the bin h(x) into the two nearest neighbours x1 and x2 as follows, where b = 9:

h(x1) = h(x1) + w (1 − (x − x1)/b)   (5)
h(x2) = h(x2) + w ((x − x1)/b)   (6)

Gaussian windowing and tri-linear interpolation increase the robustness of the descriptor. To handle variance in illumination, the HOG feature vector v is normalized as

v = v / √(‖v‖² + ε²)   (7)

where ε is a small normalization constant that avoids division by zero.

3. CONDENSATION - Tracking by Parts

CONDENSATION [14] stands for "Conditional Density Propagation" and is based on particle filter techniques for tracking objects; the density represents a probability distribution over the location of the object. It is based on factored sampling, a method that transforms uniform densities into weighted densities, applied iteratively.

Fig 3: Bayesian Networks used in the tracking-by-parts algorithm. Each BN is a single chain. The first BN combines the observations from eyes and brows to obtain the likelihood; the second and third BNs compute the likelihood using one component each.

We have divided our tracking algorithm into two variants: 3-component and 5-component based. Each component is represented by a single-chain BN as shown in Fig 3, and the state estimation of each component is independent of the others. Eyes and eye brows are tracked together using the first BN, since we consider the spatial coherence between the two components. Using the second BN, the combined patch of eye and eye brow is tracked; both the left and right eye and eye-brow patches are tracked independently. To track the lips we use the third BN. The tracking-by-parts approach has been used in [15][16][17]. In the 3-component approach we use the 2nd and 3rd BNs of Fig 3; for the 5-component approach we use the 1st and 3rd BNs. Using these combinations of Bayesian Networks we study the geometrical relationship between the eyes and eye brows.

3.1 Algorithm

1: Use AdaBoost [18] to detect the face and separately initialize the tracking algorithm for each component (each component is initialized with 4 state variables: centre x and y coordinates, width and height).

2: Generate a set p of weighted particles for each component of the face:

p = {(s_t^(i), π_t^(i))}, i = 1, …, n   (8)

where s_t^(i) is a particle and π_t^(i) is the weight associated with that particle.
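To make steps 1-2 concrete, here is a hedged sketch of the per-component particle initialization; the Gaussian perturbation scales are our assumption, as the paper specifies only the 4 state variables and, later, 100 particles:

```python
import numpy as np

N_PARTICLES = 100  # matches the count used in the paper's experiments

def init_particles(cx, cy, w, h, pos_std=5.0, size_std=2.0, rng=None):
    """Spawn a weighted particle set around one component's detected box.

    State per particle: (centre x, centre y, width, height), as in step 1.
    pos_std / size_std control the initial spread and are assumptions.
    """
    rng = rng or np.random.default_rng()
    mean = np.array([cx, cy, w, h], dtype=float)
    noise = rng.normal(0.0, [pos_std, pos_std, size_std, size_std],
                       size=(N_PARTICLES, 4))
    s = mean + noise                               # particles s_t^(i) of Eq. 8
    pi = np.full(N_PARTICLES, 1.0 / N_PARTICLES)   # uniform initial weights pi_t^(i)
    return s, pi
```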
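Step 3 below scores each particle's patch with an SVM over HOG features. Assembling the Section 2 descriptor end to end might look as follows; this is a sketch under stated assumptions (the 2×2 cell layout and the Gaussian σ are guesses, since the paper specifies only 4 overlapping cells and σ equal to half the spatial bin extent):

```python
import numpy as np

N_BINS = 9                    # b = 9 orientation bins over [-pi, pi]
BIN_W = 2 * np.pi / N_BINS    # orientation bin width

def hog_descriptor(G, theta, eps=1e-3):
    """36-D HOG of one component patch (Eqs. 5-7), from the outputs
    of gradient_field(). Cell geometry here is an assumption."""
    H, W = G.shape

    # Gaussian spatial window centred on the patch.
    ys, xs = np.mgrid[0:H, 0:W]
    sigma = 0.5 * max(H, W)
    win = np.exp(-((xs - W / 2) ** 2 + (ys - H / 2) ** 2) / (2 * sigma ** 2))
    wgt = G * win                     # per-pixel vote weight

    # Soft orientation binning: each vote is split between the two
    # nearest bins in proportion to distance (Eqs. 5-6).
    pos = (theta + np.pi) / BIN_W     # continuous bin coordinate
    b0 = np.floor(pos).astype(int) % N_BINS
    b1 = (b0 + 1) % N_BINS
    frac = pos - np.floor(pos)

    desc = []
    rows = (slice(0, H // 2 + 1), slice(H // 2, H))   # 2x2 cells sharing a
    cols = (slice(0, W // 2 + 1), slice(W // 2, W))   # row/column of overlap
    for ry in rows:
        for rx in cols:
            hist = np.zeros(N_BINS)
            np.add.at(hist, b0[ry, rx].ravel(), (wgt * (1 - frac))[ry, rx].ravel())
            np.add.at(hist, b1[ry, rx].ravel(), (wgt * frac)[ry, rx].ravel())
            desc.append(hist)
    v = np.concatenate(desc)          # 4 cells x 9 bins = 36 dimensions
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)    # Eq. 7
```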

3: Update weights: calculate HOG features for each component and pass them to the SVM classifier to compute the likelihood score. The joint likelihood of the weighted particles is the combination of the likelihoods derived from each component:

p(z_t | x_t^1, x_t^2) = ∏_{i=1}^{2} p(z_t | x_t^i)   (9)

The joint likelihood for the eye and eye brow, shown in the first BN of Fig 3, is calculated using this equation; for the remaining components the likelihood is simply p(z_t^i | x_t^i). The weight-update step is shown in the dashed rectangle in Fig 4.

Values with greater weight span a greater range in the cumulative distribution, so elements with higher probability are chosen several times while others with lower probability may not be chosen at all. The CONDENSATION algorithm applies this factored sampling iteratively to successive image frames of the sequence, each iteration using as prior density the weighted sample set of the previous iteration. In our experiments we have used 100 particles per iteration.

The posterior p(x_t | z_1:t) is calculated using Bayes' rule:

p(x_t | z_1:t) = p(z_t | x_t) p(x_t | z_1:t−1) / p(z_t | z_1:t−1)   (10)

where p(z_t | x_t) is computed using equation (9). At the end of the weight-update step, the CONDENSATION algorithm returns a mean patch for each component, computed from this posterior. We compute the HOG features of each facial component and pass them to the SVM classifier (last two rectangles in Fig 4), giving

Score1 = ∏_{k=1}^{3} p_k   (11)
Score2 = ∏_{k=1}^{5} p_k   (12)

Score1 is the product of the 3 component scores, while Score2 combines the 5 component scores. The maximum of the two scores from Eqs. 11 and 12 identifies the better tracking variant for that frame of the video (sketched below, after the experimental setup). Both tracking variants run independently and in parallel for each frame, and whenever we lose track in one of them we reinitialize it using step 1 of the tracking algorithm (Section 3.1).

4. Data Set

Fig 3: Images from the dataset used for training the SVM (panels: eyes, eye brows, lips, combined eye & brow patches, and the corresponding negative examples).

We have extracted the above images from the IMM Face Database, an annotated dataset of 240 face images [19], and from videos in the Mark Frank dataset [20]. For each landmark point we have extracted 250 images to capture all variations. We have trained the two-class SVM [21] with a radial basis function kernel.

Fig 4: Flowchart of the part-based tracking (input face → 3-component and 5-component branches → per-component Score1 and Score2 → Max(Score1, Score2)).

5. Results

We have performed a series of experiments on video sequences from the Mark Frank dataset used in [17], having 600 frames at a resolution of 720×480. To quantify performance we have compared our algorithm separately with the 3-component and 5-component based approaches on the Dudek sequence [22], shown in Fig 5. We measure accuracy every 10 frames by finding the area of intersection of the predicted area with the ground truth.
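Returning to the weight-update and selection machinery of Section 3, a minimal sketch of Eqs. 9, 11 and 12 follows; the per-component likelihoods here would come from the SVM, and treating its output as a probability is our assumption:

```python
import numpy as np

def joint_likelihood(p_eye, p_brow):
    """Eq. 9: the first BN multiplies the eye and eye-brow likelihoods;
    every other component keeps its own p(z_t | x_t)."""
    return p_eye * p_brow

def select_tracker(p3, p5):
    """Eqs. 11-12: p3 and p5 hold the per-component likelihoods of the
    3- and 5-component trackers for the current frame."""
    score1 = np.prod(p3)   # Eq. 11: product over 3 components
    score2 = np.prod(p5)   # Eq. 12: product over 5 components
    if score1 >= score2:
        return "3-component", score1
    return "5-component", score2
```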
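The overlap measure used for accuracy might be computed as below; normalizing the intersection by the ground-truth area is our reading of "area of intersection with ground truth":

```python
def overlap_accuracy(pred, gt):
    """Fraction of the ground-truth box covered by the predicted box.

    pred, gt: (x, y, w, h) with (x, y) the top-left corner.
    """
    ix = max(0.0, min(pred[0] + pred[2], gt[0] + gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[1] + pred[3], gt[1] + gt[3]) - max(pred[1], gt[1]))
    return (ix * iy) / float(gt[2] * gt[3])
```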

Figure 5: Tracking results on the Dudek sequence [22] (1st row: 3-component; 2nd row: 5-component; 3rd row: 3 & 5 component combined). Errors are measured every 30 frames.

Initially, in the first 3 sections, the person's eyes and eye brows remain geometrically close, and the 3-component based approach dominates over the 5-component one. In the final two sections the components are apart, and the 5-component approach gives higher accuracy. The overall accuracy in the 3rd row is 92.45%, while in the 1st and 2nd rows it is 71.4% and 77.4% respectively.

Fig 6: Results of accuracy vs. frame number (average accuracy of 88.67%).

Fig 6 illustrates the accuracy computed over 5 videos from [17]. Initially the accuracy is moderate, but once the tracking algorithm stabilizes the accuracy increases. In these videos we observed how expressions affect the tracking results: the 3-component based approach gives higher accuracy for people who tend to bring eye and eye brow geometrically close in their expressions (illustrated in Fig 7).

Fig 7: An illustration of the effect of the geometrical relationship between the eye and eye brows on tracking. When they are apart, as in (a), the 5-component based approach gives higher accuracy, while when they come geometrically close, as in (b), the 3-component based approach gives the more accurate result.

6. Conclusion

We have presented a comparison of component-based approaches for robust face tracking under free head movement and varying expressions. We have exploited the geometrical relationship between the eyes and eye brows by comparing their combined and separate scores. The tracking problem has been formulated as a Bayesian Network, and Histogram of Oriented Gradients features along with SVM classifiers have been introduced for robust tracking. In future work we will extend our approach through interactive collaboration of the 3-component and 5-component based methods using weight sharing.

References

[1] G. R. Bradski, "Computer Vision Face Tracking as a Component of a Perceptual User Interface," IEEE Workshop on Applications of Computer Vision, Princeton, pp. 214-219, 1998.
[2] K. Schwerdt and J. Crowley, "Robust face tracking using color," Proc. International Conference on Automatic Face and Gesture Recognition, pp. 90-95, 2000.
[3] X. Bing, Y. Wei and C. Charoensak, "Face contour tracking in video using active contour model," Proc. European Conference on Computer Vision, pp. 1021-1024, 2004.
[4] S. McKenna, S. Gong, R. Würtz, J. Tanner and D. Banin, "Tracking facial feature points with Gabor wavelets and shape models," Proc. First International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA'97), pp. 35-42, 1997.
[5] V. Kruger, A. Happe and G. Sommer, "Affine real-time face tracking using Gabor wavelet networks," Proc. 15th International Conference on Pattern Recognition, vol. 1, pp. 127-130, 2000. doi: 10.1109/ICPR.2000.905289
[6] Z. Liu and Y. Wang, "Face detection and tracking in video using dynamic programming," Proc. ICIP, 2000.
[7] D. Ross, J. Lim and M. H. Yang, "Adaptive probabilistic visual tracking with incremental subspace update," Proc. European Conference on Computer Vision, vol. 2, pp. 470-482, 2004.
[8] I. Patras and M. Pantic, "Particle filtering with factorized likelihoods for tracking facial features," Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pp. 97-102, 2004.
[9] Wen-Yan Chang, Chu-Song Chen and Yi-Ping Hung, "Tracking by Parts: A Bayesian Approach With Component Collaboration," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 2, pp. 375-388, April 2009. doi: 10.1109/TSMCB.2008.2005417
[10] Jaewon Sung and Daijin Kim, "Combining Local and Global Motion Estimators for Robust Face Tracking," Proc. 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007), pp. 345-350, Aug. 2007.
[11] T. F. Cootes, G. J. Edwards and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[12] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886-893, June 2005.
[13] D. Monzo, A. Albiol and J. Sastre, "HOG-EBGM vs. Gabor-EBGM," Proc. 15th IEEE International Conference on Image Processing (ICIP 2008), pp. 1636-1639, Oct. 2008.
[14] M. Isard and A. Blake, "ICONDENSATION: Unifying low-level and high-level tracking in a stochastic framework," Proc. European Conference on Computer Vision, vol. 1, pp. 893-908, 1998.
[15] C. Yang, R. Duraiswami and L. Davis, "Fast multiple object tracking via a hierarchical particle filter," Proc. IEEE International Conference on Computer Vision, vol. 1, pp. 212-219, 2005.
[16] K. Okuma, A. Taleghani, N. De Freitas, J. Little and D. G. Lowe, "A boosted particle filter: Multitarget detection and tracking," Proc. European Conference on Computer Vision, vol. 1, pp. 28-39, 2004.
[17] Wen-Yan Chang, Chu-Song Chen and Yi-Ping Hung, "Tracking by Parts: A Bayesian Approach With Component Collaboration," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 2, pp. 375-388, April 2009. doi: 10.1109/TSMCB.2008.2005417
[18] Intel Corp., OpenCV Library, http://www.intel.com/technology/computing/opencv
[19] M. M. Nordstrom, M. Larsen, J. Sierakowski and M. B. Stegmann, "The IMM Face Database: An Annotated Dataset of 240 Face Images."
[20] N. Bhaskaran, I. Nwogu, M. Frank and V. Govindaraju, "Lie To Me: Deceit Detection via Online Behavioral Learning," Proc. 9th IEEE Conference on Face and Gesture Recognition, Santa Barbara, CA.
[21] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011. http://www.csie.ntu.edu.tw/~cjlin/libsvm
[22] Dudek sequence: http://www.cs.toronto.edu/vis/projects/dudekfacesequence.html