Multiple Facial Action Unit Recognition Enhanced by Facial Expressions


2016 23rd International Conference on Pattern Recognition (ICPR), Cancún Center, Cancún, México, December 4-8, 2016

Jiajia Yang, Shan Wu, Shangfei Wang¹, Qiang Ji
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
¹ Corresponding author.

Abstract—Facial expressions and facial action units (AUs) describe facial behavior globally and locally, respectively. The dependencies between expressions and AUs therefore carry crucial information for facial action unit recognition, yet they have not been thoroughly exploited. In this paper, we propose a novel facial action unit recognition method enhanced by facial expressions, which are required only during training. Specifically, we propose a three-layer restricted Boltzmann machine (RBM) to capture the probabilistic dependencies among expressions and AUs. The parameters of the RBM model are learned by maximizing the log conditional likelihood with gradient ascent. The learned RBM model then combines AU measurements with the AU-expression relations it captures to perform multiple AU recognition through probabilistic inference. Experimental results on three benchmark databases, i.e., the CK+ database, the ISL database, and the BP4D database, demonstrate the effectiveness of our method in capturing the joint relations among AUs and expressions to improve AU recognition.

I. INTRODUCTION

Automatic facial expression recognition has great application potential in many fields, such as security [1], human-computer interaction [2], driver safety [3], and healthcare [4]. Facial expression analysis has therefore attracted increasing attention in recent years. Facial expressions can be represented in two ways: as expression categories and as facial action units (AUs). Expression categories describe facial behavior globally, while facial action units depict local variations on the face. Almost all facial expressions consist of certain combinations of AUs. For example, Table I lists the co-occurrence probabilities among four AUs and two expressions in the CK+ database; both happiness and surprise occur along with at least two AUs. Happiness co-occurs with AU6, AU12, and AU25, and surprise appears with AU1 and AU25. Different expressions may generate the same AUs: AU25, for example, can appear on both a happy face and a surprised face. AUs rarely appear alone; generally they either co-occur or are mutually exclusive. For instance, from Table I we find that AU1 and AU6 rarely occur simultaneously due to the structure of the facial muscles, while AU6 and AU12, and AU1 and AU25, frequently appear together. Therefore, the inherent relationships among AUs, as well as the relations between expressions and AUs, carry important information for AU recognition.

TABLE I
PROBABILISTIC DEPENDENCIES AMONG AUS AND EXPRESSIONS
(co-occurrence probabilities of AU1, AU6, AU12, and AU25 against AU1, AU6, AU12, AU25, happy, and surprise in the CK+ database)
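Co-occurrence statistics such as those in Table I can be computed directly from binary label annotations. A brief sketch, with a synthetic label matrix standing in for the CK+ annotations:

```python
# A brief sketch of how co-occurrence statistics like those in Table I can
# be derived from label annotations; the label matrix here is synthetic and
# only illustrates the computation, not the CK+ values.
import numpy as np

rng = np.random.default_rng(0)
cols = ["AU1", "AU6", "AU12", "AU25", "happy", "surprise"]
L = rng.integers(0, 2, size=(327, len(cols)))   # binary presence matrix (samples x labels)

# P(col_b = 1 | col_a = 1): conditional co-occurrence probability.
present = L.sum(axis=0).astype(float)
cooc = (L.T @ L) / np.maximum(present[:, None], 1)
for a in range(4):                               # rows: the four AUs
    row = " ".join(f"{cooc[a, b]:.2f}" for b in range(len(cols)))
    print(f"{cols[a]:>8}: {row}")
```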
Currently, mainstream AU recognition research classifies each AU independently or detects certain fixed AU combinations. Such methods either ignore the inherent dependencies among multiple AUs and between AUs and expressions, or cannot handle the large number of possible AU combinations. Only recently have several works exploited AU relations for AU recognition. For example, Tong et al. [5] adopted a Bayesian network (BN) to model the semantic relationships among AUs through the structure and conditional probabilities of the learned BN. Wang et al. [6] proposed a hierarchical restricted Boltzmann machine model to capture the global semantic relationships among AUs through the connections between the hidden and visible units of the RBM. Zhu et al. [7] adopted a multi-task feature learning method to learn shared features for each AU group, and a Bayesian network to model the relations among action units from the target labels. Zhang et al. [8] utilized multi-task multiple kernel learning to detect fixed groups of multiple AUs simultaneously; the fixed AU groups are determined by a hierarchical model based on AU co-occurrences in the AU labels and facial regions. Zhao et al. [9] proposed to select a sparse subset of facial patches and learn multiple AU classifiers simultaneously under the constraints of group

sparsity and local AU relations (i.e., positive correlation and negative competition). Eleftheriadis et al. [10] proposed a multi-conditional latent variable model that incorporates AU label dependencies into latent-space and classifier learning. Song et al. [11] proposed a Bayesian compressed sensing model to capture AU sparsity and co-occurrence for AU recognition. All these works successfully exploit the dependencies inherent in multiple AUs to facilitate AU recognition. However, few of them consider the AU-expression relations to improve AU recognition.

To the best of our knowledge, there are so far only three works that recognize AUs assisted by expressions. Wang et al. [12] designed an AU recognition method that uses expression labels as hidden knowledge under incomplete AU labeling. They constructed a BN model to capture the dependencies not only among AUs but also between AUs and expressions. During training, the image features and expression labels are used, while the AU labels may be missing. Structural EM is adopted to learn the structure and parameters of the BN, and a traditional image-driven method is adopted to obtain the expression and AU measurements. During testing, the AUs are inferred by combining the measurements with the AU relations in the BN model. Due to the Markov assumption, the BN model can only handle local dependencies among AUs and expressions. Wang et al. [6] proposed a 3-way RBM to capture the global dependencies between expressions and AUs for AU recognition. The expression labels are used as privileged information, which is required only during training. The 3-way RBM can be regarded as a mixture model; it therefore models the relations between each expression and the AUs independently, ignoring the dependencies shared among multiple expressions and AUs. Ruiz et al. [13] proposed a hidden-task-learning method to exploit prior knowledge about the relation between a set of hidden tasks (i.e., AUs) and visible tasks (i.e., facial expressions). The relations between AUs and facial expressions are modeled based on empirical results from psychological studies. Although the method of Ruiz et al. can learn to recognize AUs from training data where AU labels are limited or even unavailable, AU-expression relations taken from the empirical results of psychological studies may be inaccurate.

In this paper, we propose a novel facial action unit recognition method enhanced by facial expressions, which are required only during training. Specifically, we propose a hierarchical RBM model to capture the global relationships among AUs and facial expressions. Instead of modeling local dependencies between AUs and expressions as a BN does, the proposed RBM model captures global dependencies among AUs and expressions by introducing a hidden layer. Unlike the 3-way RBM, which captures the global dependencies between each expression and the AUs independently, the proposed RBM model captures the dependencies among multiple expressions and AUs simultaneously by dividing its visible layer into two parts, expression nodes and AU nodes. During training, the parameters of the RBM model are learned by maximizing the log conditional likelihood with gradient ascent. During testing, we infer the target AUs with the learned RBM model, without expression labels. Experiments on three benchmark databases demonstrate the superiority of our method over existing methods, further confirming the value of exploiting the inherent relations among AUs and expressions.
II. METHOD

The framework of our method, shown in Fig. 1, consists of two modules: measurement extraction and model learning. During the training phase, we obtain measurements of the AUs and expressions with a traditional image-based classifier. We then learn the RBM model to capture the global relationships among AUs and expressions, using these measurements as evidence. During testing, we infer the final AUs through the learned RBM model without expressions.

Assume the training data is $D = \{X_i, (\lambda_{i1}, \ldots, \lambda_{in}, \lambda_{i(n+1)})\}_{i=1}^N$, where $N$ is the number of training samples and $n$ is the number of AU labels. For the $i$-th training sample, $X_i$ denotes the feature vector, $\lambda_{i1}, \ldots, \lambda_{in}$ denote the $n$ AU labels, and $\lambda_{i(n+1)}$ denotes the expression label. The goal of this work is to construct an AU classifier with the help of expression knowledge that is available only during training. The details of our method are as follows.

Fig. 1. The framework of our method.

A. Measurement Extraction

The measurements $m_\lambda$ in Fig. 2 are preliminary estimates of the AU and expression labels, obtained with an existing image-driven recognition method on the training data. In our work, a support vector machine (SVM) is used as the classifier to obtain the measurements of AUs and expressions.
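As an illustration of this measurement-extraction step, the sketch below trains one probabilistic SVM per AU and one multi-class SVM for the expression; the features, kernel choice, and probability calibration are our assumptions, since the paper does not specify them:

```python
# A minimal sketch of the measurement-extraction step (Sec. II-A).
# The paper only states that an SVM produces preliminary AU and expression
# estimates; the synthetic features and the linear kernel below are
# stand-ins, not the authors' exact setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N, d, n_aus, n_expr = 200, 128, 6, 8        # toy sizes (BP4D uses 6 AUs, 8 expressions)
X = rng.normal(size=(N, d))                  # stand-in facial features
au_labels = rng.integers(0, 2, size=(N, n_aus))   # binary AU labels
expr_labels = rng.integers(0, n_expr, size=N)     # categorical expression labels

# One probabilistic SVM per AU: its positive-class probability is the AU measurement.
au_svms = []
for k in range(n_aus):
    clf = SVC(kernel="linear", probability=True).fit(X, au_labels[:, k])
    au_svms.append(clf)

# One multi-class SVM for the expression measurement.
expr_svm = SVC(kernel="linear", probability=True).fit(X, expr_labels)

def measurements(X_new):
    """Concatenate AU probabilities and expression probabilities into m."""
    m_au = np.column_stack([c.predict_proba(X_new)[:, 1] for c in au_svms])
    m_expr = expr_svm.predict_proba(X_new)
    return np.hstack([m_au, m_expr])

m = measurements(X[:5])   # preliminary evidence fed to the RBM's bottom layer
```

These probabilistic outputs then play the role of the bottom-layer evidence $m_1, \ldots, m_d$ in the model below.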

B. Modeling AU and Expression Relationships with a Three-Layer Restricted Boltzmann Machine

To model the semantic relationships among AUs and expressions, a three-layer RBM model is used in this work. We first present our RBM model for AU recognition, and then provide the learning and inference algorithms of the three-layer RBM model.

Fig. 2 shows the structure of the RBM model used to capture the global relationships among AUs and expressions. The middle layer contains the binary visible units $\lambda_1, \lambda_2, \ldots, \lambda_{n_1}$ and $E_1, E_2, \ldots, E_{n_2}$, representing the states of the AUs and expressions respectively. We encode the expression labels in binary: for $k$ expression labels, we represent each label as an element of $\{0, 1\}^n$ with $2^n \geq k$. For example, with 7 expression labels, we use 001 to represent the first expression and 111 for the 7th. The variables $m_1, m_2, \ldots, m_d$ in the bottom layer are the measurements of the AUs and expressions obtained by a traditional classifier. In our model, each latent unit is connected to all AU and expression nodes in order to model the higher-order relationships among AUs and expressions, and the bottom-layer nodes are attached to each AU and expression variable, providing low-level evidence.

Fig. 2. The model for AU recognition. In the training phase we add expression nodes; in the prediction phase the expression nodes are removed.

The total energy of the model is defined in Equation 1, where $b_i$ and $b_j$ are the biases of the AU and expression nodes respectively, and $c_k$ are the biases of the latent nodes. The first set of parameters, $W^1_{ik}$ and $W^1_{jk}$, measures the compatibility between each pair of latent and visible units (AU and expression nodes); the second set, $W^2_i$ and $W^2_j$, measures the compatibility between each measurement and the corresponding AU or expression label.

$$E(m, \lambda, E, h; \theta) = -\sum_{i}\sum_{k} \lambda_i W^1_{ik} h_k - \sum_{j}\sum_{k} E_j W^1_{jk} h_k - \sum_{i} b_i \lambda_i - \sum_{j} b_j E_j - \sum_{k} c_k h_k - \sum_{i} W^2_i \lambda_i m_i - \sum_{j} W^2_j E_j m_j \quad (1)$$

Our model can be decomposed into two parts. The first part, consisting of the hidden and visible nodes, is an RBM in which each latent unit is connected to all the action unit and expression units to model their dependencies. These two layers encode the high-level semantic relationships among AUs and between AUs and expressions, enabling us to infer the presence or absence of each AU in the top-down direction. The bottom two layers integrate the AU labels with the AU measurements obtained by the traditional classifier, providing evidence for AU recognition in the bottom-up direction.

The learned relationships among AUs and between AUs and expressions can be read off the parameters $W^1_{ik}$ and $W^1_{jk}$ (bias terms are omitted without loss of generality). To see how, consider the $\alpha$-th latent unit $h_\alpha$: its probabilistic dependency with each AU is measured by the pairwise energy $E(h_\alpha, \lambda_i) = -\lambda_i W^1_{i\alpha} h_\alpha$, $i = 1, \ldots, n_1$, and its dependency with each expression unit by the pairwise energy $E(h_\alpha, \hat{\lambda}_i) = -\hat{\lambda}_i W^1_{i\alpha} h_\alpha$, $i = 1, \ldots, n_2$. Evidently, the larger $W^1_{i\alpha}$ is, the more likely AU $\lambda_i$ is to be present; conversely, the smaller $W^1_{i\alpha}$ is, the more likely $\lambda_i$ is to be absent. The full parameter vector $[W^1_{i\alpha}]_{i=1}^{n_1}$ thus encodes a specific presence and absence pattern over all the AUs and expressions.
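To make Equation 1 concrete, the following NumPy sketch evaluates the energy of a joint configuration and implements the binary expression encoding described above; all sizes and parameter values are illustrative assumptions, not learned quantities:

```python
# A small NumPy sketch of the model's energy function, Eq. (1), together
# with the binary expression encoding. Random parameters are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n1, n2, K, d = 6, 4, 10, 10   # AUs, expression bits (2^4 >= 8 labels), hidden units, measurements

W1_au   = rng.normal(size=(n1, K))   # AU node  <-> hidden unit compatibilities
W1_expr = rng.normal(size=(n2, K))   # expr bit <-> hidden unit compatibilities
W2_au   = rng.normal(size=n1)        # AU node  <-> its own measurement
W2_expr = rng.normal(size=n2)        # expr bit <-> its own measurement
b_au, b_expr, c = rng.normal(size=n1), rng.normal(size=n2), rng.normal(size=K)

def encode_expression(label, n_bits=n2):
    """Binary-encode an expression label, e.g. label 3 -> [0,0,1,1] for 4 bits.
    The exact bit assignment is an assumption; the paper fixes only the scheme."""
    return np.array([(label >> i) & 1 for i in reversed(range(n_bits))], dtype=float)

def energy(m, lam, E, h):
    """E(m, lambda, E, h; theta) from Eq. (1); lower energy = more compatible."""
    m_au, m_expr = m[:n1], m[n1:n1 + n2]     # assumed measurement layout
    return -(lam @ W1_au @ h + E @ W1_expr @ h
             + b_au @ lam + b_expr @ E + c @ h
             + (W2_au * lam) @ m_au + (W2_expr * E) @ m_expr)

lam = rng.integers(0, 2, n1).astype(float)   # a hypothetical AU configuration
E   = encode_expression(3)                   # the 3rd expression label
h   = rng.integers(0, 2, K).astype(float)
m   = rng.random(n1 + n2)
print(energy(m, lam, E, h))
```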
As an example, take the parameters $[W^1_{i2}]_{i=1}^{n_1}$ of $h_2$. Figures 3 and 4 depict the pattern of $h_2$ with and without expressions, respectively, on the BP4D database, which has 6 AUs and 8 expression categories; each expression label is encoded with 4 binary values. Comparing the two figures, adding the expression labels to the model changes the relations among the AUs noticeably. In Figure 4, only two AUs, AU15 (a5) and AU23 (a6), tend to be present, with AU15 (a5) the most likely. In Figure 3, where expression nodes are added to the model, AU4 (a3) becomes the most likely to be present, AU7 (a4) and AU15 (a5) tend to occur but are less likely than AU4 (a3), and AU1, AU2, and AU23 (a1, a2, a6) are very likely to be absent. Since we added sadness expression nodes (a7 to a10: 0010) to the model, AU4 (Brow Lowerer) and AU7 (Lid Tightener) became present, while AU23 (Lip Tightener) became likely to be absent.

Fig. 3. Dependencies among AUs and expressions.

In the training phase, the parameters are learned by maximizing the log conditional likelihood shown in Equation 2, where $m^{(i)}$ represents the measurements of the $i$-th training sample and $\lambda^{(i)}$ and $E^{(i)}$ stand for its AU and expression labels. The objective is maximized with stochastic gradient ascent, and the gradient is calculated with Equation 3.

$$\theta^* = \arg\max_{\theta} L(\theta), \qquad L(\theta) = \sum_{i=1}^{N} \log p(\lambda^{(i)}, E^{(i)} \mid m^{(i)}; \theta) \quad (2)$$
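For toy model sizes, the conditional likelihood in Equation 2 can be evaluated exactly, since the hidden units sum out in closed form and the visible configurations can be enumerated. A hedged verification sketch with random stand-in parameters (useful, for example, for checking gradient code); here the AU and expression units are stacked into one visible vector:

```python
# Exact log p(lambda, E | m) for a tiny model: sum_h exp(-E) factorizes over
# hidden units, and the 2^(n1+n2) visible configurations are enumerable.
# Parameter values are random assumptions, not learned ones.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n1, n2, K = 3, 2, 4                      # tiny model so enumeration is feasible
W1 = rng.normal(size=(n1 + n2, K))       # visible (AUs then expr bits) <-> hidden
W2 = rng.normal(size=n1 + n2)            # visible <-> measurement
b  = rng.normal(size=n1 + n2)
c  = rng.normal(size=K)

def unnormalized_log_p(v, m):
    """log sum_h exp(-E(m, v, h)); hidden units sum out in closed form."""
    return b @ v + (W2 * v) @ m + np.sum(np.logaddexp(0.0, c + v @ W1))

def log_p_cond(v, m):
    """log p(lambda, E | m): normalize over all 2^(n1+n2) visible configs."""
    configs = [np.array(x, float) for x in itertools.product([0, 1], repeat=n1 + n2)]
    log_Z = np.logaddexp.reduce([unnormalized_log_p(u, m) for u in configs])
    return unnormalized_log_p(v, m) - log_Z

m = rng.random(n1 + n2)
v = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # a joint (AU, expression-bit) state
print(log_p_cond(v, m))                  # one term of L(theta) in Eq. (2)
```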

Fig. 4. Dependencies among AUs.

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{i=1}^{N} \left( \mathbb{E}_{p(h, \lambda, E \mid m^{(i)}; \theta)} \left[ \frac{\partial E}{\partial \theta} \right] - \mathbb{E}_{p(h \mid \lambda^{(i)}, E^{(i)}, m^{(i)}; \theta)} \left[ \frac{\partial E}{\partial \theta} \right] \right) \quad (3)$$

To obtain the gradient, we must compute $p(h \mid \lambda, E, m; \theta)$ and $p(h, \lambda, E \mid m; \theta)$. The former can be calculated analytically with Equation 4, where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid function. We approximate $p(h, \lambda, E \mid m; \theta)$ by sampling $h$ with Equation 4, sampling $\lambda$ with Equation 5, and sampling $E$ with Equation 6. The detailed algorithm for learning the parameters $W^1$ is shown in Algorithm 1; the other parameters are estimated in a similar way.

$$P(h_k = 1 \mid \lambda, E, m) = \sigma\Big( c_k + \sum_{i} W^1_{ik} \lambda_i + \sum_{j} W^1_{jk} E_j \Big) \quad (4)$$

$$P(\lambda_i = 1 \mid h, m) = \sigma\Big( b_i + \sum_{k} W^1_{ik} h_k + W^2_i m_i \Big) \quad (5)$$

$$P(E_j = 1 \mid h, m) = \sigma\Big( b_j + \sum_{k} W^1_{jk} h_k + W^2_j m_j \Big) \quad (6)$$

Algorithm 1: Revised contrastive divergence algorithm for learning the proposed model [6]
Input: training data {λ^(i) ∈ R^{1×n1}, E^(i) ∈ R^{1×n2}, m^(i) ∈ R^{1×d}}_{i=1}^N
Output: model parameters W^1, W^2, b, c
1: repeat
2:   for each training sample (λ, E, m) do
3:     D2+ = λ^T m, D3+ = E^T m
4:     Sample h+ ~ P(h | E, λ, m) with Equation 4
5:     Calculate the positive gradient D1+ = λ^T h+
6:     Sample λ− ~ P(λ | h+, m) with Equation 5
7:     D2− = (λ−)^T m
8:     Sample E− ~ P(E | h+, m) with Equation 6
9:     D3− = (E−)^T m
10:    Sample h− ~ P(h | λ−, E−, m) with Equation 4
11:    Calculate the negative gradient D1− = (λ−)^T h−
12:    Update:
13:      W^1 = W^1 + η(D1+ − D1−)
14:      W^2 = W^2 + η(D2+ − D2−)
15:      b = b + η(λ − λ−)
16:      c = c + η(h+ − h−)
17:   end for
18: until convergence

After parameter learning, we infer the target labels with the learned model. Given the measurement $m$ of a testing sample, the final state of each AU $\hat{\lambda}_i$ is obtained by maximizing its posterior probability given $m$, as in Equation 7.

$$\hat{\lambda}_i = \arg\max_{\hat{\lambda}_i} P(\hat{\lambda}_i \mid m) \quad (7)$$

Computing $P(\hat{\lambda}_i \mid m)$ requires marginalizing over all the latent variables $\{h_k\}_{k=1}^{K}$ and the other action unit and expression units $\lambda_s$, $s \neq i$, which can be intractable. However, it can be performed efficiently with Gibbs sampling, by iteratively sampling $h$ from $P(h \mid \lambda, m)$ and $\lambda$ from $P(\lambda \mid h, m)$. Sampled instances of each $\lambda_i$ are used to estimate the corresponding marginal probability. Algorithm 2 presents the detailed steps.

Algorithm 2: Inference of P(λ | m) with Gibbs sampling [6]
Input: test sample m; parameters W^1, W^2, b, c
Output: P(λ_i | m) for i = 1, 2, ..., n
1: for chain = 1 to C do
2:   Randomly initialize λ^0
3:   for t = 0 to T do
4:     Sample h^t ~ P(h | λ^t, m) with Equation 4
5:     Sample λ^{t+1} ~ P(λ | h^t, m) with Equation 5
6:   end for
7: end for
8: for i = 1 to n do
9:   Collect the last K samples of λ_i from each chain
10:  Calculate P(λ_i | m) based on the collected samples
11: end for
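The two algorithms condense into a short NumPy sketch. The learning rate, chain counts, and synthetic data below are illustrative assumptions rather than the authors' settings, and the AU and expression units are stacked into one visible vector to keep the code compact:

```python
# Sketch of Algorithms 1 and 2: CD-style learning of the three-layer RBM
# and Gibbs-sampling inference of P(lambda_i | m), following Eqs. (4)-(6).
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n1, n2, K = 6, 4, 12                 # AUs, expression bits, hidden units
nv = n1 + n2                         # visible units: [lambda, E]
W1 = 0.01 * rng.normal(size=(nv, K)) # visible <-> hidden weights
W2 = 0.01 * rng.normal(size=nv)      # visible <-> measurement weights
b, c = np.zeros(nv), np.zeros(K)
eta = 0.05                           # assumed learning rate

def p_h(v):                          # Eq. (4)
    return sigmoid(c + v @ W1)

def p_v(h, m):                       # Eqs. (5) and (6), stacked
    return sigmoid(b + W1 @ h + W2 * m)

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def cd_step(v, m):
    """One revised contrastive-divergence update (Algorithm 1)."""
    global W1, W2, b, c
    h_pos = sample(p_h(v))                   # positive phase
    v_neg = sample(p_v(h_pos, m))            # negative phase: reconstruct visibles
    h_neg = sample(p_h(v_neg))
    W1 += eta * (np.outer(v, h_pos) - np.outer(v_neg, h_neg))
    W2 += eta * (v * m - v_neg * m)
    b  += eta * (v - v_neg)
    c  += eta * (h_pos - h_neg)

def infer_marginals(m, chains=5, burn_in=200, keep=100):
    """Estimate P(lambda_i | m) by Gibbs sampling (Algorithm 2)."""
    kept = []
    for _ in range(chains):
        v = sample(np.full(nv, 0.5))         # random initialization
        for t in range(burn_in + keep):
            h = sample(p_h(v))
            v = sample(p_v(h, m))
            if t >= burn_in:
                kept.append(v[:n1])          # AU part of the visible layer
    return np.mean(kept, axis=0)             # marginal presence probabilities

# Toy training loop on synthetic (labels, measurements) pairs.
for epoch in range(20):
    for _ in range(100):
        v = rng.integers(0, 2, nv).astype(float)
        m = np.clip(v + 0.2 * rng.normal(size=nv), 0, 1)  # noisy "SVM" evidence
        cd_step(v, m)

m_test = rng.random(nv)
print(infer_marginals(m_test))       # AU marginals
```

Thresholding the returned marginals at 0.5 implements the per-AU MAP decision of Equation 7.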

III. EXPERIMENTS

A. Experimental Conditions

In our work, we use three databases widely used in the literature: the Extended Cohn-Kanade (CK+) database [14], the BP4D database [15], and the RPI Intelligent Systems Lab (ISL) Image Database [16]. The CK+ database contains 593 samples collected from 118 subjects. There are 7 expression categories, i.e., anger, contempt, disgust, fear, happiness, sadness, and surprise, and AU labels for part of the samples. In our work, we select 327 samples with both expression category and AU labels, and only AUs whose frequencies exceed 10% are considered, resulting in 13 AUs (AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU12, AU17, AU23, AU24, AU25, and AU27). The BP4D database consists of 6554 samples of dynamic spontaneous facial expression data; we selected 6 AUs (AU1, AU2, AU4, AU7, AU15, AU23) and 8 expression categories (happiness, sadness, surprise, embarrassment, fear, physical pain, anger, and disgust) for our experiment. The ISL database [16] was collected under real-world conditions with uncontrolled illumination and background, as well as moderate head motion; the 19 frontal-view video clips of 7 subjects displaying facial expressions are adopted in this work. It contains 10 AUs (AU1, AU2, AU5, AU6, AU12, AU17, AU23, AU25, AU27, and AU45) and two expressions (surprise and happiness).

Leave-one-subject-out cross-validation is employed in our experiments. Two types of multi-label classification metrics are adopted: example-based assessment criteria (Hamming loss, example-based accuracy, and example-based subset accuracy) and a label-based assessment criterion (label-based macro F1). Detailed definitions of these evaluation measures can be found in [17].
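For reference, the sketch below computes the four metrics under their standard multi-label definitions as surveyed in [17]; the label matrices are synthetic, purely to illustrate the computation:

```python
# Hedged sketch of the four evaluation metrics named above. Y and Y_hat are
# illustrative binary label matrices (samples x AUs), not the paper's data.
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

rng = np.random.default_rng(4)
Y     = rng.integers(0, 2, size=(50, 6))   # ground-truth AU labels
Y_hat = rng.integers(0, 2, size=(50, 6))   # predicted AU labels

hl = hamming_loss(Y, Y_hat)                       # fraction of wrong labels

# Example-based accuracy: |intersection| / |union| per sample, averaged.
inter = np.logical_and(Y, Y_hat).sum(axis=1)
union = np.logical_or(Y, Y_hat).sum(axis=1)
acc = np.mean(np.where(union > 0, inter / np.maximum(union, 1), 1.0))

subacc = np.mean(np.all(Y == Y_hat, axis=1))      # exact-match ratio
macf1 = f1_score(Y, Y_hat, average="macro")       # per-label F1, averaged

print(f"HL={hl:.3f} Acc={acc:.3f} SubAcc={subacc:.3f} MacF1={macf1:.3f}")
```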
B. Experimental Results and Analysis

TABLE II
EXPERIMENTAL RESULTS ON THREE DATABASES
(for each of the CK+, BP4D, and ISL databases, the image-driven method, the model-based method, and our method are compared in terms of HL, Acc, SubAcc, and MacF1, where HL = Hamming loss, Acc = example-based accuracy, SubAcc = example-based subset accuracy, and MacF1 = label-based macro F1)

In this section, we compare our work with two baselines: the image-driven method using an SVM, and the model-based method using the proposed RBM with AUs only. The experimental results on the three databases are shown in Table II. From this table, we find the following.

First, the model-based method with AUs only performs better than the image-driven method: all four metrics of the model-based method are higher than those of the image-driven method on the three databases in nearly all cases. The improvements are most obvious on the ISL database, where both the accuracy and the macro F1-score of the image-driven method are below 50%, while the results of the model-based method are above 80%. The image-driven method treats the AUs as independent classes, while the model-based method using the RBM captures the global semantic relations among AUs. This demonstrates that the inherent dependencies among AUs are very useful for improving AU recognition.

Second, our proposed method of modeling the relations among AUs and expressions achieves the best results of the three methods, with the lowest Hamming loss and the highest accuracy and F1-score in most situations. On the CK+ database, the accuracy and the F1-score increase by 12% and 5%; on the BP4D database, the improvements in accuracy and subset accuracy are about 9% and 12% respectively; on the ISL database, the subset accuracy rises from 75% to 79% and the F1-score from 95% to 96%. Our method therefore enhances AU recognition by capturing the relationships among AUs and expressions.

Finally, the improvements of the model-based method and the proposed method are more obvious on the BP4D and ISL databases than on the CK+ database. As mentioned in Section III-A, the CK+ database is posed, while the other two databases are spontaneous. This demonstrates the effectiveness of our method on real-world data.

C. Comparison with Related Work

We compare our work with recent state-of-the-art approaches. On the CK+ database, we compare with [12] and [6]; both employed expressions as privileged information to facilitate AU recognition. Wang et al. [12] proposed to use a BN to capture the relations among AUs and expressions. Wang et al. [6] proposed a three-way RBM model that incorporates expressions during training to facilitate the estimation of AU dependencies. On the BP4D database, we compare with [18]; Gudi et al. [18] proposed a model-based method using a deep CNN consisting of 3 convolutional layers, 1 sub-sampling layer, and 1 fully connected layer to predict the occurrence and intensity of facial AUs. The results are listed in Table III. To our knowledge, no work has been done on capturing AU relationships on the ISL database, so we do not compare with related works on that database.

From Table III, we find that our method performs better than [12] and [6]. Compared with [12], our method lowers the Hamming loss by 0.03 and increases the macro F1-score by 13%. Due to the Markov assumption, the relationships captured by a BN are local and pairwise; in contrast, our method captures global relations among multiple AUs and expressions through a hidden layer. Compared with [6], the macro F1-score of our method is 93%, which is 11% higher. The relations in [6] are expression-independent, whereas our method captures the joint relations among AUs and expressions through weights shared across multiple expressions. This demonstrates that our model, which learns a joint relationship among AUs and expressions, is better at modeling the complicated relations among multiple AUs than the state-of-the-art approaches.

TABLE III
COMPARISON WITH RELATED WORKS
(our method is compared with [12] and [6] on CK+ and with [18] on BP4D, in terms of HL = Hamming loss and MacF1 = label-based macro F1)

Table IV lists results of recent AU recognition works that use AU relations only, on the CK+ database. As mentioned in Section I, Song et al. [11] modeled AU sparsity and co-occurrence using a Bayesian compressed sensing model.

Eleftheriadis et al. [10] proposed a multi-conditional latent variable model for simultaneous facial feature fusion and detection of facial action units. Zhu et al. [7] proposed to learn shared features through a multi-task learning process and to model AU relations with a Bayesian network. Zhang et al. [8] proposed to use multi-task multiple kernel learning to detect relations among fixed groups of multiple AUs. Since the specific AUs and experimental conditions are not fully consistent across these works, we make only a rough comparison with them. From Table IV, we can see that compared with MC-LVM, MCF, H-MTMKL, and BGCS, our method achieves a higher macro F1-score. However, the accuracy of our method is lower than that of BGCS and DBN. By its definition, accuracy is easily influenced by the number of negative classes, while the F1-score balances precision and recall and is therefore more reliable. Furthermore, our method outperforms [7] on both macro F1-score and accuracy. Therefore, compared with the related works listed in Table IV, our method achieves a more reliable and balanced result, demonstrating its effectiveness.

TABLE IV
COMPARISON WITH RELATED WORKS USING AU RELATIONS ONLY
(on CK+, our method is compared with MCF, BGCS [11], DBN, MC-LVM [10], MTFL+BN [7], and H-MTMKL [8] in terms of MacF1 and Acc)

IV. CONCLUSIONS

Although the dependencies between expressions and AUs carry crucial information for facial action unit recognition, only a few works have exploited such dependencies. In this paper, we propose a novel AU recognition method assisted by expressions, which are required only during training. Specifically, we proposed a three-layer RBM that captures the global dependencies among AUs, as well as between AUs and expressions, by introducing a hidden layer and dividing its visible layer into two parts: expression nodes and AU nodes. Experimental results on three benchmark databases demonstrate the effectiveness of our method in capturing the joint relations among AUs and expressions, and its superior performance on AU recognition.

V. ACKNOWLEDGMENT

This work has been supported by the National Science Foundation of China (Grant Nos. , , ) and a project from the Anhui Science and Technology Agency (SMF223).

REFERENCES

[1] Andrew Ryan, Jeffery F. Cohn, Simon Lucey, Jason Saragih, Patrick Lucey, Fernando De la Torre, and Adam Rossi. Automated facial expression recognition system. In 43rd Annual 2009 International Carnahan Conference on Security Technology. IEEE, 2009.
[2] Alessandro Vinciarelli, Maja Pantic, and Hervé Bourlard. Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 2009.
[3] Esra Vural, Müjdat Çetin, Aytül Erçil, Gwen Littlewort, Marian Bartlett, and Javier Movellan. Automated drowsiness detection for improved driving safety.
[4] Patrick Lucey, Jeffrey Cohn, Simon Lucey, Iain Matthews, Sridha Sridharan, and Kenneth M. Prkachin. Automatically detecting pain using facial actions. In 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009), pages 1-8. IEEE, 2009.
[5] Yan Tong and Qiang Ji. Learning Bayesian networks with qualitative constraints. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8. IEEE, 2008.
[6] Ziheng Wang, Yongqiang Li, Shangfei Wang, and Qiang Ji. Capturing global semantic relationships for facial action unit recognition. In Proceedings of the IEEE International Conference on Computer Vision, 2013.
[7] Yachen Zhu, Shangfei Wang, Lihua Yue, and Qiang Ji.
Multiple-facial action unit recognition by shared feature learning and semantic relation modeling. In 22nd International Conference on Pattern Recognition (ICPR 2014), Aug 2014.
[8] Xiao Zhang and M. H. Mahoor. Simultaneous detection of multiple facial action units via hierarchical task structure learning. In 22nd International Conference on Pattern Recognition (ICPR 2014), Aug 2014.
[9] Kaili Zhao, Wen-Sheng Chu, Fernando De la Torre Frade, Jeffrey Cohn, and Honggang Zhang. Joint patch and multi-label learning for facial action unit detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[10] Stefanos Eleftheriadis, Ognjen Rudovic, and Maja Pantic. Multi-conditional latent variable model for joint facial action unit detection. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
[11] Yale Song, Daniel McDuff, Deepak Vasisht, and Ashish Kapoor. Exploiting sparsity and co-occurrence structure for action unit recognition. In 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2015), volume 1, pages 1-8. IEEE, 2015.
[12] Jun Wang, Shangfei Wang, and Qiang Ji. Facial action unit classification with hidden knowledge under incomplete annotation. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval. ACM, 2015.
[13] Adria Ruiz, Joost Van de Weijer, and Xavier Binefa. From emotions to action units with hidden and semi-hidden-task learning. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
[14] Patrick Lucey, Jeffrey F. Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, and Iain Matthews. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2010.
[15] Xing Zhang, Lijun Yin, Jeffrey F. Cohn, Shaun Canavan, Michael Reale, Andy Horowitz, Peng Liu, and Jeffrey M. Girard. BP4D-Spontaneous: A high-resolution spontaneous 3D dynamic facial expression database. Image and Vision Computing, 32(10), 2014.
[16] Qiang Ji. RPI Intelligent Systems Lab (ISL) image databases. ecse.rpi.edu/homepages/cvrl/database/database.html.
[17] Mohammad S. Sorower. A literature survey on algorithms for multi-label learning. Oregon State University, Corvallis, 2010.
[18] Amogh Gudi, H. Emrah Tasli, Tim M. den Uyl, and Andreas Maroulis. Deep learning based FACS action unit occurrence and intensity estimation. In 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2015), volume 6, pages 1-5. IEEE, 2015.


Dynamic Facial Expression Recognition Using A Bayesian Temporal Manifold Model Dynamic Facial Expression Recognition Using A Bayesian Temporal Manifold Model Caifeng Shan, Shaogang Gong, and Peter W. McOwan Department of Computer Science Queen Mary University of London Mile End Road,

More information

Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation

Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation Daniel Lowd January 14, 2004 1 Introduction Probabilistic models have shown increasing popularity

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Multi-View Face Tracking with Factorial and Switching HMM

Multi-View Face Tracking with Factorial and Switching HMM Multi-View Face Tracking with Factorial and Switching HMM Peng Wang, Qiang Ji Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute Troy, NY 12180 Abstract Dynamic

More information

Robust Facial Landmark Detection under Significant Head Poses and Occlusion

Robust Facial Landmark Detection under Significant Head Poses and Occlusion Robust Facial Landmark Detection under Significant Head Poses and Occlusion Yue Wu Qiang Ji ECSE Department, Rensselaer Polytechnic Institute 110 8th street, Troy, NY, USA {wuy9,jiq}@rpi.edu Abstract There

More information

Emotional Expression Classification using Time-Series Kernels

Emotional Expression Classification using Time-Series Kernels Emotional Expression Classification using Time-Series Kernels András Lőrincz 1, László A. Jeni 2, Zoltán Szabó 1, Jeffrey F. Cohn 2, 3, and Takeo Kanade 2 1 Eötvös Loránd University, Budapest, Hungary,

More information

To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine

To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine 2014 22nd International Conference on Pattern Recognition To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine Takayoshi Yamashita, Masayuki Tanaka, Eiji Yoshida, Yuji Yamauchi and Hironobu

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

Structured Models in. Dan Huttenlocher. June 2010

Structured Models in. Dan Huttenlocher. June 2010 Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies

More information

3D Shape Estimation in Video Sequences Provides High Precision Evaluation of Facial Expressions

3D Shape Estimation in Video Sequences Provides High Precision Evaluation of Facial Expressions 3D Shape Estimation in Video Sequences Provides High Precision Evaluation of Facial Expressions László A. Jeni a, András Lőrincz b, Tamás Nagy b, Zsolt Palotai c, Judit Sebők b, Zoltán Szabó b, Dániel

More information

Recognizing Partial Facial Action Units Based on 3D Dynamic Range Data for Facial Expression Recognition

Recognizing Partial Facial Action Units Based on 3D Dynamic Range Data for Facial Expression Recognition Recognizing Partial Facial Action Units Based on 3D Dynamic Range Data for Facial Expression Recognition Yi Sun, Michael Reale, and Lijun Yin Department of Computer Science, State University of New York

More information

3 : Representation of Undirected GMs

3 : Representation of Undirected GMs 0-708: Probabilistic Graphical Models 0-708, Spring 202 3 : Representation of Undirected GMs Lecturer: Eric P. Xing Scribes: Nicole Rafidi, Kirstin Early Last Time In the last lecture, we discussed directed

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence 2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da

More information

AUTOMATIC VIDEO INDEXING

AUTOMATIC VIDEO INDEXING AUTOMATIC VIDEO INDEXING Itxaso Bustos Maite Frutos TABLE OF CONTENTS Introduction Methods Key-frame extraction Automatic visual indexing Shot boundary detection Video OCR Index in motion Image processing

More information

Convolutional Restricted Boltzmann Machine Features for TD Learning in Go

Convolutional Restricted Boltzmann Machine Features for TD Learning in Go ConvolutionalRestrictedBoltzmannMachineFeatures fortdlearningingo ByYanLargmanandPeterPham AdvisedbyHonglakLee 1.Background&Motivation AlthoughrecentadvancesinAIhaveallowed Go playing programs to become

More information