Video-Based Facial Expression Recognition Using Local Directional Binary Pattern


Sahar Hooshmand, Ali Jamali Avilaq, Amir Hossein Rezaei
Electrical Engineering Dept., Amirkabir University of Technology, Tehran, Iran

Abstract

Automatic facial expression analysis is a challenging problem that influences many areas, such as human-computer interaction. Because of uncertainties in light intensity and light direction, the gray shades of the face are uneven, and the expression recognition rate of the simple Local Binary Pattern is not ideal. In this paper we propose two state-of-the-art descriptors for person-independent facial expression recognition. First, the face regions of all images in a video sequence are modeled with the Volume Local Directional Binary Pattern (VLDBP), an extended version of the LDBP operator that incorporates movement and appearance together. To keep the analysis computationally simple and easy to extend, only the co-occurrences of the Local Directional Binary Pattern on three orthogonal planes (LDBP-TOP) are considered. After extracting the feature vectors, a K-Nearest Neighbor classifier is used to recognize the expressions. The proposed methods are applied to the videos of the Extended Cohn-Kanade database (CK+), and the experimental results demonstrate that the proposed techniques achieve higher accuracy than the classic, traditional algorithms.

Keywords: Chi-Square Statistics; Facial Expression Recognition; Feature Vector; K-Nearest Neighbor Classifier; Local Directional Binary Pattern

I. INTRODUCTION

Facial expression recognition is one of the most challenging topics in computer vision, since it brings a new dimension to human-computer interaction. Although much progress has been achieved [1,2,3], high-accuracy recognition is not easily attained because of the variability of facial expressions. There are two prevalent procedures for extracting facial features: geometric feature-based methods and appearance-based methods [4].
Geometric features capture the shape and locations of facial components, which are extracted to form a feature vector that represents the face geometry. Appearance-based methods apply image filters, such as Gabor wavelets, to either the whole face or specific face regions to extract the appearance changes of the face. In this work we use the Extended Cohn-Kanade facial expression database (CK+) [5], which consists of 593 sequences across 123 subjects, FACS-coded at the peak frame. All sequences run from the neutral face to the peak expression, and the last frame represents one of the basic moods: Anger, Contempt, Disgust, Fear, Happiness, Sadness and Surprise.

Facial expression recognition has two important aspects: feature extraction and classifier design. In this paper we experimentally survey Local Binary Pattern (LBP) features [1], [6,7,8] for facial expression recognition and compare them with a newer method called the Local Directional Binary Pattern (LDBP) [9]. We then discuss two novel methods, VLDBP and LDBP-TOP, which extend LDBP to video processing. The Local Binary Pattern was first described in 1994 [1]. It has since been found to be a powerful feature for texture classification, and it has recently been introduced to represent faces in facial image analysis. The most important properties of this pattern are its computational simplicity and its robustness against illumination changes and head rotations [1]. Much progress has been made in the last decade, and Local Binary Patterns have improved considerably. We briefly review these improvements and compare the recognition rates of different methods based on LBP and LDBP. As the second vital aspect of facial expression recognition, we use a K-nearest neighbor classifier to determine the expression. K-nearest neighbor (KNN) is an instance-based classification method, first introduced by Cover and Hart [11].
The KNN classification algorithm finds the K nearest neighbors of the current sample and uses a majority vote to determine its class label [12]. It should be noted that if insufficient features are used, even the best classifiers fail.

The remainder of this paper is structured as follows. We present a brief review of related work in the next section. The Extended Cohn-Kanade database is introduced in Section III. Facial expression analysis by single-image processing is discussed in Section IV. The main contribution of this paper appears in Sections V, VI and VII: the first two discuss facial expression recognition using VLDBP and LDBP-TOP features, and Section VII explains the classification method using the K-nearest neighbor classifier, together with the advantages of these novel procedures over other LBP-based methods. Finally, Section VIII concludes the paper.

www.jascse.org Page 7

Figure 1. Sample facial expression images from the Extended Cohn-Kanade database.

II. RELATED WORK

There has been considerable effort toward solving the problem of facial expression recognition (FER). In human-to-human interaction, it has been found that verbal cues provide 7% of the meaning of a message; vocal cues, 38%; and facial expressions, 55%. Thus facial expression provides more information about the interaction than the spoken words [13]. Because of its many applications, FER has attracted many researchers to this field. Much improvement has been made over the last decade, and here we briefly review some previous work in order to put ours in context.

In some existing work the appearance changes of faces are modeled. General spatial analyses, including Linear Discriminant Analysis (LDA) [14] and Gabor wavelet analysis [15], have been applied to selected face regions to extract facial appearance changes. Among these methods, Gabor wavelet analysis is widely used in facial expression recognition due to its superior performance [16], but it is memory- and time-consuming. In other work [2], facial geometry analysis has been widely applied to facial components to extract their shapes and locations. In image sequences, facial motions can be determined by stacking up the geometrical displacements of facial feature points between the current frame and the initial frame. Another way to recognize facial expressions is through action units. The Facial Action Coding System (FACS) is a system for taxonomizing human facial movements by their appearance on the face. Due to subjectivity and time-consumption issues, FACS has been implemented as an automated system that detects faces in videos, extracts the geometrical features of the faces, and then produces temporal profiles of each facial movement.
Recently, Valstar and Pantic [17] presented a fully automatic AU detection system that can automatically localize facial points in the very first frame and identify AU temporal segments using a proper subset of the most informative spatiotemporal features selected by AdaBoost. However, geometric feature-based methods generally require precise facial feature detection and reliable tracking, which is difficult to guarantee in many situations. Recently, Local Binary Patterns have been introduced as effective appearance features for facial image analysis. In this work we compare our novel methods against different LBP-based methods, because LBP has inspired strong methods thanks to its simple calculation process and its stronger anti-interference ability compared with the methods mentioned above.

Several methods have been used to classify facial expressions, such as the Support Vector Machine (SVM) [18] and the Bayesian Network (BN) [19]. There have also been several attempts to track and recognize facial expressions based on optical flow analysis. Tian, Kanade and Cohn presented a neural network to recognize facial action units in image sequences [20]. Hidden Markov Models (HMMs) have been widely used to model the temporal behavior of facial expressions in image sequences [21]. In 2010, Schmidt, Schels and Schwenker proposed an HMM classifier for facial expression recognition in image sequences [22], but HMMs cannot deal with dependencies in observations. Recently, Le An, Kafai and Bhanu proposed a Dynamic Bayesian Network (DBN) to merge the information from different cameras as well as the temporal evidence from frames in a video sequence [23]. The advantage of a DBN is that if features from one camera cannot be extracted due to an image-capture failure, this information can still be inferred by the DBN, so recognition need not fail.

III. EXTENDED COHN-KANADE DATABASE (CK+)

The Cohn-Kanade (CK) database was published in 2000 to promote research into automatically detecting person-independent facial expressions [24].
Since then, the CK database has become one of the most commonly used datasets, but some limitations became obvious, such as the lack of standard protocols common to most databases of that time. To address these concerns, Lucey et al. presented the Extended Cohn-Kanade (CK+) database [5].

Figure 2. The basic LBP operator.

Participants were 18 to 50 years of age: 69% female, 81% Euro-American, 13% Afro-American, and 6% other groups. The number of sequences was increased by 22% and the number of subjects by 27%, for a total of 593 sequences across 123 subjects. All sequences run from the neutral face to the peak expression, and each represents one of the basic moods: Anger, Contempt, Disgust, Fear, Happy, Sadness and Surprise. The target expression of each sequence is fully FACS-coded, and the emotion labels have been revised and validated. In addition, non-posed sequences for various types of smiles, with their metadata, have been included. Examples of facial expressions in the CK+ database are given in Fig. 1.

IV. FACIAL EXPRESSION ANALYSIS BY SINGLE IMAGE PROCESSING

One of the most common and effective methods in pattern recognition is a simple algorithm called the Local Binary Pattern. This operator labels the pixels of a gray-scale image by thresholding the 3×3 neighborhood of each pixel against the central value and returning the result as a binary number (refer to Fig. 2). A 256-bin histogram of the LBP labels is then computed [1]. The limitation of the basic LBP operator is its small 3×3 neighborhood (9 gray values), which cannot capture dominant features at larger scales, so it is important to crop the images to select the face parts. The operator LBP(P, R) produces 2^P different output values, corresponding to the 2^P different binary patterns that can be formed by the P pixels in the neighborhood. Circular neighborhoods allow any radius and any number of pixels in the neighborhood, so there are different kinds of extended LBP with various values of R and P:

LBP(x_c, y_c) = Σ_{k=0}^{7} S(g_k - g_c) 2^k,   S(x) = 1 if x ≥ 0, otherwise 0   (1)

where g_c denotes the intensity value of the center pixel (x_c, y_c) and g_k (k = 0, 1, ..., 7) are the gray values of the eight surrounding pixels. To preserve the local information of face components, images are divided into small regions (Fig. 3) and a histogram is computed for each region; all the histograms are then concatenated [6]. The LBP histogram contains information about the distribution of the local micro-patterns, such as edges, spots and flat areas, over the whole image, and so can be used to describe image characteristics statistically [6]. Face images can be seen as combinations of micro-patterns which can be effectively represented by LBP histograms.

Figure 3. A face image is divided into small regions, a histogram is computed for each region, and all the histograms are concatenated.

Referring to Fig. 4, note the micro-patterns formed by a central pixel and eight neighboring points. An LBP histogram computed over the whole face portion of an image encodes only the incidences of the micro-patterns, without any hint about their locations. In fact, the simple LBP cannot distinguish rotations of the black and white circles in several micro-patterns: if the neighboring points are rotated around the central pixel, the LBP code of the micro-pattern remains the same. To solve this problem, we use an algorithm called the Local Directional Binary Pattern (LDBP), introduced in 2013 [9]. This code performs much better than the simple LBP.

Another important point is that most facial expression recognition algorithms are based on single-image analysis, and their performance depends strongly on illumination variations (gray-scale changes), head rotations and translation. Video-based facial expression recognition has therefore attracted much attention in recent years: it uses the information of all video frames and provides more robustness against the problems mentioned above. Generally, video-based analysis yields a higher recognition rate. The difference between a single image and video frames is that a video extends into the spatiotemporal domain, so motion and appearance are combined.
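To make the basic operator concrete, here is a minimal pure-Python sketch of the 3×3 LBP and its 256-bin histogram. This is our own illustration of the thresholding rule described above, not the authors' code; a real implementation would operate on full images via an image library.

```python
def lbp_image(img):
    """Basic 3x3 LBP: threshold each interior pixel's 8 neighbours
    against the centre value and pack the bits into one byte."""
    h, w = len(img), len(img[0])
    # Neighbour offsets, clockwise from the top-left corner.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for k, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:   # S(g_k - g_c)
                    code |= 1 << k             # weight 2^k
            row.append(code)
        codes.append(row)
    return codes

def lbp_histogram(codes):
    """256-bin histogram of LBP labels for one region."""
    hist = [0] * 256
    for row in codes:
        for c in row:
            hist[c] += 1
    return hist

# Toy 3x3 patch with a single interior pixel, checkable by hand.
patch = [[10, 20, 10],
         [ 5, 15, 25],
         [30,  5,  5]]
codes = lbp_image(patch)   # codes == [[74]]
```

Per-region histograms computed this way would then be concatenated into the face descriptor, as the text describes.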
Hence the main contribution of this paper is to extend the effective LDBP algorithm for use in video processing. The next two sections describe the novel algorithms we applied to the videos of the CK+ database.

V. VOLUME LOCAL DIRECTIONAL BINARY PATTERN (VLDBP)

As noted in the previous sections, we need an effective algorithm that can process facial expressions with a high accuracy rate. For this purpose we extend the Local Directional Binary Pattern [9] into a novel method with an acceptable recognition rate. A VLDBP histogram computed over the whole face image of a video frame encodes the incidences of the micro-patterns with a hint about the rotation of the neighboring points

Figure 4. Examples of micro-patterns (white circles represent ones and black circles represent zeroes).

around the central pixel. VLDBP is a binary code assigned to each pixel of an input face image. The pattern is calculated by comparing the relative edge values of a pixel (m_i, i = 0, 1, ..., 7) in different directions; by considering the directions, the pattern can detect rotations in the micro-patterns. We calculate the eight directional edge response values of a particular pixel using the Kirsch templates, shown in Fig. 5 in eight different orientations (m_0 ~ m_7). Note that the matrix m_2 is obtained by rotating m_0 by 90 degrees. Figure 6 shows the complete computing procedure for VLDBP(1, 4, 1).

Since we do not know the motion direction, we select the neighboring points on a circle rather than only along a straight line. We begin by making four non-overlapping patches on each sequence and sampling eight neighboring points for each pixel in all frames of the volume. Then, element by element, we multiply the sampled points by the Kirsch matrices and sum the products to form the Kirsch values. The central value is the median (or mean) of the other eight Kirsch values. In these matrices we select the four neighboring points, and then every four points in the Kirsch values are compared with the value of the central pixel of the middle frame to obtain a binary value. We then produce the VLDBP code by multiplying the binary values by the weights given to the corresponding pixels and summing the results. Finally, we transform the binary number into a decimal to obtain the value of VLDBP(L, P, R), where P is the number of local neighboring points on a circle of radius R around the central pixel in one frame, and L is the time interval. In equation (2), K_i denotes the Kirsch values in the previous, middle and posterior frames (indicated by the colored cells in Fig. 6), and K_c is the central Kirsch value of the middle frame.
VLDBP(L, P, R) = Σ_{i=0}^{3P+1} S(K_i - K_c) 2^i   (2)

where S(x) is defined as in (1). Let us consider an X × Y × T video volume with x_c ∈ {0, ..., X-1}, y_c ∈ {0, ..., Y-1} and t_c ∈ {0, ..., T-1}. When calculating the VLDBP(L, P, R) feature for this volume, only the central part is considered, because a sufficiently wide neighborhood cannot be used on the borders of this 3D space. The VLDBP code is calculated for each pixel in every patch of the volume, and the distribution of the codes is used as a feature vector, denoted D; the joint distribution of the gray levels is denoted v. This yields a 256-bin histogram, which must be normalized to obtain a coherent description. VLDBP thus combines movement and appearance to describe videos.

Figure 5. Kirsch operator templates.

Figure 6. Procedure of VLDBP(1, 4, 1).

D = v(VLDBP_{L,P,R}(x, y, t)),   x ∈ {R, ..., X-1-R},  y ∈ {R, ..., Y-1-R},  t ∈ {L, ..., T-1-L}   (3)

VI. LOCAL DIRECTIONAL BINARY PATTERN FROM THREE ORTHOGONAL PLANES

In the proposed VLDBP, the parameter P specifies the number of features. A large P produces a long histogram, while a small P makes the feature vector shorter at the cost of losing more information. As mentioned before, the number of patterns for VLDBP is 2^{3P+2}, so when the number of neighboring points increases, the number of patterns becomes very large. This rapid growth makes it hard to extend VLDBP to a large number of neighboring points, which restricts its applications. Likewise, neighboring frames with a time interval of less than L are dropped. To solve these problems, we offer a simple method that concatenates LDBP on three orthogonal planes, XY, XT and YT, considering only the co-occurrences in these three planes (shown in Figure 7(a)).

Figure 7. (a) Three orthogonal planes used to extract neighboring points. (b) LDBP histogram from each plane. (c) Concatenated feature histogram.

The XY plane carries the spatial information, whereas the XT and YT planes provide information about the space-time transitions. With this approach the number of bins is only 3 × 2^P, much smaller than 2^{3P+2}, as shown in Figure 6, which allows a simple extension to many neighboring points and decreases the computational complexity.

There are two principal differences between VLDBP and LDBP-TOP. First, VLDBP uses three parallel frames, of which only the middle one contains the central pixel, whereas LDBP-TOP uses three orthogonal planes that intersect at the central pixel. Second, VLDBP considers the co-occurrences of all neighboring points from the three parallel frames, which leads to a long feature vector; LDBP-TOP collects the information from three separate planes and then concatenates it (shown in Fig. 9), so the feature vector stays much shorter as the number of neighboring points grows, and the number of bins remains reasonable. The radii along the X, Y and T axes and the numbers of neighboring points in the XY, XT and YT planes can differ, denoted R_X, R_Y and R_T, but we use R = 1 for all planes. Compared with VLDBP, not all the information of the video (image volume) is used; only the features from the three planes are extracted. Figure 7(a) illustrates the three orthogonal planes, and Figure 7(b) shows the image histograms in the XY, XT and YT planes. The XY plane contains only spatial information, the XT plane captures how one row changes in time, and the YT plane describes the movement of one column in the temporal space. The LDBP-TOP code is extracted from the XY, XT and YT planes, defined as XY-LDBP, XT-LDBP and YT-LDBP for all pixels, and the three are then concatenated into a single histogram, as shown in Figure 7(c).

As shown in Fig. 8, to extract the LDBP-TOP feature we begin by building an image volume. In this paper we continue by making four non-overlapping patches on each sequence. The LDBP features are then extracted for each video on the three orthogonal planes: the spatial information in the XY plane and the spatiotemporal data in the XT and YT planes. To extract the LDBP feature [9], we begin by sampling eight neighboring points for each pixel in all frames of the volume; then, element by element, we multiply the sampled points by the Kirsch matrices and sum the products to form the Kirsch values. The Kirsch values are then compared with the central value (the median or mean of the eight Kirsch values) of the corresponding frame. We repeat this procedure in each of the three planes of the volume. Consider an X × Y × T video volume with x_c ∈ {0, ..., X-1}, y_c ∈ {0, ..., Y-1} and t_c ∈ {0, ..., T-1}. When calculating the LDBP-TOP_{P_XY;P_XT;P_YT;R_X;R_Y;R_T} feature for this volume, only the central part is considered, because a sufficiently wide neighborhood cannot be used on the borders of this 3D space. The histogram of the volume can be defined as:

H_{i,j} = Σ_{x,y,t} I{f_j(x, y, t) = i},   i = 0, ..., n_j - 1;  j = 0, 1, 2   (4)

where n_j is the number of different labels produced by the LDBP operator in the j-th plane (j = 0: XY, 1: XT, 2: YT), f_j(x, y, t) expresses the LDBP code of the central pixel (x, y, t) in the j-th plane, and I{A} = 1 if A is true and 0 if A is false.

Figure 8. Facial expression representation: making patches and extracting features in each block volume.
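The three-orthogonal-planes idea can be sketched as follows. This is our own illustration: it uses the plain LBP code as a stand-in for the directional LDBP code (the paper's descriptor would replace `lbp_code` with the Kirsch-based code), and it normalizes and concatenates the per-plane histograms as in the equations of this section.

```python
def lbp_code(plane, y, x):
    """8-neighbour LBP code at (y, x) of a 2-D plane."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    c = plane[y][x]
    code = 0
    for k, (dy, dx) in enumerate(offs):
        if plane[y + dy][x + dx] >= c:
            code |= 1 << k
    return code

def top_histograms(volume):
    """Concatenated, normalized histograms from the XY, XT and YT
    planes of a volume indexed volume[t][y][x]; borders excluded."""
    T, Y, X = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]   # j = 0: XY, 1: XT, 2: YT
    for t in range(1, T - 1):
        for y in range(1, Y - 1):
            for x in range(1, X - 1):
                xy = volume[t]                                        # spatial plane
                xt = [[volume[tt][y][xx] for xx in range(X)] for tt in range(T)]
                yt = [[volume[tt][yy][x] for yy in range(Y)] for tt in range(T)]
                hists[0][lbp_code(xy, y, x)] += 1
                hists[1][lbp_code(xt, t, x)] += 1
                hists[2][lbp_code(yt, t, y)] += 1
    # Normalize each plane's histogram, then concatenate.
    feat = []
    for h in hists:
        n = sum(h) or 1
        feat.extend(v / n for v in h)
    return feat
```

The resulting feature has 3 × 256 bins, mirroring the 3 × 2^P count discussed above rather than the 2^{3P+2} bins of the volumetric descriptor.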

TABLE I. COMPARISON BETWEEN SEVERAL LBP-BASED ALGORITHMS AND OUR NOVEL METHODS, USING DIFFERENT VALUES OF K (RECOGNITION RATE, %)

Method      K=1    K=3    K=5    K=10   K=15   K=18
LDBP-TOP    74.3   75.2   79.2   81.6   81.0   81.9
VLDBP       73.2   75.1   77.2   80.7   81.1   80.9
CLBP-TOP    72.2   74.4   76.8   80.0   80.2   80.3
EVLBP       70.9   71.3   77.2   79.2   79.8   78.2
LBP-TOP     68.7   68.9   71.4   74.0   72.6   71.0
VLBP        65.4   66.8   70.3   72.1   72.8   73.3

TABLE II. CONFUSION MATRIX OF 7-CLASS FACIAL EXPRESSION RECOGNITION, USING VLDBP FEATURES AND THE K-NEAREST NEIGHBOR CLASSIFIER (K=10) (%). Columns, in order: Surprise, Happiness, Sadness, Fear, Disgust, Contempt, Angry; zero entries omitted.

Surprise    85.1   14.9
Happiness   80.2   6.1    2.9    3.5    7.3
Sadness     6.8    2.7    72.9   4.1    13.5
Fear        1.2    23.2   2.8    65.1   4.2    3.5
Disgust     2.7    14.5   79.3   3.5
Contempt    19.3   3.8    1.4    70.7   2.8
Angry       14.3   6.2    4.5    75.5

TABLE III. CONFUSION MATRIX OF 7-CLASS FACIAL EXPRESSION RECOGNITION, USING LDBP-TOP FEATURES AND THE K-NEAREST NEIGHBOR CLASSIFIER (K=10) (%). Columns as in Table II; zero entries omitted.

Surprise    86.9   13.1
Happiness   83.8   5.1    2.6    3.7    4.8
Sadness     7.1    2.4    74.6   3.8    12.1
Fear        21.5   2.9    68.0   4.8    2.8
Disgust     17.6   82.4
Contempt    16.3   6.5    73.1   4.1
Angry       13.2   4.6    3.6    78.6

The histograms must be normalized to obtain a coherent description:

N_{i,j} = H_{i,j} / Σ_{k=0}^{n_j - 1} H_{k,j}   (5)

In this histogram, a description of the video is effectively established from LDBP on three orthogonal planes: the features from the XY plane carry information about appearance, while the features from the XT and YT planes capture the co-occurrence of movement in the horizontal and vertical directions. These three histograms are

concatenated to form a global description of a video volume with spatiotemporal features.

VII. VLDBP & LDBP-TOP DESCRIPTORS FOR FACIAL VIDEO ANALYSIS

In this section we perform person-independent facial expression recognition using LDBP features. First, the face portions of all images in a video volume are detected with the Viola-Jones algorithm and cropped. We then divide them into small regions, from which the VLDBP and LDBP-TOP histograms are extracted and concatenated into a single, spatiotemporally enhanced feature histogram. Some parameters can be optimized for better feature extraction, for example the number of patches or divided regions. We selected the 256-bin VLDBP(L, P, R) = VLDBP(1, 4, 1) and LDBP-TOP_{P_XY;P_XT;P_YT;R_X;R_Y;R_T} = LDBP-TOP_{4;4;4;1;1;1} operators. After building a video volume, we divided the whole face images into 20 × 20-pixel regions, with four 10 × 10 patches, giving a good trade-off between recognition rate and feature-vector length.

According to the FACS-code folders of the Extended Cohn-Kanade dataset, we divided the whole database into seven different moods: Anger, Contempt, Disgust, Fear, Happy, Sadness and Surprise. After extracting the feature vectors of all videos in the database, a K-nearest neighbor classifier is used to match the input video with the closest video volumes and classify it into the related group based on its similarity to each mood. Following [25], we selected the Chi-square statistic (χ²) as the dissimilarity criterion for histograms:

χ²(S, M) = Σ_i (S_i - M_i)² / (S_i + M_i)   (6)

where S and M are two VLDBP or LDBP-TOP histograms. The parameter K in K-nearest neighbor classification is the number of videos closest to the input video that are compared with it. The best choice of K depends on the data: larger values of K generally reduce the effect of noise on the classification, but make the boundaries between classes less distinct.
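The chi-square dissimilarity and the K-nearest-neighbor vote can be sketched as follows (our own illustration with made-up four-bin histograms; the real feature vectors would be the concatenated LDBP histograms):

```python
from collections import Counter

def chi_square(S, M):
    """Chi-square dissimilarity between two histograms (Eq. 6):
    sum over bins of (S_i - M_i)^2 / (S_i + M_i), skipping empty bins."""
    return sum((s - m) ** 2 / (s + m) for s, m in zip(S, M) if s + m > 0)

def knn_predict(query, train, k):
    """Majority vote among the k training histograms closest to the
    query under the chi-square dissimilarity."""
    ranked = sorted(train, key=lambda lm: chi_square(query, lm[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Tiny illustration with invented 4-bin "histograms".
train = [("happy", [0.7, 0.1, 0.1, 0.1]),
         ("happy", [0.6, 0.2, 0.1, 0.1]),
         ("sad",   [0.1, 0.1, 0.1, 0.7])]
print(knn_predict([0.65, 0.15, 0.1, 0.1], train, k=3))  # -> happy
```

With k = 1 this reduces to the nearest-neighbor rule discussed next; larger k trades sharper class boundaries for noise robustness, as the text notes.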
The special case in which the class is predicted to be the class of the closest training sample (K = 1) is called the nearest-neighbor algorithm. Our classifier achieved an excellent performance in this 7-class task for both the VLDBP and LDBP-TOP algorithms. We compared the results with those reported in [1], [7], [8], where the authors used VLBP, LBP-TOP, CLBP and EVLBP. As these methods cannot be compared directly, we selected the same values for the common parameters and the same preprocessing steps for all methods. Altogether, the comparison in Table I demonstrates that our novel methods using VLDBP and LDBP-TOP features provide the better performance. The confusion matrices of this 7-class recognition are given in Table II and Table III for K = 10. Some moods are recognized with high accuracy, while others are easily confused: the pair sadness and anger is difficult to distinguish even for a human being, and the distinction between happiness and contempt fails because these expressions involve a similar mouth motion.

VIII. CONCLUSION

A new approach to facial expression recognition was presented. A Volume LDBP method was developed to merge motion and appearance. A simpler LDBP-TOP operator, based on concatenating LDBP histograms calculated from three orthogonal planes, was also offered, making it easier to extract co-occurrence features from a larger number of neighboring points. Experiments on the Extended Cohn-Kanade database, with comparisons to other methods' results, show that our method is efficient for facial expression recognition. Using the K-nearest neighbor classifier with K = 10, we reach recognition rates of 80.7% with VLDBP and 81.6% with LDBP-TOP. We chose these video-based methods because they are not only computationally simple but also use the information of all video frames, providing more robustness against problems such as illumination variations (gray-scale changes), head rotations and translation.
Generally, in comparison with single-image processing, the video-based analysis yields a higher recognition rate and promises a good outcome for real-world applications. The results gained from VLDBP and LDBP-TOP are better than those established in earlier studies. Furthermore, no gray-scale normalization is needed when applying our descriptors to the face images.

ACKNOWLEDGMENT

The authors would like to thank Professors Jeffrey F. Cohn and Takeo Kanade for the use of the Extended Cohn-Kanade facial expression database. We also appreciate the helpful comments and suggestions of the anonymous reviewers.

REFERENCES

[1] G. Zhao, M. Pietikäinen, "Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, June 2007, pp. 915-928, doi:10.1109/TPAMI.2007.1110.
[2] M. Pantic, I. Patras, "Dynamics of Facial Actions and Their Temporal Segments from Face Profile Image Sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, Apr. 2006, pp. 433-449, doi:10.1109/TSMCB.2005.859075.
[3] X. Lu, A. Jain, "Deformation Modeling for Robust 3D Face Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, Aug. 2008, pp. 1346-1357, doi:10.1109/TPAMI.2007.70784.
[4] Y. Tian, T. Kanade, J. Cohn, Handbook of Face Recognition, Springer, 2005, pp. 247-277.
[5] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, "The Extended Cohn-Kanade Dataset (CK+): A Complete Dataset for Action Unit and Emotion-Specified Expression," Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, June 2010, pp. 94-101, doi:10.1109/CVPRW.2010.5543262.
[6] C. Shan, S. Gong, P. W. McOwan, "Facial Expression Recognition Based on Local Binary Patterns: A Comprehensive Study," Image and Vision Computing, Elsevier, vol. 27, May 2009, pp. 803-816, doi:10.1016/j.imavis.2008.08.005.
[7] Z. Guo, L. Zhang, D. Zhang, "A Completed Modeling of Local Binary Pattern Operator for Texture Classification," IEEE Transactions on Image Processing, vol. 19, Mar. 2010, pp. 1657-1663, doi:10.1109/TIP.2010.2044957.
[8] A. Hadid, M. Pietikäinen, "Learning Personal Specific Facial Dynamics for Face Recognition from Videos," Analysis and Modeling of Faces and Gestures, Lecture Notes in Computer Science, vol. 4778, Springer Berlin Heidelberg, 2007, pp. 1-15, doi:10.1007/978-3-540-75690-3_1.
[9] Y. Wang, G. He, "Expression Recognition Algorithm Based on Local Directional Binary Pattern," Green Computing and Communications (GreenCom), 2013 IEEE and Internet of Things (iThings/CPSCom), IEEE International Conference on and IEEE Cyber, Physical and Social Computing, Aug. 2013, pp. 1458-1462, doi:10.1109/GreenCom-iThings-CPSCom.2013.257.
[10] D. C. He, L. Wang, "Texture Unit, Texture Spectrum, and Texture Analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 28, July 1990, pp. 509-512, doi:10.1109/TGRS.1990.572934.
[11] T. Cover, P. Hart, "Nearest Neighbor Pattern Classification," IEEE Transactions on Information Theory, vol. 13, Jan. 1967, pp. 21-27, doi:10.1109/TIT.1967.1053964.
[12] S. Zhang, X. Zhao, B. Lei, "Facial Expression Recognition Using Sparse Representation," WSEAS Transactions on Systems, vol. 11, 2012, pp. 440-452.
[13] S. Moore, R. Bowden, "Local Binary Patterns for Multi-view Facial Expression Recognition," Computer Vision and Image Understanding, Elsevier, vol. 115, Apr. 2011, pp. 541-558, doi:10.1016/j.cviu.2010.12.001.
[14] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, July 1997, pp. 711-720, doi:10.1109/34.598228.
[15] M. J. Lyons, J. Budynek, S. Akamatsu, "Automatic Classification of Single Facial Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, Dec. 1999, pp. 1357-1362, doi:10.1109/34.817413.
[16] M. S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, J. Movellan, "Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior," Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Conference on, vol. 2, June 2005, pp. 568-573, doi:10.1109/CVPR.2005.297.
[17] M. Valstar, M. Pantic, "Fully Automatic Facial Action Unit Detection and Temporal Analysis," IEEE Computer Vision and Pattern Recognition Workshop (CVPRW '06), Conference on, June 2006, p. 149, doi:10.1109/CVPRW.2006.85.
[18] Y. Lai, Y. Zhang, "Using SVM to Design Facial Expression Recognition for Shape and Texture Features," Machine Learning and Cybernetics (ICMLC), 2010 International Conference on, vol. 5, July 2010, pp. 2697-2704, doi:10.1109/ICMLC.2010.5580938.
[19] C. Shan, S. Gong, P. W. McOwan, "Dynamic Facial Expression Recognition Using a Bayesian Temporal Manifold Model," British Machine Vision Conference, 2006.
[20] Y. Tian, T. Kanade, J. Cohn, "Recognizing Action Units for Facial Expression Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, Feb. 2001, pp. 97-115, doi:10.1109/34.908962.
[21] M. Yeasin, B. Bullot, R. Sharma, "Facial Expression to Level of Interests: A Spatiotemporal Approach," Computer Vision and Pattern Recognition (CVPR 2004), Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2, June 2004, pp. II-922 - II-927, doi:10.1109/CVPR.2004.1315264.
[22] M. Schmidt, M. Schels, F. Schwenker, "A Hidden Markov Model Based Approach for Facial Expression Recognition in Image Sequences," Artificial Neural Networks in Pattern Recognition, Lecture Notes in Computer Science, vol. 5998, 2010, pp. 149-160, doi:10.1007/978-3-642-12159-3_14.
[23] L. An, M. Kafai, B. Bhanu, "Dynamic Bayesian Network for Unconstrained Face Recognition in Surveillance Camera Networks," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 3, Apr. 2013, pp. 155-164, doi:10.1109/JETCAS.2013.2256752.
[24] T. Kanade, J. F. Cohn, Y. Tian, "Comprehensive Database for Facial Expression Analysis," Automatic Face and Gesture Recognition, Proceedings of the Fourth IEEE International Conference on, Mar. 2000, pp. 46-53, doi:10.1109/AFGR.2000.840611.
[25] T. Ahonen, A. Hadid, M. Pietikäinen, "Face Recognition with Local Binary Patterns," Computer Vision - ECCV 2004, Lecture Notes in Computer Science, vol. 3021, May 2004, pp. 469-481, doi:10.1007/978-3-540-24670-1_36.

www.jascse.org