Multi-view Facial Expression Recognition Analysis with Generic Sparse Coding Feature

0/19. Multi-view Facial Expression Recognition Analysis with Generic Sparse Coding Feature
Usman Tariq, Jianchao Yang, Thomas S. Huang
Department of Electrical and Computer Engineering, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
{utariq2, jyang29, huang}@ifp.illinois.edu
October 13, 2012

1/19. Outline

2/19. Motivation
- In real applications, a frontal face view is not always available.
- A detailed analysis of the effect of large pose variations (in both pan and tilt angles) on expression recognition performance is needed, e.g., for positioning cameras.
- The bulk of the existing literature assumes a frontal or near-frontal face view, manually/automatically detected key points, and/or the presence of a neutral face.

3/19. Highlights
- This work deals with single-image, multi-view facial expression recognition.
- It uses translation-invariant sparse coding to obtain features, followed by linear classification.
- Besides achieving state-of-the-art results, it provides an extensive analysis of the effect of variations in pan and tilt angles.

4/19. Database
- The database used in this work is the publicly available BU-3DFE database. It has 3D face scans of 100 subjects, each performing 6 expressions at 4 intensity levels.
- Facial expressions in the database: anger (AN), disgust (DI), fear (FE), happy (HA), sad (SA), surprise (SU) (and neutral).
- Of the 100 subjects, 56 are female. The dataset is quite diverse and contains subjects of various racial ancestries.
- Views with seven pan angles (0°, ±15°, ±30°, ±45°) and five tilt angles (0°, ±15°, ±30°) are rendered for each subject and each expression-intensity combination, resulting in a dataset of 84,000 images.

5/19. Database - expressions and intensity levels
[Figure: rendered facial images of a subject with the six expressions (AN, DI, FE, HA, SA, SU) at intensity levels 1 to 4, with 4 being the most intense.]

6/19. Database - pan and tilt angles
[Figure: rendered facial images of a subject at pan angles -45° to +45° and tilt angles -30° to +30°.]

6/19. Outline

7/19. Translation Invariant Sparse Coding (ScSPM)
- Bag of Features (BoF) model.
- Spatial Pyramid Matching (SPM) framework: the given image is partitioned, BoF histograms are extracted for each of those partitions, and these histograms are then concatenated together.
- When we relax the SPM cardinality constraint by solving a lasso problem and use max-pooling instead of the histogram representation, we arrive at the ScSPM setting:
  - it is robust to translation misalignments, since it is computed in an SPM framework;
  - it gives a significant improvement over SPM.

8/19. ScSPM - Learning the code-book
Suppose $X$ is a matrix whose columns are image features, $X = [x_1, \dots, x_N] \in \mathbb{R}^{p \times N}$. We solve the following in an alternating, iterative fashion to obtain $V$:

$$\min_{V,W} \sum_{n=1}^{N} \lVert x_n - V w_n \rVert_2^2 + \lambda \lVert w_n \rVert_1 \quad \text{subject to } \lVert v_k \rVert_2 \le 1,\ k \in \{1, 2, \dots, K\} \tag{1}$$

Here, $V = [v_1, \dots, v_K] \in \mathbb{R}^{p \times K}$ is the code-book or dictionary, $W = [w_1, \dots, w_N] \in \mathbb{R}^{K \times N}$, and $\lambda$ is a regularization parameter whose value controls the sparseness of the solution $w_n$.
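
As a concrete illustration, the sketch below learns such a code-book with scikit-learn's DictionaryLearning, which solves an objective of the same form by alternating between coding $W$ and updating $V$ with unit-norm atoms. The library choice, the toy data, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of code-book learning in the form of Eq. (1).
# Toy sizes are used; the paper uses 128-D SIFT descriptors and K = 1024.
import numpy as np
from sklearn.decomposition import DictionaryLearning

p, N, K = 128, 2000, 64          # toy sizes (paper: p = 128, K = 1024)
X = np.random.randn(N, p)        # stand-in descriptors, one per row
                                 # (the paper stacks the x_n as columns)

learner = DictionaryLearning(
    n_components=K,              # K dictionary atoms v_k with ||v_k||_2 <= 1
    alpha=0.15,                  # plays the role of lambda (assumed value)
    max_iter=20,
    transform_algorithm="lasso_lars",
)
W = learner.fit_transform(X)     # sparse codes w_n, shape (N, K)
V = learner.components_          # learned code-book, shape (K, p)
```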

9/19. ScSPM - Coding
In the ScSPM framework, image-level features are extracted as follows: first, patches are densely sampled from the given image; then low-level features, such as SIFT, are extracted from these patches. These features are then sparsely coded using the code-book $V$ by solving

$$\min_{w_n} \lVert x_n - V w_n \rVert_2^2 + \lambda \lVert w_n \rVert_1$$
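
With $V$ fixed, this coding step is an ordinary lasso problem per descriptor. A minimal sketch with scikit-learn's Lasso follows; sklearn minimizes $(1/(2\,n_{\text{samples}}))\lVert x - Vw \rVert_2^2 + \alpha \lVert w \rVert_1$, so $\alpha = \lambda/(2p)$ reproduces the objective above (the solver choice and the value of $\lambda$ are assumptions).

```python
# A minimal sketch of the coding step with a fixed code-book V.
import numpy as np
from sklearn.linear_model import Lasso

p, K = 128, 1024
V = np.random.randn(p, K)                  # stand-in code-book, atoms as columns
V /= np.linalg.norm(V, axis=0)             # enforce ||v_k||_2 <= 1
x = np.random.randn(p)                     # one SIFT descriptor

lam = 0.15                                 # lambda in the objective (assumed)
coder = Lasso(alpha=lam / (2 * p), fit_intercept=False, max_iter=5000)
w = coder.fit(V, x).coef_                  # sparse code, shape (K,)
print(f"{np.count_nonzero(w)} / {K} coefficients are non-zero")
```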

10/19. ScSPM - Pooling
These sparse vectors are then pooled in an SPM framework, $z = \Phi(W)$; for instance, with max-pooling,

$$z_i = \max\{ \lvert w_{i1} \rvert, \dots, \lvert w_{iN} \rvert \}$$

The resulting image-level feature vector is obtained by concatenating the vectors pooled over the various partitions/levels.
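
A minimal sketch of this pooling step: codes are max-pooled in absolute value within the cells of a spatial pyramid, and the cell vectors are concatenated. The grid levels (1x1, 2x2, 4x4) follow common ScSPM practice and are an assumption here, as is the toy data.

```python
# A sketch of SPM max-pooling over sparse codes.
import numpy as np

def scspm_pool(codes, xy, img_w, img_h, levels=(1, 2, 4)):
    """codes: (n_patches, K) sparse codes; xy: (n_patches, 2) patch centers."""
    K = codes.shape[1]
    pooled = []
    for g in levels:                                   # g x g grid at this level
        cell = np.minimum((xy / [img_w / g, img_h / g]).astype(int), g - 1)
        for cx in range(g):
            for cy in range(g):
                mask = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                # z_i = max(|w_i1|, ..., |w_in|) over the patches in this cell
                z = np.abs(codes[mask]).max(axis=0) if mask.any() else np.zeros(K)
                pooled.append(z)
    return np.concatenate(pooled)                      # length K * (1 + 4 + 16)

codes = np.random.randn(300, 1024) * (np.random.rand(300, 1024) < 0.02)
xy = np.random.rand(300, 2) * [128, 128]
print(scspm_pool(codes, xy, 128, 128).shape)           # (21504,)
```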

11/19. Experimental Setup
- Dense SIFT features are extracted with a 3-pixel shift.
- A code-book $V \in \mathbb{R}^{128 \times 1024}$ is used for sparse coding in an SPM framework, followed by pooling.
- Experiments are done in a subject-independent, 5-fold cross-validation setting on the 84,000 images rendered from the BU-3DFE database.
- A universal approach (a single classifier across all views) is adopted for classification:
  - linear SVMs are used for single-image expression recognition;
  - L2-regularized logistic regression is used to obtain probability estimates for fusion.
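
A minimal sketch of this classification stage with scikit-learn; the C values, solver defaults, and the random stand-in features are assumptions.

```python
# A sketch of the classification stage: linear SVM for single-image labels,
# L2-regularized logistic regression for probability estimates used in fusion.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

Z_train = np.random.randn(200, 21504)        # stand-in pooled ScSPM features
y_train = np.random.randint(0, 6, 200)       # 6 expression classes
Z_test = np.random.randn(50, 21504)

svm = LinearSVC(C=1.0).fit(Z_train, y_train)
labels = svm.predict(Z_test)                 # single-image expression labels

lr = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(Z_train, y_train)
proba = lr.predict_proba(Z_test)             # per-class probabilities for fusion
```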

11/19. Outline

12/19. Confusion matrix
Class confusion matrix for overall recognition performance (69.1%), averaged over all poses and expression intensity levels. Rows: ground truth; columns: predicted.

        AN     DI     FE     HA     SA     SU
AN    64.2    8.4    4.1    2.2   18.1    3.1
DI    10.9   70.1    5.8    3.9    5.2    4.3
FE     7.5    9.5   51.1   13.7    9.5    8.7
HA     2.1    4.3    9.4   81.2    1.7    1.4
SA    19.6    5.2    7.2    2.3   63.4    2.3
SU     1.8    3.0    4.7    3.0    2.6   85.0
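
Since each expression has the same number of test images, the overall rate is the mean of the diagonal; a quick sanity check on the rounded entries:

```python
# Mean of the per-class (diagonal) rates; with balanced classes this is the
# overall recognition rate. The rounded entries give ~69.17%, consistent
# with the reported 69.1% up to rounding.
import numpy as np

diag = np.array([64.2, 70.1, 51.1, 81.2, 63.4, 85.0])  # AN, DI, FE, HA, SA, SU
print(round(diag.mean(), 2))  # 69.17
```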

13/19. Recognition performance for different intensities
[Figure: average recognition rates per expression (AN, DI, FE, HA, SA, SU) at intensity levels 1 (min) to 4 (max), plus the average; y-axis: percentage recognition rate.]

14/19. Performance vs pan and tilt angle variations (by expression)
[Figure, left: average recognition rates vs pan angles (-45° to +45°) for each expression and their average. Figure, right: average recognition rates vs tilt angles (-30° to +30°) for each expression and their average.]

15/19. Performance vs pan and tilt angle variations (by intensity)
[Figure, left: average recognition rates vs pan angles (-45° to +45°) for intensity levels 1 (min) to 4 (max) and their average. Figure, right: the same vs tilt angles (-30° to +30°).]

16/19. Performance vs simultaneous pan and tilt variations
[Figure: heat map of average recognition rates (roughly 64-72%) for each combination of pan (-45° to +45°) and tilt (-30° to +30°) angles, showing the effect of simultaneous changes on overall single-image recognition performance.]

17/19. Comparison with State-of-the-Art
Zheng et al. and Tang et al. follow the same experimental setting but restrict themselves to only the strongest expression intensity level; their image dataset therefore consists of 100 × 6 × 7 × 5 = 21,000 images. To compare, we repeat our experiments in the same setting.

Performance comparison with earlier works on the strongest expression intensity, in terms of percentage recognition rates:

Method        Recognition rate
Zheng et al.  68.2%
Tang et al.   75.3%
Ours          76.1%

18/19. Concluding Remarks
- Our work sets a new state of the art for multi-view facial expression recognition on the BU-3DFE database.
- Unlike many other works, our method requires neither key-point detection nor a neutral face.
- A substantial analysis of variations in expression recognition with changes across a range of pan angles, tilt angles, or both is carried out.
- The most subtle expressions are the most difficult to recognize.
- We do not find any conclusive evidence that non-frontal views give significantly better performance than the frontal view.

19/19. The End