Emotion Classification

Shai Savir 038052395
Gil Sadeh 026511469

1. Abstract

Automated facial expression recognition has received increased attention over the past two decades. Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. The objective of this project is to demonstrate the feasibility of a camera-based application that recognizes, in real time, a face in an image and analyzes its emotional expression. A potential outcome is a tool for blind and autistic people who are unable, or find it difficult, to recognize emotions from facial expressions: it would give them real-time feedback on what the person in front of them is feeling. It could also serve as an instructional tool to train people with emotion-recognition disorders and improve their ability to recognize and express emotional expressions.

2. Introduction

2.1. Motivation

Face recognition is a task that humans perform routinely and effortlessly in their daily lives. It is a visual pattern recognition problem in which a three-dimensional object is identified from its two-dimensional image. In recent years significant progress has been made in this area, so research is now focusing more on facial expression recognition. The motivation to focus on facial expressions comes from many aspects of our day-to-day lives:

- Nonverbal information prevails over the words themselves in human communication.
- The ubiquitous and universal use of computational systems requires improved human-computer interaction.
- Humanizing computers: more human-like human-computer and human-robot interaction.
- Treatment for people with psycho-affective illnesses (e.g., autism).
- Helping the blind experience what sighted people take for granted.

2.2. Project Purpose & Possible Applications

Our project's goal is to demonstrate the feasibility of a camera-based application that recognizes, in real time, a face in an image and analyzes its emotional expression. We tried to achieve this goal by using the techniques discussed in [1] and attempting to improve them.

Possible applications include a tool for blind and autistic people who are unable, or find it difficult, to recognize emotions from facial expressions: it would give them real-time feedback on what the person in front of them is feeling. It could also serve as an instructional tool to train people with emotion-recognition disorders and improve their ability to recognize and express emotional expressions.

2.3. Locality Preserving Projection (LPP)

The LPP method, as described in [3], embeds image sequences of facial expressions from the high-dimensional appearance feature space into a low-dimensional manifold. The goal of creating such a manifold is to discover a latent space in which the topology of the input features x, sometimes also informed by the labels of x, is preserved. Such a data representation may be more discriminative and better suited for modeling dynamic ordinal regression. A face image with N pixels can be considered a point in the N-dimensional image space, and the variations of face images can be represented as low-dimensional manifolds; it is therefore desirable to analyze facial expressions in the low-dimensional subspace rather than in the ambient space.

Given a set of points x_1, ..., x_m in R^N (the N-pixel images), we look for a transformation matrix A that maps these points to y_1, ..., y_m in R^l, with l << N, such that y_i = A^T x_i represents x_i. LPP is a linear approximation of the nonlinear Laplacian Eigenmap. It is computed by the following algorithm:

1. Constructing an adjacency graph: we build a graph with m nodes, placing an edge between nodes i and j if x_i is close to x_j. Closeness can be determined by the Euclidean norm, ||x_i - x_j||^2 < ε (with parameter ε), or by considering the k nearest neighbors (with parameter k).

2. Weighting the edges: W is the weight matrix, where W_ij holds the weight of the edge joining nodes i and j, and W_ij = 0 if there is no edge joining them. There are two variations for choosing the weights:
   a. W_ij = 1, if nodes i and j are connected.
   b. W_ij = exp(-||x_i - x_j||^2 / t), if nodes i and j are connected (with parameter t).

3. Eigenmaps: we compute the eigenvectors and eigenvalues of the generalized eigenvalue problem X L X^T a = λ X D X^T a, where D is the diagonal matrix whose entries are the column sums of W, L = D - W is the Laplacian matrix, and X is the matrix whose columns are the points x_i. We then take the eigenvectors a_0, ..., a_{l-1} corresponding to the smallest eigenvalues and compute y_i = A^T x_i, where A = (a_0, ..., a_{l-1}) is the N x l projection matrix.

The basic idea behind this algorithm is to minimize the objective function Σ_ij ||y_i - y_j||^2 W_ij, which is the criterion used for choosing a good map. Further details can be found in [8]. The advantages of this method are that the mapping preserves locality with respect to this objective, that it is much easier to analyze facial expressions in the low-dimensional subspace, and that the method is linear and therefore fast and suitable for practical applications.
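For illustration, here is a minimal NumPy/SciPy sketch of the three steps above (k-nearest-neighbor graph, heat-kernel weights, and the generalized eigenproblem). The function and parameter names are our own and not taken from the project code:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k=5, t=1.0, l=2):
    """Locality Preserving Projection (sketch).
    X: (N, m) matrix, one column per sample. Returns the (N, l) projection A."""
    N, m = X.shape
    # Pairwise squared Euclidean distances via the Gram matrix.
    g = X.T @ X
    sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2.0 * g
    # Steps 1 + 2: k-nearest-neighbor adjacency graph with heat-kernel weights.
    W = np.zeros((m, m))
    for i in range(m):
        nbrs = np.argsort(sq[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    W = np.maximum(W, W.T)                          # symmetrize the graph
    # Step 3: solve X L X^T a = lambda X D X^T a for the smallest eigenvalues.
    D = np.diag(W.sum(axis=1))
    L = D - W
    lhs = X @ L @ X.T
    rhs = X @ D @ X.T + 1e-9 * np.eye(N)            # regularize for stability
    _, vecs = eigh(lhs, rhs)                        # eigenvalues in ascending order
    return vecs[:, :l]                              # A; project with y = A.T @ x
```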

2.4. Supervised Locality Preserving Projection (S-LPP)

For a data set containing images of typical expressions from different subjects, appearance varies greatly across subjects, so there is significant overlap among the different expression classes. The original LPP, which operates in an unsupervised manner, therefore fails to embed such a data set in a low-dimensional space in which the different expression classes are well clustered. The Supervised Locality Preserving Projections (S-LPP) method, described in [4], solves this problem by not only preserving local structure but also encoding class information in the process. The local neighborhood of a sample from class c should be composed of samples belonging to class c only. This can be achieved by increasing the distances between samples belonging to different classes:

  Dis(x_i, x_j) = d(x_i, x_j) + α M δ(x_i, x_j),

where d(x_i, x_j) is the distance between x_i and x_j, M = max_{i,j} d(x_i, x_j), and δ(x_i, x_j) = 1 if x_i and x_j belong to different classes and 0 otherwise. α is a parameter that determines the degree of supervision. When α = 0 we obtain unsupervised LPP. When α = 1 we obtain fully supervised LPP, in which distances between samples from different classes are larger than the maximum distance in the entire data set. Varying α between 0 and 1 gives a partially supervised LPP, which creates some separation between the classes.

By applying S-LPP to the data set of images of typical expressions, a subspace is derived in which the different expression classes are well clustered and separated. The subspace provides global coordinates for the manifolds of different subjects, which are aligned on one generalized manifold. Image sequences showing facial expressions from onset to apex are mapped onto the generalized manifold as curves running from the neutral faces to the clusters of the typical expressions.

3. Our Solution

3.1. Main Idea

The main idea of our solution is to form a large image database of several subjects displaying different emotions, focusing on three basic ones: happiness, anger, and surprise. From each image we extract features that carry information about the displayed emotion (the features we chose are described below) and form a feature vector per image. After collecting a large database of feature vectors, we use a variation of the S-LPP method to obtain a projection that both maps the feature vectors into a lower dimension (important for real-time performance) and achieves the best separation between the feature vectors of the different classes (emotions).
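Before describing our variation, here is a minimal sketch of the supervised distance adjustment from Section 2.4, under the same illustrative naming; these adjusted distances would replace the plain Euclidean distances when the adjacency graph is built:

```python
import numpy as np

def supervised_distances(X, labels, alpha=0.5):
    """S-LPP distance matrix (sketch).
    X: (N, m) data; labels: (m,) class labels; alpha in [0, 1]."""
    g = X.T @ X
    sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2.0 * g
    d = np.sqrt(np.maximum(sq, 0.0))       # pairwise Euclidean distances
    M = d.max()                            # largest distance in the data set
    delta = (labels[:, None] != labels[None, :]).astype(float)
    # alpha = 0 gives unsupervised LPP; alpha = 1 gives fully supervised LPP.
    return d + alpha * M * delta
```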

After finding this optimal projection, we calculate the mean and covariance matrix of each class; under a Gaussian assumption this yields a normal probability density function (PDF) that approximates the actual PDF of each class. With all of the above done offline, we can then perform real-time face tracking, extract the features (the same way we did when collecting the database), and project the feature vector with the optimal projection we found. Since we achieved a good separation between the classes, we evaluate the PDF of each class at the location of the projected feature vector. After evaluating the three PDFs, we normalize the values so that they sum to one, giving the probability that the current frame expresses each emotion.

3.2. Detailed Description

In our project we decided to use the Kinect camera, because we believe more accurate face tracking and facial-feature extraction can be done with such cameras thanks to the additional depth information they supply. We were also assisted by the Microsoft Kinect Developer Toolkit, which supplied us with a C++ algorithm for face tracking [2]. We managed to sync the C++ algorithm with the Matlab command window: we use the face-tracking algorithm for real-time tracking, convert the information we need (such as the x-y locations of important facial points) into Matlab mex variables, and send them to the Matlab workspace. The rest of the project is implemented in Matlab.

The face-tracking algorithm also finds 98 facial points (shown in Figure 1) and returns their 2D locations (x and y coordinates). From these points we create a large feature vector (of size 4950) of the distances between every pair of points, reasoning that the distances between points carry information about the displayed emotion.

Figure 1: Tracked Points

We built a database containing 4 subjects. Each subject was filmed in a short 10-second clip for each emotion, and every clip was sampled to obtain many images. For each image we extracted the 98 facial point locations and constructed the distance feature vector. After constructing the feature vectors of the entire database, we needed to find the optimal projection that best separates our 3 classes. We did so using a variation of the S-LPP method.
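For illustration, a sketch of the pairwise-distance feature construction described above. (Note that 98 points yield 98·97/2 = 4753 pairwise distances, while the stated vector size of 4950 corresponds to 100 points, so one of the two counts appears to be off.)

```python
import numpy as np

def distance_features(points):
    """Distance feature vector from tracked 2D facial points (sketch).
    points: (n, 2) array of (x, y) locations; returns the n*(n-1)/2
    distances between every unordered pair of points."""
    n = points.shape[0]
    i, j = np.triu_indices(n, k=1)
    return np.linalg.norm(points[i] - points[j], axis=1)
```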

We divided the feature vector into segments of size 100 (out of its 4950 entries) and performed the S-LPP method on each segment separately; from each segment we kept the two dimensions that gave the best separation. In this way we obtained 49 projection matrices of size 100 x 2. To project a feature vector, we take each segment and multiply it by the corresponding projection matrix; concatenating the projected products yields a projected feature vector of size 98.

After building the projected database, we calculated the mean and covariance matrix of each class; this data, together with the projection matrices, is saved and loaded by the real-time algorithm. The real-time algorithm takes the images and facial points from the FaceTrackingVisualization algorithm and creates the feature vector and projected feature vector of each image. We then evaluate the approximated PDF of every class at the projected feature point, and after normalizing these three results we obtain the probability that the image belongs to each class; the results are normalized so that they sum to one. To smooth the results we also calculated the distance from the mean of each class, normalized the distances, and computed the final probability as follows:

  P_i(y) = (1/2) [ f_i(y) / Z_p + d_i(y)^(-s) / Z_d ],

where f_i(y) is the Gaussian probability density of class i, with the mean μ_i and covariance matrix Σ_i extracted from the projected database; y is the projected feature vector; d_i(y) = ||y - μ_i|| is the distance from the mean of class i; Z_p is the probability normalization factor; Z_d is the distance normalization factor; and s is a steepness parameter.

4. Experiments and Results

To test our algorithm we ran the database through it and measured the success rate. The hit/miss percentages are summarized in the following table:

                    Happy     Angry     Surprise
  Hit percentage    98.9744   98.1061   95.4670
  Miss percentage    1.0256    1.8939    4.5330
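For concreteness, a sketch of the per-frame probability computation of Section 3.2, assuming the smoothing averages the normalized Gaussian densities with normalized inverse distances as in the formula above; all names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def class_probabilities(y, means, covs, s=2.0):
    """Smoothed class probabilities for a projected feature vector (sketch).
    y: projected feature vector; means/covs: per-class Gaussian parameters;
    s: steepness parameter for the distance term."""
    pdfs = np.array([multivariate_normal.pdf(y, mean=m, cov=c, allow_singular=True)
                     for m, c in zip(means, covs)])
    dists = np.array([np.linalg.norm(y - m) for m in means]) + 1e-12
    p_term = pdfs / pdfs.sum()        # normalized Gaussian densities (Z_p)
    d_term = dists ** -s
    d_term /= d_term.sum()            # normalized inverse distances (Z_d)
    return 0.5 * (p_term + d_term)    # sums to one; argmax = predicted emotion
```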

We also examined the separation of the different classes along different dimensions; the graphs below show the separation in four of them:

[Scatter plots: separation of the three expression classes in four of the projected dimensions.]

The pictures below give an idea of how the system works in real time. The results obtained for the three emotions show good separation and a good determination of the emotion. However, there are some points we still need to work on:

- Add more emotions. Dealing with three classes is significantly easier than dealing with a wider range of classes.
- Implement a function that rotates the face, so that even when the face has some 3D shift we can straighten it, making the algorithm invariant to rotation of the face over a wider range.

5. Conclusions and Future Ideas

We managed to achieve satisfactory results considering that our database is based on only 4 different people. To achieve better results we will need to make our algorithm invariant to facial rotations, to enlarge our database significantly, and possibly to improve our variation of the S-LPP method in order to achieve separation over a larger number of classes.

6. References

[1] C. Shan, S. Gong, and P. W. McOwan. Appearance manifold of facial expression. Lecture Notes in Computer Science, 3766:221-230, 2005.
[2] Microsoft Kinect Developer Toolkit Browser 1.6, FaceTrackingVisualization.