Towards automatic annotation of Sign Language dictionary corpora


Towards automatic annotation of Sign Language dictionary corpora. Marek Hrúz, Zdeněk Krňoul, Pavel Campr, Luděk Müller. Univ. of West Bohemia, Faculty of Applied Sciences, Dept. of Cybernetics, Univerzitní 8, 306 14 Pilsen, Czech Republic. September 4, 2011

1. Introduction and Motivation
2. Data
3. Hand/head tracking
4. Face tracking
5. Experiments with categorization
6. Conclusion

Introduction: Sign Language is a form of communication used mainly by deaf or hearing-impaired people. It consists of a manual and a non-manual component. Manual component (MC): hand shape, palm orientation, arm movement. Non-manual component (NMC): facial expression, body pose, lip movement.

Motivation: Bi-directional translation is essential for communication between users of different languages, yet it is hard to build a good bi-directional dictionary between SL and a spoken/written language. Currently the translation is provided by human interpreters. Our goal is automatic categorization of the videos in the dictionary, which should enable better searching for signs in the dictionary.

Data: The data for our experiment are selected signs from the on-line dictionary (http://signs.zcu.cz/). They consist of pairs of synchronized video files capturing one speaker from two different views: the first view captures the signer's entire body, the second a detail of the face. The recording conditions are constant lighting, uniform background and clothing, and long sleeves. In total we processed 213 video files.

Hand/head tracking - MC: Tracking is based on skin-color segmentation (adequate given the character of the data). The resulting objects (blobs) are filtered so that only probable blobs remain (1-3 blobs: left hand, right hand, and head). We track the blobs using discriminative measures and probability models. We employ 3 trackers, each with 4 different models (m1...m4), where each model covers a different occlusion situation:

Last state \ New state | Not occluded | Occluded
Not occluded           | m1           | m2
Occluded               | m3           | m4

The models are GMMs trained on annotated data.
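As a rough sketch of this front end, the snippet below thresholds skin color in HSV and keeps the 1-3 largest connected components; the threshold values and the minimum area are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

# Illustrative HSV skin thresholds; real values would be tuned to the
# constant lighting of the recording setup.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def skin_blobs(frame_bgr, min_area=500, max_blobs=3):
    """Segment skin-colored pixels and keep the 1-3 largest blobs
    (candidate left hand, right hand, and head)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Drop improbable (tiny) blobs, then keep the largest candidates.
    blobs = [c for c in contours if cv2.contourArea(c) >= min_area]
    blobs.sort(key=cv2.contourArea, reverse=True)
    return blobs[:max_blobs]
```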

In every frame the detected blobs are compared with the last known blobs (all combinations). Each comparison yields a 7D vector of differences. Feature vector:
1. normalized correlation between the gray-scale intensities of the blob and the tracked blob
2. normalized distance between their contours (computed from Hu moments)
3. relative difference between their bounding-box areas
4. relative difference between their perimeters
5. relative difference between their areas
6. relative difference between their velocities
7. relative difference between their locations
Using Heteroscedastic Linear Discriminant Analysis (HLDA) we reduce the dimension to 5D; the models (m1...m4) are then 5D GMMs with 4 components each. This enables us to compute the probability that a new object matches a last known object (see the scoring sketch below).
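A minimal sketch of the model scoring, assuming the 7D difference vectors have already been reduced to 5D (scikit-learn has no off-the-shelf HLDA, so the reduction itself is omitted here); the training data are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One GMM per situation model m1..m4; each is 5-dimensional with 4
# components, as on the slide. X_m1 stands in for the HLDA-reduced
# training vectors annotated for situation m1.
X_m1 = np.random.rand(500, 5)  # placeholder training data
m1 = GaussianMixture(n_components=4, covariance_type='full',
                     random_state=0).fit(X_m1)

def match_score(model, reduced_vector):
    """Log-likelihood of one reduced 5D difference vector under a
    situation model; higher means a more probable blob assignment."""
    return model.score_samples(reduced_vector.reshape(1, -1))[0]
```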

We could simply choose the most probable solution for each tracker independently, but the stand-alone models are often not powerful enough: the same blob can be identified as both hands (in occlusion) while the other blob is left unlabeled. Therefore we take the configuration of the head and the hands into account: we test the number of detected blobs against 5 head/hand configurations. In the end we choose the most probable configuration given the number of detected body parts (a schematic sketch follows). The result of tracking is the trajectory of each body part and the contours of the blobs.
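The slide leaves the exact decision rule open; the sketch below is one schematic reading, where hypotheses inconsistent with the detected blob count are discarded and the most likely remaining configuration wins. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Configuration:
    """One head/hand hypothesis, e.g. 'hands occluded, head free'."""
    name: str
    n_blobs: int                 # number of blobs this hypothesis expects
    total_log_likelihood: float  # combined score of the three trackers

def best_configuration(hypotheses, n_detected) -> Optional[Configuration]:
    # Keep only hypotheses consistent with the detected blob count,
    # then pick the most probable one.
    feasible = [h for h in hypotheses if h.n_blobs == n_detected]
    if not feasible:
        return None
    return max(feasible, key=lambda h: h.total_log_likelihood)
```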

Face expression tracking - NMC: Generative parametric models are commonly used to track human faces in images. We use a 2D multi-resolution combined active appearance model (AAM). The AAM is based on PCA and combines a model of shape and a model of texture (appearance). The shape model provides geometric features that are very useful for categorization; the texture model ensures robust tracking of the face expression.

Training of the AAM: The training set consists of 51 randomly selected images of the face, chosen to cover the maximum variation in the face. We manually identify the mesh in each training image. 9 shape and 41 texture principal components preserve 97.5% of the total variance. Figure: Example of the first four shape principal components (±150% of standard deviation).
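The shape-model training reduces to a PCA that keeps enough components for 97.5% of the variance; a sketch with placeholder landmark data follows (the 68-landmark layout is an assumption, not the paper's mesh).

```python
import numpy as np
from sklearn.decomposition import PCA

# One row per training image: the hand-annotated mesh flattened to a
# vector. 68 (x, y) landmarks per face is an assumed layout.
shapes = np.random.rand(51, 2 * 68)  # placeholder for the real meshes

# Keep as many components as needed for 97.5% of the total variance;
# on the paper's data this yields 9 shape components.
shape_pca = PCA(n_components=0.975).fit(shapes)
print(shape_pca.n_components_, shape_pca.explained_variance_ratio_.sum())
```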

Owing to the PCA, the contribution of the shape principal components to the desired categories is not self-evident. For example, the first shape parameter describes the opening of the mouth, yet the remaining shape parameters incorporate partial opening of the mouth as well. For each frame of the input video, we therefore use the AAM only to identify the face shape, from which we extract three face measures.
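The transcript does not list the three measures, so the example below is purely hypothetical: a mouth-openness measure computed from fitted landmarks, with indices taken from a common 68-point annotation scheme rather than from the paper.

```python
import numpy as np

def mouth_openness(landmarks):
    """Hypothetical face measure: inner-lip gap normalized by face
    height. Indices follow the common 68-point scheme, not the
    paper's mesh."""
    upper_lip, lower_lip = landmarks[62], landmarks[66]  # inner lips
    brow, chin = landmarks[27], landmarks[8]             # vertical extent
    return np.linalg.norm(lower_lip - upper_lip) / np.linalg.norm(chin - brow)
```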

Processing of the video files: The AAM is sensitive to the initial shape and can end up in a local minimum, so we need an initial localization of the face in the first frame of each processed video file. For this purpose, the most likely area showing the speaker's face is detected by the Viola-Jones AdaBoost face detector. Finally, we obtain trajectories of the face measures for all frames and all lexical signs. Figure: Final fitting of the AAM; from the left: fitted appearance for three consecutive input frames, and incorrect tracking under occlusion.
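Initialization with OpenCV's stock Viola-Jones cascade might look as follows; the detector parameters are common defaults, not the paper's.

```python
import cv2

# OpenCV ships a pretrained Viola-Jones (Haar cascade) face detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def initial_face_box(first_frame_bgr):
    """Return (x, y, w, h) of the most likely face area in the first
    frame, to seed the AAM, or None if nothing is detected."""
    gray = cv2.cvtColor(first_frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])  # largest detection
```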

Categorization of lexical signs - Experiments: Linguists have not yet established a universal categorization. We chose rather abstract categories so that we can build on them in the future. For the experiment we consider the following categories:

Table: The sign categories chosen for the experiment.

Hand movement | Body contact          | Hand location | Head
one handed    | no contact            | at waist      | mouth open
two handed    | head and right hand   | at chest      | mouth closed
symmetric     | head and left hand    | at head       | lips pressed together
non-symmetric | contact of hands      | above head    | lip pucker
              | contact of everything |               | eyes closing

For MC we have the 2D trajectories of the contours and centroids of the blobs:
1. The sum of the variances of the centroids determines the confidence factor for the one-handed vs. two-handed variant.
2. For symmetry we consider the sum of the absolute values of the correlation coefficients between the left- and right-hand trajectories; if the trajectories are correlated enough, the sign is symmetric (see the sketch after this list).
3. The confidence factors of the body-contact categories are extracted directly from the GMM models of the hand/head trackers.
4. A 5-bin histogram of the position of the hands relative to the head determines the sign-location category.
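A sketch of the symmetry cue from item 2, assuming the trajectories are (n_frames, 2) centroid arrays; the decision threshold is illustrative.

```python
import numpy as np

def symmetry_confidence(left_traj, right_traj):
    """Sum of absolute correlation coefficients between the left- and
    right-hand trajectories, computed per coordinate."""
    cx = np.corrcoef(left_traj[:, 0], right_traj[:, 0])[0, 1]
    cy = np.corrcoef(left_traj[:, 1], right_traj[:, 1])[0, 1]
    return abs(cx) + abs(cy)

# Illustrative decision: call the sign symmetric when the trajectories
# are correlated enough (threshold assumed, not from the paper).
def is_symmetric(left_traj, right_traj, threshold=1.5):
    return symmetry_confidence(left_traj, right_traj) > threshold
```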

The NMC categories are derived from the face measures. For each category we consider one simple Gaussian model defined over the relevant face measure only. These models are trained on frames extracted from manually labeled video files. The trained models then provide the likelihood of each category for every frame of the input video. We propose that the final NMC confidence factor of a category is the maximum of its likelihoods over all categorized frames (sketched below).
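A minimal sketch of one per-category Gaussian, trained on a placeholder set of labeled frame measures; per the slide, the sign-level confidence is the maximum per-frame likelihood.

```python
import numpy as np
from scipy.stats import norm

# One univariate Gaussian per NMC category, fitted to the relevant face
# measure on manually labeled frames (placeholder data here).
labeled_measures = np.random.rand(200)  # e.g. a mouth-openness measure
mu, sigma = labeled_measures.mean(), labeled_measures.std()

def category_confidence(measure_trajectory):
    """Confidence factor of one category over a whole sign: the maximum
    of the per-frame likelihoods under the category's Gaussian."""
    return norm.pdf(measure_trajectory, loc=mu, scale=sigma).max()
```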

Conclusion: We proposed a new framework for automatic tracking and categorization of video files capturing lexical signs performed by one signing human under predefined conditions. The proposed tracking method for MC achieves a 94.45% success rate against manually annotated video files. For NMC, tracking was successful in approximately 95% of the signs; the algorithm fails when the hands occlude significant parts of the face. In the experiment we consider only categories that can be extracted automatically from video frames. In conclusion, the proposed automatic categorization provides additional annotation of lexical signs and extends the potential for searching and translation in SL dictionaries.

Thank you!