A Two-stage Scheme for Dynamic Hand Gesture Recognition


James P. Mammen, Subhasis Chaudhuri and Tushar Agrawal
(james,sc,tush)@ee.iitb.ac.in
Department of Electrical Engg., Indian Institute of Technology, Bombay, India-400076

Abstract

In this paper a scheme is presented for recognizing hand gestures using the output of a hand tracker which tracks a rectangular window bounding the hand region. A hierarchical scheme for dynamic hand gesture recognition is proposed, based on a state representation of the dominant feature trajectories using a priori knowledge of the way in which each gesture is performed.

1. Introduction

Hand gesture recognition is essential in a host of applications such as haptic interfaces for large-screen multimedia and virtual reality environments [1], robot programming by demonstration, sign language recognition, human-computer interaction, telerobotic applications, etc. Previous attempts at recognizing similar gestures have used a variety of methods. Davis and Shah [2] used markers for tracking fingertips and used the fingertip trajectories for recognizing seven gestures. In [3], the hand's position in the image, velocity, values obtained by eigen analysis, etc. are used as features, and words from the ASL are recognized using an HMM-based scheme. In [4], the concept of motion energy is used to estimate the dominant motion of the hand, and the gestures are recognized by fitting finite state models of gestures. In [5], a gesture is classified as a sequence of postures using Principal Component Analysis and recognized using Finite State Machines. Multiple cameras are used in [6] to extract the 3D pose of the human body; instead of using all the parameters describing the pose as features, the trajectory obtained by a projection into a 2D eigenspace is used for gesture recognition. In [7], each feature trajectory is split into sub-trajectories, and recognition is achieved by maximizing the probability of it being a particular gesture in the eigenspaces of each of these sub-trajectories. An incremental recognition strategy that is an extension of the condensation algorithm is proposed in [8] to recognize gestures based on the 2D hand trajectory; gestures are modeled as velocity trajectories, and the condensation algorithm is used to incrementally match the gesture models to the input data.

A robust hand tracker proposed in [9] is used for tracking the hand in order to extract features which can be used for recognizing the gestures. At the first level of classification, the gesture to be recognized is assigned to one of five classes based on its dominant feature trajectories. At the next level, the gesture is recognized using a sequence of states obtained from the dominant feature trajectories. The representation of gestures as a sequence of states overcomes the problem of variation in the speed at which a gesture is performed, thus avoiding time warping of the data sequence.

2. Selection of Features

The proposed gesture recognition system is intended to be used as an interface for a telerobotic system. Dynamic manipulative hand gestures being most suited to such an application, we select the ten gestures listed in Table 1 to form our gesture vocabulary. The hand tracker mentioned in the previous section tracks the change in position and shape of a rectangular window bounding the palm region of the hand performing the gesture. Hence the coordinates of the centroid of the hand region should serve as good features.
The change in area of the hand region can capture the variation in the shape of the palm region; moreover, it is also indicative of the variation in hand position along the axis of the camera. Thus the feature vector for the n-th frame in the video sequence is selected to be f(n) = [X(n) Y(n) A(n)]^T. In order to have values which are meaningful from one video sequence to another, we scale the x and y coordinates of the centroid by the size of the hand region at the start position, and the area by the initial area, i.e. A(1) = 1. This provides invariance to magnification due to the distance from the camera at which the gesture is performed, and also with respect to changes in hand size from person to person. Note that the features are not made invariant to the starting position; this is not required in our scheme, as the representation of trajectories as a sequence of states does not depend on it. Thus we have

X(n) = m_10(n)/m_00(1),  Y(n) = m_01(n)/m_00(1),  A(n) = m_00(n)/m_00(1),   (1)

where m_pq(n) is the pq-th moment of the hand region obtained by tracking in the n-th frame.
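As a concrete illustration of Eq. (1), the feature extraction could be sketched as follows. This is a minimal Python sketch, assuming the tracker supplies one binary hand mask per frame; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def extract_features(masks):
    """Feature vectors f(n) = [X(n), Y(n), A(n)]^T in the spirit of Eq. (1).

    `masks` is a sequence of 2D binary arrays (hand region = 1). All three
    features are normalized by the zeroth moment (area) of the first frame,
    so that A(1) = 1, giving invariance to hand size and camera distance.
    """
    feats = []
    m00_first = None
    for mask in masks:
        rows, cols = np.nonzero(mask)
        m00 = float(rows.size)      # m00(n): area of the hand region
        m10 = float(rows.sum())     # m10(n): first moment along the image rows
        m01 = float(cols.sum())     # m01(n): first moment along the image columns
        if m00_first is None:
            m00_first = m00         # m00(1), the normalizing factor
        feats.append([m10 / m00_first, m01 / m00_first, m00 / m00_first])
    return np.asarray(feats)        # shape: (num_frames, 3)
```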

Figure 1: Smoothing of the feature trajectories for the Move Right gesture: (a) features as obtained from the tracker, and (b) feature trajectories after smoothing.
Figure 2: Smoothed feature trajectories of various gestures: (a) Move Up, (b) Move Down, (c) Go Away, (d) Come Closer.

The raw data obtained by tracking may be noisy, and hence we smooth the data using an averaging filter. Fig. 1(a) shows the raw features obtained from the tracker for the Move Right (RIGHT) gesture, clearly depicting their noisy nature. The result of smoothing with an averaging filter of length 7 is given in Fig. 1(b). The feature trajectories of each unknown gesture are smoothed in this manner before any further stage of processing in order to eliminate noise. Smoothed feature trajectories of gestures performed by a single user are shown in Fig. 2.

3. Initial Classification of Gestures based on the Dominant Features

We observe that although three features have been selected, the information conveyed by a gesture is not simultaneously captured by all of them. For example, in the case of the Move Right (RIGHT) gesture, the information is contained in the horizontal motion, which is captured by the Y(n) feature in our case; the other two features do not show meaningful variations. We call the features which capture the information conveyed by a gesture the dominant features of that gesture. By defining S_X, S_Y and S_A to be the sets of gestures whose dominant features are X(n), Y(n) and A(n), respectively, we obtain the Venn diagram representation of Fig. 3, which shows the relationship between the gestures and their dominant features. For example, the Clockwise (CW) and Counterclockwise (CCW) gestures, where both X(n) and Y(n) convey information, belong to both S_X and S_Y. The seven non-overlapping regions in Fig. 3 show that for three features we may have at most 7 classes, based on which of the features are dominant. Our set of gestures does not cover all seven classes: the subset (S_X ∩ S_A) ∪ (S_X ∩ S_Y ∩ S_A) is empty in our case. For the Move Left (LEFT), Move Right (RIGHT), Move Up (UP), Move Down (DOWN), Move Counterclockwise (CCW) and Move Clockwise (CW) gestures, either X(n) or Y(n) or both are the dominant features. In the Push (PUSH) and Pull (PULL) gestures, A(n) is the dominant feature, as the change in the area of the hand conveys the information. The remaining two gestures, viz. Go Away (AWAY) and Come Closer (CLOSER), show very little motion and small variation in area. Hence at the first level both are included in the same class, where the range of variation in X(n) and Y(n) is small and the range of variation of A(n) is comparable to that of X(n) and/or Y(n). Thus in the first stage of classification we have five classes, each containing two gestures, as shown in Fig. 4. As the dominant feature or features can be determined based on the range of variation of each feature trajectory, we obtain the decision-tree based scheme shown in Fig. 5 for the first level of classification. In terms of the features this assumes the following form. First we smooth each feature trajectory as mentioned earlier in order to suppress tracking noise. Thereafter we obtain the range of variation of feature i as ∆_i = max(i(n)) − min(i(n)), where i = X, Y, A'. Here A'(n) = C·A(n), where C is a factor used to make the range of change in area comparable to that of the change in position.
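The smoothing and range-of-variation steps might look as follows in Python. The filter length of 7 follows the text; the scale factor C is left as a caller-supplied parameter.

```python
import numpy as np

def smooth(trajectory, length=7):
    """Smooth a 1D feature trajectory with a moving-average filter."""
    kernel = np.ones(length) / length
    return np.convolve(trajectory, kernel, mode="same")

def ranges_of_variation(feats, C):
    """Compute Delta_i = max(i(n)) - min(i(n)) for i = X, Y, A'.

    `feats` is the (num_frames, 3) array from extract_features;
    A'(n) = C * A(n) scales the area feature so that its range is
    comparable to that of the positional features.
    """
    X, Y = smooth(feats[:, 0]), smooth(feats[:, 1])
    Ap = C * smooth(feats[:, 2])
    return {name: t.max() - t.min() for name, t in
            [("X", X), ("Y", Y), ("A'", Ap)]}
```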
In this study we select a fixed empirical value of C. We then assign the gesture to be recognized to one of the five classes in the first stage as follows.

We define f̂ = arg max_i ∆_i, and the classification proceeds as follows:

- Class I: if (f̂ = X) & (∆X ≥ θ1) & (∆Y < ∆X/2)
- Class II: if (f̂ = Y) & (∆Y ≥ θ1) & (∆X < ∆Y/2)
- Class III: if ((f̂ = X) & (∆X > θ2) & (∆Y > ∆X/2)) OR ((f̂ = Y) & (∆Y > θ2) & (∆X > ∆Y/2))
- Class IV: if (f̂ = A') & (max(A'(n))/min(A'(n)) > θ3)
- Class V: if none of the above

where & is the logical AND operator and θ1, θ2 and θ3 are appropriate thresholds for determining whether the motion is significant. A threshold of θ1 means that we consider a motion greater than θ1 times the initial hand width to be large motion. The threshold θ2 for Class III is smaller than θ1 because, when there is significant motion in both the horizontal and vertical directions, the range of variation in each direction is naturally reduced compared to the case in which there is large motion in one direction alone.

Figure 3: Gesture sets based on dominant features (Venn diagram of the sets S_X, S_Y and S_A over the ten gestures).
Figure 4: Initial classification of the gesture set based on dominant features. Class I: UP, DOWN; Class II: LEFT, RIGHT; Class III: CCW, CW; Class IV: PUSH, PULL; Class V: CLOSER, AWAY.
Figure 5: Decision tree for the first stage of classification ("Is variation in X large and variation in Y small?" → Class I; "Is variation in Y large and variation in X small?" → Class II; "Are variations in X and Y both large?" → Class III; "Is A' the feature with maximum variation and is the ratio of max to min large?" → Class IV; otherwise Class V).
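For concreteness, the first-stage decision rules can be sketched in Python as follows; the threshold values shown are illustrative placeholders rather than the paper's tuned constants.

```python
def classify_stage1(dX, dY, dAp, Ap, theta1=1.0, theta2=0.5, theta3=1.5):
    """First-stage classification from the ranges of variation.

    dX, dY, dAp are the Delta values; Ap is the scaled area trajectory A'(n).
    All thresholds here are placeholders for the paper's empirical values.
    """
    deltas = {"X": dX, "Y": dY, "A'": dAp}
    f_hat = max(deltas, key=deltas.get)      # feature with maximum variation
    if f_hat == "X" and dX >= theta1 and dY < dX / 2:
        return "I"                           # UP / DOWN
    if f_hat == "Y" and dY >= theta1 and dX < dY / 2:
        return "II"                          # LEFT / RIGHT
    if (f_hat == "X" and dX > theta2 and dY > dX / 2) or \
       (f_hat == "Y" and dY > theta2 and dX > dY / 2):
        return "III"                         # CCW / CW
    if f_hat == "A'" and Ap.max() / Ap.min() > theta3:
        return "IV"                          # PUSH / PULL
    return "V"                               # CLOSER / AWAY
```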

4. Trajectory Based Classification

We need to analyse the dominant feature trajectories within each class in order to recognize the gestures. We observe that each gesture consists of portions of motion in which the feature dynamics are similar and distinct. As the information characterizing the gesture is contained in the sequence of these portions of similar dynamics, we propose a definition of a gesture which incorporates this information. We define a gesture to be a sequence of states {s1, ..., sn}, where the states correspond to portions of the dominant feature trajectory having similar dynamics. For each class, a sequence of states can be defined such that it characterizes and distinguishes between the gestures in that class. The resulting sequence of states obtained from the trajectory can be used for gesture recognition as explained below.

Class I: This class consists of the UP and DOWN gestures. Due to the presence of only one dominant feature, viz. X(n), the states are scalars in this case. For the UP gesture, initially the hand shows a small downward movement followed by a long upward motion, as shown in Fig. 6. Hence ideally X(n) would initially increase from the first time index (frame) up to some index k and then decrease all the way until the end of the sequence, say m, so the trajectory should be split into two states representing this. We select the states to be the change in the feature over portions of uniform dynamics, i.e. portions which consist of motion in the same direction. Thus, ideally this gesture would result in the states s1 = X(k) − X(1) and s2 = X(m) − X(k). The frame k corresponds to a maximum where the direction of motion, and hence the sign of the derivative of X(n), changes. In order to determine whether it is the UP gesture, we use its most dominant characteristic, i.e. the upward motion, resulting in the recognition criterion Σ si < 0. The DOWN gesture is similar except that the direction is reversed. Thus we may summarize the criterion for classification as follows: If Σ si < 0 : UP; Else : DOWN.

Figure 6: Obtaining states from the dominant trajectory X(n), which is split at the extremum k between frames 1 and m.

Class II: The gestures in this class are LEFT and RIGHT. This class and its gestures are analogous to those of Class I; the only difference is that Y(n), rather than X(n), is the dominant feature. As a result the same strategy as in Class I can be used: If Σ si < 0 : RIGHT; Else : LEFT.

Class III: The gestures in this class are CCW and CW. As this class has two dominant features, X(n) and Y(n), the states are constructed using both of them. The method of selecting the states is depicted in Fig. 7(a): both trajectories are split individually at points of maxima or minima, and the two are then merged to obtain states for the combined 2D trajectory. Since our purpose is to differentiate between the directions of rotation of the hand, we select the directions of change of X(n) and Y(n) as the elements of the 2D state vectors, with an increase denoted by +1 and a decrease by −1. Thus, for the trajectories shown in Fig. 7, the states would be s1 = [1 −1]^T, s2 = [−1 −1]^T, s3 = [−1 1]^T and s4 = [1 1]^T. In the case of the CCW and CW gestures, ideally we would expect a sequence of four states due to the hand moving in a circle and returning to the same position. However, the hand may rotate a bit more or less, so the number of states Ns is not exactly four in some cases. Hence, for recognizing the gesture, we use a scheme based on determining the direction of rotation. We define a mapping f : S → D from the 2D states to directions, as shown in Fig. 7(b): f([1 1]^T) = 0, f([1 −1]^T) = 1, f([−1 −1]^T) = 2, f([−1 1]^T) = 3. For a clockwise rotation of the hand, the resulting sequence f(s1), f(s2), ..., f(sn) would be a subsequence of the periodic sequence 0, 1, 2, 3, 0, 1, 2, 3, ... The direction of rotation is then given by the sign of the sum of the cyclic differences σ_N = Σ_{i=2..n} f(si) Ψ f(si−1), where Ψ denotes a K-cyclic difference mapping defined as follows: for all a, b ∈ {0, ..., K−1},

a Ψ b = a − b, if −K/2 < a − b < K/2;  a − b − K, if a − b ≥ K/2;  a − b + K, if a − b ≤ −K/2.

In our case K = 4. Thus we recognize the gesture as follows: If σ_N > 0 : CW; Else : CCW.

Figure 7: Recognition of gestures in Class III. (a) Merging for 2D states; (b) mapping from states to directions.

Class IV: This class has two gestures, PUSH and PULL, which have only one dominant feature, A'(n). Both would ideally result in a single state in which the area either increases or decreases monotonically. Thus, similar to Class I, we define the state to be s1 = A'(m) − A'(1). Hence, to recognize the gestures, we check whether the number of states Ns is one and differentiate between them based on the direction of change, i.e. increase or decrease of area. The resulting scheme is: If Ns = 1 & (if s1 > 0 : PUSH; Else : PULL).
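The state extraction and the per-class rules for Classes I-IV can be sketched as follows. The splitting of a trajectory at its extrema is simplified here, and the direction-code assignment for Class III is one self-consistent choice of the mapping in Fig. 7(b), not necessarily the paper's.

```python
import numpy as np

def states_1d(traj):
    """Split a 1D trajectory at its extrema (sign changes of the derivative)
    and return one state per monotone segment: the net change over the segment."""
    sign = np.sign(np.diff(traj))
    states, start = [], 0
    for i in range(1, len(sign)):
        if sign[i] != 0 and sign[i - 1] != 0 and sign[i] != sign[i - 1]:
            states.append(traj[i] - traj[start])   # segment ends at sample i
            start = i
    states.append(traj[-1] - traj[start])          # final segment
    return states

def recognize_class_I(X):        # UP / DOWN from the trajectory X(n)
    return "UP" if sum(states_1d(X)) < 0 else "DOWN"

def recognize_class_II(Y):       # RIGHT / LEFT from the trajectory Y(n)
    return "RIGHT" if sum(states_1d(Y)) < 0 else "LEFT"

def recognize_class_III(X, Y, K=4):
    """CW / CCW from the sign of the sum of K-cyclic differences of the
    direction codes of successive 2D states (assumed code assignment)."""
    code = {(1, 1): 0, (1, -1): 1, (-1, -1): 2, (-1, 1): 3}
    sx, sy = states_1d(X), states_1d(Y)
    n = min(len(sx), len(sy))    # crude pairing; the paper merges at extrema
    dirs = [code[(int(np.sign(sx[i])) or 1, int(np.sign(sy[i])) or 1)]
            for i in range(n)]
    def cyc(a, b):               # K-cyclic difference, mapped into (-K/2, K/2]
        d = (a - b) % K
        return d - K if d > K / 2 else d
    sigma = sum(cyc(dirs[i], dirs[i - 1]) for i in range(1, len(dirs)))
    return "CW" if sigma > 0 else "CCW"

def recognize_class_IV(Ap):      # PUSH / PULL from the scaled area A'(n)
    s = states_1d(Ap)
    if len(s) == 1:
        return "PUSH" if s[0] > 0 else "PULL"
    return "UNRECOGNIZED"
```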

Class V: Both gestures of this class, AWAY and CLOSER, are similar in the sense that both have A'(n) as a dominant feature showing similar variation; hence this feature cannot be used to discriminate between them. However, X(n) shows very different characteristics for the two. For the CLOSER gesture, since there is no horizontal motion of the hand, X(n) remains almost constant, whereas for the AWAY gesture there is a distinct horizontal movement, first towards the left and then towards the right. Y(n) shows similar behavior for both gestures. Hence we use only X(n) to form the states for the AWAY gesture, using the same method as used for Classes I and II, which should ideally result in two states. The recognition criteria are: If ∆X < ∆Y/2 : CLOSER; Else if Ns = 2 & s2 > 0 : AWAY. A gesture which does not fall into any of these categories is left unrecognized.
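A corresponding sketch of the Class V rule, reusing states_1d from the previous sketch (the ∆X-versus-∆Y comparison follows the criteria above):

```python
def recognize_class_V(X, dX, dY):
    """CLOSER / AWAY. CLOSER shows almost no variation in X(n); AWAY ideally
    yields two states in X(n): a move to one side followed by a move back."""
    if dX < dY / 2:                  # X(n) nearly constant
        return "CLOSER"
    s = states_1d(X)
    if len(s) == 2 and s[1] > 0:     # leftward then rightward motion
        return "AWAY"
    return "UNRECOGNIZED"
```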
5. Experimental Results

The hierarchical gesture recognition scheme based on dominant features was tested on a data set of gestures performed by several users. The gesture sequences are of varying lengths, depending on the time taken to perform each gesture, and were captured in a natural office environment with a cluttered background. Almost all of the gestures were correctly recognized. There were two false recognitions, due to the manner in which the gestures were performed: significant motion in both the x and y directions led to the false recognitions. There were 6 unrecognized gestures, and it was found that this was due to false skin color detection resulting in faulty tracking; it has nothing to do with the accuracy of the proposed recognition scheme. Table 1 summarizes the results.

Table 1: Summary of experimental results (per-gesture counts of instances, true recognitions, false recognitions and unrecognized gestures for LEFT, RIGHT, UP, DOWN, CCW, CW, PUSH, PULL, AWAY and CLOSER).

6. Conclusions

In this paper, a technique for gesture recognition using the data available from a previously developed hand tracker has been proposed. We use a hierarchical scheme: in the first stage we classify based on the dominant features, and in the second stage we recognize the gesture based on states describing the dominant feature trajectories. Owing to the state-based approach, the recognition technique is unaffected by the rate at which gestures are performed and by the initial position of the hand in the image. The representation of gestures as a sequence of states, obtained by splitting the dominant feature trajectories into perceptually important segments, results in a fast and simple recognition scheme.

References

[1] V. Pavlovic, R. Sharma and T. S. Huang, "Visual interpretation of hand gestures for human-computer interaction: A review," IEEE Trans. on PAMI, vol. 19, no. 7, pp. 677-695, 1997.
[2] J. Davis and M. Shah, "Visual gesture recognition," IEE Proc. - Vision, Image and Signal Processing, vol. 141, no. 2, April 1994.
[3] T. Starner, J. Weaver and A. Pentland, "Real-time American Sign Language recognition using desk and wearable computer based video," IEEE Trans. on PAMI, vol. 20, no. 12, pp. 1371-1375, Dec. 1998.
[4] M. Yeasin and S. Chaudhuri, "Visual understanding of dynamic hand gestures," Pattern Recognition, vol. 33, pp. 1805-1817, 2000.
[5] J. Martin and J. L. Crowley, "An appearance-based approach to gesture recognition," in Proc. of ICIAP, Sep. 1997.
[6] H. Ohno and M. Yamamoto, "Gesture recognition using character recognition techniques on a 2D eigenspace," in Proc. of IEEE ICCV, Sep. 1999.
[7] D. Hall, J. Martin and J. L. Crowley, "Statistical recognition of parameter trajectories for hand gestures and face expressions," in Proc. of ECCV, 1998.
[8] M. J. Black and A. D. Jepson, "Recognizing temporal trajectories using the condensation algorithm," in Proc. of IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998.
[9] J. P. Mammen, S. Chaudhuri and T. Agrawal, "Simultaneous tracking of both hands by estimation of erroneous observations," in Proc. of BMVC, Manchester, Sep. 2001.