Hand Tracking and Gesture Recognition for Human-Computer Interaction


Electronic Letters on Computer Vision and Image Analysis 5(3):96-104, 2005

Cristina Manresa, Javier Varona, Ramon Mas and Francisco J. Perales
Unidad de Gráficos y Visión por Computador, Departamento de Matemáticas e Informática, Universitat de les Illes Balears, Edificio Anselm Turmeda, Crta. Valldemossa km 7.5, 07122 Palma de Mallorca - España

Received 4 February 2005; accepted 18 May 2005

Abstract

The proposed work is part of a project that aims at the control of a videogame based on hand gesture recognition. This goal implies the restriction of real-time response and the use of unconstrained environments. In this paper we present a new algorithm to track and recognise hand gestures for interacting with a videogame. This algorithm is based on three main steps: hand segmentation, hand tracking and gesture recognition from hand features. For the hand segmentation step we use the colour cue due to the characteristic colour values of human skin, its invariant properties and its computational simplicity. To prevent errors from hand segmentation we add hand tracking as a second step. Tracking is performed assuming a constant velocity model and using a pixel labeling approach. From the tracking process we extract several hand features that are fed into a finite state classifier which identifies the hand configuration. The hand can be classified into one of four gesture classes or one of four different movement directions. Finally, the system's performance is evaluated by showing the usability of the algorithm in a videogame environment.

Key Words: Hand Tracking, Gesture Recognition, Human-Computer Interaction, Perceptual User Interfaces.

1 Introduction

Nowadays, the majority of human-computer interaction (HCI) is based on mechanical devices such as keyboards, mice, joysticks or gamepads. In recent years there has been a growing interest in methods based on computational vision due to their ability to recognise human gestures in a natural way [1].
These methods use the images acquired from a camera or from a stereo pair of cameras as input. The main goal of these algorithms is to measure the hand configuration at each time instant. To facilitate this process, many gesture recognition applications resort to the use of uniquely coloured gloves or markers on hands or fingers [2]. In addition, using a controlled background makes it possible to locate the hand efficiently, even in real-time [3]. These two conditions impose restrictions on the use and on the interface setup. We have specifically avoided solutions that require coloured gloves or markers and a controlled background because of the initial requirements of our application: it must work for different people, without any accessory on them, and also with unpredictable backgrounds.

Correspondence to: <cristina.manresa@uib.es>
Recommended for acceptance by <Perales F., Draper B.>
ELCVIA ISSN: 1577-5097
Published by Computer Vision Center / Universitat Autònoma de Barcelona, Barcelona, Spain

Our application uses images from a low-cost web camera placed in front of the work area, where the recognised gestures act as the input for a 3D computer videogame. The players, rather than pressing buttons, must use different hand gestures that our application should recognise. This increases the complexity, since the response time must be very fast: users should not perceive a significant delay between the instant they perform a gesture or motion and the instant the computer responds. Therefore, the algorithm must provide real-time performance on a conventional processor. Most of the known hand tracking and recognition algorithms do not meet this requirement and are inappropriate for visual interfaces. For instance, particle filtering-based algorithms can maintain multiple hypotheses at the same time in order to robustly track the hands, but they have high computational demands [4]. Recently, several contributions for reducing the complexity of particle filters have been presented, for example, using a deterministic process to help the random search [5]. Also, in [6] we can see a multi-scale colour feature for representing hand shape, and particle filtering that combines shape and colour cues in a hierarchical model. The system has been fully tested and seems robust and stable; to our knowledge it runs at about 10 frames/second and does not consider several hand states. However, these algorithms only work in real-time for a reduced-size hand, and in our application the hand fills most of the image. In [7], shape reconstruction is quite precise, a high-DOF model is considered, and infrared orthogonal cameras are used to avoid self-occlusions. The authors propose to apply this technique using a colour skin segmentation algorithm.
In this paper we propose a real-time non-invasive hand tracking and gesture recognition system. In the next sections we explain our method, which is divided into three main steps. The first step is hand segmentation: the image region that contains the hand has to be located. In this process the use of the shape cue is possible, but hand shapes vary greatly during natural hand motion [8]. Therefore, we choose skin colour as the hand feature: it is a distinctive cue of hands and it is invariant to scale and rotation. The next step is to track the position and orientation of the hand, to prevent errors in the segmentation phase. We use a pixel-based tracking for the temporal update of the hand state. In the last step we use the estimated hand state to extract several hand features that define a deterministic process of gesture recognition. Finally, we present the system's performance evaluation results, which show that our method works well in unconstrained environments and for several users.

2 Hand Segmentation Criteria

The hand must be located in the image and segmented from the background before recognition. Colour is the selected cue because of its computational simplicity, its invariant properties regarding the hand shape configurations, and the characteristic values of human skin colour. Also, the assumption that colour can be used as a cue to detect faces and hands has been proved useful in several publications [9,10]. For our application, the hand segmentation is carried out using a low computational cost method that performs well in real time. The method is based on a probabilistic model of the skin-colour pixel distribution, so it is necessary to model the skin colour of the user's hand. The user places part of his hand in a learning square, as shown in Fig. 1. The pixels restricted to this area are used for model learning. Next, the selected pixels are transformed from the RGB space to the HSL space, and the chroma information is taken: hue and saturation.

Figure 1: Application interface and skin-colour learning square.

We have encountered two problems in this step that are solved in a pre-processing phase. The first is that human skin hue values are very near to red, that is, their value is very close to 2π radians, so it is difficult to learn the distribution: the angular nature of hue can produce samples on both limits. To solve this, the hue values are rotated by π radians. The second problem in using the HSL space appears when the saturation values are close to 0, because then the hue is unstable and can cause false detections. This can be avoided by discarding saturation values near 0. Once the pre-processing phase has finished, the hue, H, and saturation, S, values for each selected pixel are used to infer the model, that is, x = (x_1, ..., x_n), where n is the number of samples and a sample is x_i = (H_i, S_i). A Gaussian model is chosen to represent the skin-colour probability density function. The values for the parameters of the Gaussian model (mean, x̄, and covariance matrix, Σ) are computed from the sample set using standard maximum likelihood methods [11]. Once they are found, the probability that a new pixel, x = (H, S), is skin can be calculated as

P(x is skin) = (1 / (2π |Σ|^(1/2))) exp(−(1/2) (x − x̄)^T Σ^{−1} (x − x̄)).   (1)

Finally, we obtain the blob representation of the hand by applying a connected components algorithm to the probability image, which groups pixels into the same blob. The system is robust to background changes and low light conditions. If the system gets lost, it can be initialised again by going back to the Start hand state. Fig. 2 shows the blob contours found by the algorithm for the different environment conditions where the system has been tested.

Figure 2: Hand contours for different backgrounds (1st row) and different light conditions (2nd row).

3 Tracking Procedure

USB cameras are known for the low quality images they produce. This fact can cause errors in the hand segmentation process. In order to make the application robust to these segmentation errors, we add a tracking algorithm. This algorithm tries to maintain and propagate the hand state over time.
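The segmentation pipeline of Section 2 — rotate the hue by π, discard near-zero saturations, fit the Gaussian of Eq. (1) by maximum likelihood, and evaluate it per pixel — can be sketched as follows. This is an illustrative Python sketch, not the authors' Visual C++/OpenCV implementation, and the 0.1 saturation cut-off is an assumed value:

```python
import math

def preprocess(h, s, s_min=0.1):
    """Rotate hue by pi (red wrap-around fix); drop unstable low-saturation pixels."""
    if s < s_min:
        return None
    return ((h + math.pi) % (2 * math.pi), s)

def fit_skin_model(samples):
    """Maximum-likelihood mean and 2x2 covariance of pre-processed (H, S) samples."""
    pts = [p for p in (preprocess(h, s) for h, s in samples) if p is not None]
    n = len(pts)
    mh = sum(h for h, _ in pts) / n
    ms = sum(s for _, s in pts) / n
    chh = sum((h - mh) ** 2 for h, _ in pts) / n
    css = sum((s - ms) ** 2 for _, s in pts) / n
    chs = sum((h - mh) * (s - ms) for h, s in pts) / n
    return (mh, ms), ((chh, chs), (chs, css))

def skin_probability(pixel, mean, cov):
    """Eq. (1): Gaussian density of an (H, S) pixel under the learned model."""
    p = preprocess(*pixel)
    if p is None:
        return 0.0
    (mh, ms), ((a, b), (_, d)) = mean, cov
    det = a * d - b * b
    dh, ds = p[0] - mh, p[1] - ms
    # (x - mean)^T Sigma^{-1} (x - mean) expanded for a 2x2 covariance
    maha = (d * dh * dh - 2 * b * dh * ds + a * ds * ds) / det
    return math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det))
```

Thresholding the resulting probability image and running a connected components pass then yields the hand blobs used in the tracking step.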

We represent the hand state at time t, s(t), by means of a vector, s(t) = (p(t), w(t), α(t)), where p = (p_x, p_y) is the hand position in the 2D image, the hand size is represented by w = (w, h), with w the hand width and h the hand height in pixels, and, finally, α is the hand's angle in the 2D image plane. First, from the hand state at time t we build a hypothesis of the hand state, h = (p(t+1), w(t), α(t)), for time t+1, by applying a simple second-order autoregressive process to the position component:

p(t+1) − p(t) = p(t) − p(t−1).   (2)

Equation (2) expresses a dynamical model of constant velocity. Next, if we assume that at time t, M blobs have been detected, B = {b_1, ..., b_j, ..., b_M}, where each blob b_j corresponds to a set of connected skin-colour pixels, the tracking process has to set the relation between the hand hypothesis, h, and the observations, b_j, over time. In order to cope with this problem, we define an approximation to the distance from an image pixel, x = (x, y), to the hypothesis h. First, we normalize the image pixel coordinates:

n = R^T (x − p(t+1)),   (3)

where R is a standard 2D rotation matrix about the origin by the angle α, and n = (n_x, n_y) are the normalized pixel coordinates. Then, we can find the crossing point, c = (c_x, c_y), between the hand hypothesis ellipse and the normalized image pixel as follows:

c_x = w cos ϑ,  c_y = h sin ϑ,   (4)

where ϑ is the angle between the normalized image pixel and the hand hypothesis. Finally, the distance from an image pixel to the hand hypothesis is

d(x, h) = ||n|| − ||c||.   (5)

This distance can be seen as an approximation of the distance from a point in 2D space to a normalized ellipse (normalized meaning centred at the origin and not rotated). From the distance definition of (5) it turns out that its value is equal to or less than 0 if x is inside the hypothesis h, and greater than 0 if it is outside. Therefore, considering the hand hypothesis h and a point x belonging to a blob b, if the distance is equal to or less than 0, we conclude that the blob b supports the existence of the hypothesis h, and it is selected to represent the new hand state. This tracking process can also detect the presence or absence of the hand in the image [12].
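Equations (2)-(5) and the blob-support test can be sketched as follows (illustrative Python; w and h are taken as the ellipse axes of the hypothesis, as in Eq. (4)):

```python
import math

def predict_position(p_t, p_prev):
    """Eq. (2): constant-velocity prediction p(t+1) = 2 p(t) - p(t-1)."""
    return (2 * p_t[0] - p_prev[0], 2 * p_t[1] - p_prev[1])

def ellipse_distance(pixel, p_next, w, h, alpha):
    """Eqs. (3)-(5): signed distance from a pixel to the hypothesis ellipse.
    <= 0 when the pixel lies inside the (normalized) ellipse."""
    # Eq. (3): rotate the pixel into the ellipse-aligned frame
    dx, dy = pixel[0] - p_next[0], pixel[1] - p_next[1]
    nx = math.cos(alpha) * dx + math.sin(alpha) * dy
    ny = -math.sin(alpha) * dx + math.cos(alpha) * dy
    # Eq. (4): crossing point at the angle of the normalized pixel
    theta = math.atan2(ny, nx)
    cx, cy = w * math.cos(theta), h * math.sin(theta)
    # Eq. (5): difference of distances to the origin
    return math.hypot(nx, ny) - math.hypot(cx, cy)

def blob_supports(blob, p_next, w, h, alpha):
    """A blob supports the hypothesis if any of its pixels falls inside it."""
    return any(ellipse_distance(px, p_next, w, h, alpha) <= 0 for px in blob)
```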

Figure 3: Gesture alphabet and valid gesture transitions.

4 Gesture Recognition

Our gesture alphabet consists of four hand gestures and four hand directions, in order to fulfil the application's requirements. The hand gestures correspond to a fully opened hand (with separated fingers), an opened hand with the fingers together, a fist, and a last gesture that appears when the hand is not visible, partially or completely, in the camera's field of view. These gestures are named Start, Move, Stop and No-Hand, respectively. Also, while in the Move gesture, the user can carry out Left, Right, Front and Back movements. For the Left and Right movements, the user rotates his wrist to the left or to the right. For the Front and Back movements, the hand gets closer to or further from the camera. Finally, the valid hand gesture transitions that the user can carry out are defined in Fig. 3.

The process of gesture recognition starts when the user's hand is placed in the camera's field of view in the Start gesture, that is, fully opened with separated fingers. In order to avoid unintended fast hand gesture changes, every change must persist for 5 frames; otherwise the recognised gesture does not change from the previous one. To achieve this gesture recognition, we use the hand state estimated in the tracking process, that is, s = (p, w, α). This state can be viewed as an ellipse approximation of the hand, where p = (p_x, p_y) is the ellipse centre and w = (w, h) is the size of the ellipse in pixels. To facilitate the process, we define the major axis length as M and the minor axis length as m. In addition, we compute the hand's blob contour and its corresponding convex hull using standard computer vision techniques. From the hand's contour and the hand's convex hull we can calculate the sequence of contour points between two consecutive convex hull vertices. This sequence forms the so-called convexity defect (i.e., a finger concavity), and it is possible to compute the depth, d_i, of the i-th convexity defect.
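The depth d_i of a convexity defect is the distance from the defect's farthest contour point to the hull segment joining its start and end points (u_i and v_i in Fig. 4). A minimal sketch, assuming the contour points of one defect are already grouped (illustrative Python, with hypothetical helper names):

```python
import math

def point_segment_distance(p, a, b):
    """Perpendicular distance from point p to the line through hull vertices a-b."""
    ax, ay = a; bx, by = b; px, py = p
    # twice the triangle area divided by the base length
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.hypot(bx - ax, by - ay)

def defect_depth(contour_points, hull_start, hull_end):
    """Depth d_i of one convexity defect: farthest contour point between two
    consecutive hull vertices, measured against the hull segment."""
    return max(point_segment_distance(p, hull_start, hull_end) for p in contour_points)
```

The global hand feature is then simply the mean of these depths over all defects of the contour.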

Figure 4: Extracted features for the hand gesture recognition. In the right image, u_i and v_i indicate the start and end points of the i-th convexity defect, and the depth, d_i, is the distance from the farthermost point of the convexity defect to the convex hull segment.

From these depths it is possible to compute the depth average, d̄, as a global hand feature, see (6), where n is the total number of convexity defects in the hand's contour (see Fig. 4):

d̄ = (1/n) Σ_{i=0..n} d_i.   (6)

The first step of the gesture recognition process is to model the Start gesture. The average of the depths of the convexity defects of an opened hand with separated fingers is larger than that of an opened hand with the fingers together or of a fist. This feature is used for differentiating the following hand gesture transitions: from Stop to Start, from Start to Move, and from No-Hand to Start. However, first it is necessary to compute the Start gesture feature, T_start. Once the user is correctly placed in the camera's field of view with the hand widely opened, the skin-colour learning process is initiated. The system also computes the Start gesture feature over the first n frames:

T_start = (1/2) (1/n) Σ_{t=0..n} d̄(t).   (7)

Once the Start gesture is identified, the most probable valid gesture change is to the Move gesture. Therefore, if the current depth average is less than T_start, the system switches to the Move hand gesture. If the current hand gesture is Move, the hand directions are enabled: Front, Back, Left and Right. If the user does not want to move in any direction, he should keep his hand in the Move state. The first time that the Move gesture appears, the system computes the Move gesture feature, T_move, that is, an average of the approximated area of the hand over n consecutive frames:

T_move = (1/n) Σ_{t=0..n} M(t) m(t).   (8)

In order to recognise the Left and Right directions, the calculated angle of the fitted ellipse is used. To prevent undesired jitter effects in orientation, we introduce a predefined constant T_jitter. Then, if the angle of the ellipse that circumscribes the hand, α, satisfies α > T_jitter, the Left orientation is set; if it satisfies α < −T_jitter, the Right orientation is set. In order to control the Front and Back orientations and to return to the Move gesture, the hand must not be rotated, and the Move gesture feature is used to differentiate these movements. If M·m > T_move·C_front, the hand orientation is Front; the Back orientation is set if M·m < T_move·C_back. The Stop gesture is recognised using the ellipse's axes: when the hand is a fist, the fitted ellipse is almost a circle, and m and M are practically the same, that is, M − m < C_stop. C_stop, C_front and C_back are predefined constants established during the algorithm's performance evaluation. Finally, the No-Hand state appears when the system does not detect the hand, when the size of the detected hand is not large enough, or when the hand is at the limits of the camera's field of view. The next possible hand state is the Start gesture, and it is detected using the transition procedure from Stop to Start explained earlier.

Some examples of gesture transitions and the recognised gesture results can be seen in Fig. 5. These examples were chosen to show the algorithm's robustness for different lighting conditions, hand configurations and users. We have found that a correct learning of the skin colour is very important; otherwise, problems with the detection and the gesture recognition can be encountered. One of the main problems when using the application is hand control: keeping the hand in the camera's field of view without touching the limits of the capture area. This problem has been shown to disappear with user training.

Figure 5: Gesture recognition examples for different lighting conditions, users and hand configurations.
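The decision rules of this section reduce to a few threshold comparisons on the ellipse features. The sketch below is illustrative Python; the constant values, and the multiplicative reading of C_front and C_back, are assumptions rather than the paper's calibrated constants:

```python
def classify(M, m, alpha, d_avg, t_start, t_move,
             c_stop=10.0, c_front=1.3, c_back=0.7, t_jitter=0.3):
    """Map ellipse features (axes M >= m, angle alpha, mean defect depth d_avg)
    to one gesture/direction label. All threshold values are assumptions."""
    if M - m < c_stop:            # fist: fitted ellipse close to a circle
        return "Stop"
    if d_avg >= t_start:          # deep finger concavities: fully opened hand
        return "Start"
    if alpha > t_jitter:          # wrist rotated beyond the jitter band
        return "Left"
    if alpha < -t_jitter:
        return "Right"
    area = M * m                  # approximated hand area, as in Eq. (8)
    if area > t_move * c_front:   # hand closer to the camera
        return "Front"
    if area < t_move * c_back:    # hand further from the camera
        return "Back"
    return "Move"
```

In the real system each returned label would additionally be held for 5 frames and checked against the valid transitions of Fig. 3 before being committed.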

5 System's Performance Evaluation

In this section we show the accuracy of our hand tracking and gesture recognition algorithm. The application has been implemented in Visual C++ using the OpenCV libraries [13] and has been tested on a Pentium IV running at 1.8 GHz. The images have been captured using a Logitech Messenger WebCam with a USB connection. The camera provides 320x240 images at a capture and processing rate of 30 frames per second.

Figure 6: System's performance evaluation results. For each gesture (S: Start, M: Move, L: Left, R: Right, F: Front, B: Back, P: Stop, N: No-Hand), paired bars show the number of correctly recognised gestures against the total number of tests.

For the performance evaluation of the hand tracking and gesture recognition, the system has been tested on a set of 40 users. Each user performed a predefined set of 40 gestures, so we have 1600 gestures with which to evaluate the application. It is natural to measure the system's accuracy over the user movements required to manage the videogame; this sequence included all the application's possible states and transitions. Figure 6 shows the performance evaluation results. These results are represented as a graph with the application states, such as Start or Move, as columns and the number of occurrences of each gesture as rows. The columns are paired for each gesture: the first column is the number of tests in which the gesture was correctly identified; the second column is the total number of times the gesture was carried out. As can be seen in Fig. 6, the hand gesture recognition works well in 98% of the cases.

6 Conclusions

In this paper we have presented a real-time algorithm to track and recognise hand gestures for human-computer interaction within the context of videogames.
We have proposed an algorithm based on skin-colour hand segmentation and tracking, with gesture recognition from extracted hand morphological features. The system's performance evaluation results have shown that users can substitute traditional interaction metaphors with this low-cost interface. The experiments have confirmed that continuous training of the users results in higher skill and, thus, better performance. The system has also been tested in an indoor laboratory with changing backgrounds and low light conditions. In these cases the system runs well, with the logical exception of skin-like backgrounds or several hands intersecting at the same place and time. The system should be improved to discard misclassifications due to the segmentation procedure; in such cases, however, the user can restart the system simply by going to the Start hand state.

Acknowledgements

The projects TIC2003-0931 and TIC2002-10743-E of the MCYT Spanish Government and the European project HUMODAN 2001-32202 from the EU V Program-IST have subsidized this work. J. Varona acknowledges the support of a Ramon y Cajal fellowship from the Spanish MEC.

References

[1] V.I. Pavlovic, R. Sharma, T.S. Huang, "Visual interpretation of hand gestures for human-computer interaction: a review", IEEE Pattern Analysis and Machine Intelligence, 19(7): 677-695, 1997.

[2] R. Bowden, D. Windridge, T. Kadir, A. Zisserman, M. Brady, "A Linguistic Feature Vector for the Visual Interpretation of Sign Language", in Tomas Pajdla, Jiri Matas (Eds.), Proc. European Conference on Computer Vision, ECCV04, v. 1: 391-401, LNCS 3022, Springer-Verlag, 2004.

[3] J. Segen, S. Kumar, "Shadow gestures: 3D hand pose estimation using a single camera", Proc. of the Computer Vision and Pattern Recognition Conference, CVPR99, v. 1: 485, 1999.

[4] M. Isard, A. Blake, "ICONDENSATION: Unifying low-level and high-level tracking in a stochastic framework", Proc. European Conference on Computer Vision, ECCV98, pp. 893-908, 1998.

[5] C. Shan, Y. Wei, T. Tan, F. Ojardias, "Real time hand tracking by combining particle filtering and mean shift", Proc. Sixth IEEE Automatic Face and Gesture Recognition, FG04, pp. 229-674, 2004.

[6] L. Bretzner, I. Laptev, T. Lindeberg, "Hand Gesture Recognition using Multi-Scale Colour Features, Hierarchical Models and Particle Filtering", Proc. Fifth IEEE International Conference on Automatic Face and Gesture Recognition, FGR02, 2002.

[7] K. Ogawara, K. Hashimoto, J. Takamatsu, K. Ikeuchi, "Grasp Recognition using a 3D Articulated Model and Infrared Images", Institute of Industrial Science, Univ. of Tokyo, Tokyo, Japan.

[8] T. Heap, D. Hogg, "Wormholes in shape space: tracking through discontinuous changes in shape", Proc. Sixth International Conference on Computer Vision, ICCV98, pp. 344-349, 1998.

[9] G.R. Bradski, "Computer video face tracking for use in a perceptual user interface", Intel Technology Journal, Q2'98, 1998.

[10] D. Comaniciu, V. Ramesh, "Robust detection and tracking of human faces with an active camera", Proc. of the Third IEEE International Workshop on Visual Surveillance, pp. 11-18, 2000.

[11] C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, 1995.

[12] J. Varona, J.M. Buades, F.J. Perales, "Hands and face tracking for VR applications", Computers & Graphics, 29(2): 179-187, 2005.

[13] G.R. Bradski, V. Pisarevsky, "Intel's Computer Vision Library", Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR00, v. 2: 796-797, 2000.