
Analyse de mouvements faciaux sur des images monoculaires avec application aux télécommunications : couplage de la compréhension de l'expression et du suivi de la pose du visage

Ana C. Andrés del Valle

Thèse de doctorat, Télécom ParisTech (Français). HAL Id: pastel, submitted on 26 Nov 2010. Available at https://pastel.archives-ouvertes.fr/pastel

Ph.D. thesis defended to obtain the Doctor of Philosophy Degree ès Sciences from Télécom Paris. Research developed at the Multimedia Communications Department of the Institut Eurécom, Sophia Antipolis.

Facial Motion Analysis on Monocular Images for Telecom Applications: Coupling Expression and Pose Understanding

Ana C. Andrés del Valle

Defended September 19th, 2003 in front of the following jury:
Président: Prof. Michel Barlaud (I3S)
Rapporteurs: Prof. Françoise Prêteux (INT), Prof. Ferran Marqués (UPC)
Danielle Pelé (France Télécom R&D), Prof. Jörn Ostermann (Universität Hannover)
Thesis supervisor: Prof. Jean-Luc Dugelay (Institut Eurécom)


Abstract

Facial animation has become an active research topic in telecommunications. This field aims at replacing traditional communication systems by more human-oriented solutions based on virtual reality technology. This thesis describes a complete analysis/synthesis framework for facial rigid and non-rigid motion analysis from monocular video sequences. The obtained motion data are suitable to animate the realistic head clone of the analyzed speaker by generating face animation parameters. The core of the system is the rigid-motion tracking algorithm, which provides the head pose parameters. The Kalman filter used predicts the translation and rotation parameters, which are applied to the synthetic clone of the user. This process enables us to benefit from the synthetically generated virtual images by providing visual feedback for the analysis. This work exposes in detail novel techniques to study non-rigid facial motion coupled with head pose tracking. Specific feature analysis methods have been developed to study each of the features that we believe to be the most relevant while communicating: eyes, eyebrows and mouth. We have designed image-processing algorithms based on the physiognomy of the speaker and individual motion models that exploit the correlation existing among the analyzed features. The analysis techniques have first been developed for faces analyzed from a frontal point of view and then, using the pose parameters derived from the tracking and the 3D data of the clone, they have been adapted to allow the speaker more freedom of movement in front of the camera. This adaptation is possible by redefining the 2D analysis models over the clone (3D head model), in 3D, and reinterpreting the analyzed data in accordance with the 3D location of the head. This report contains the experimental results, main contributions and relevant bibliographic references of the overall research.

Keywords
Facial animation, 3D, monocular image processing, face-feature analysis, Kalman filtering, expression-pose coupling, telecommunications, face animation parameters, FAP streaming.


Résumé

Les techniques d'animation faciale sont devenues un sujet actif de recherche dans la communauté des télécommunications. Ce domaine a pour but de remplacer les systèmes traditionnels de communication par des solutions plus adaptées aux besoins humains, en utilisant, par exemple, la réalité virtuelle. Cette thèse doctorale se situe dans le cadre du développement d'un système d'analyse/synthèse qui étudie les expressions et la pose du visage sur des séquences vidéo monoculaires. Le mouvement analysé est utilisé pour animer le clone du visage associé à l'utilisateur, tout en générant des paramètres d'animation faciale. Le noyau central du système mentionné est l'algorithme de suivi du visage, qui est capable de générer les paramètres qui déterminent la pose du visage. Le filtre de Kalman utilisé pendant le suivi prédit les angles de rotation et les valeurs de translation qui sont ensuite appliqués sur le clone du locuteur. Ces données nous permettent de profiter de l'image virtuelle de l'animation du clone obtenue pour rétro-alimenter l'analyse. Ce rapport expose minutieusement une nouvelle approche pour étudier les expressions faciales couplées avec le suivi du visage. Nous avons développé des méthodes d'analyse spécifiques pour chaque trait caractéristique du visage que nous avons considéré comme les éléments les plus importants pendant la communication : les yeux, les sourcils et la bouche. Nous avons conçu des algorithmes basés sur la physionomie du locuteur et qui utilisent des modèles de mouvement individuels pour chacun des traits. Les algorithmes font une double vérification de la cohérence des résultats en utilisant la corrélation existant entre les traits analysés. D'abord, ces algorithmes ont été développés et testés pour fonctionner sur des visages analysés depuis un point de vue frontal. Ensuite, ils ont été adaptés pour travailler avec n'importe quelle pose en utilisant les paramètres de la pose et les données 3D du clone. Cette solution permet une plus grande liberté de mouvement du locuteur face à la caméra. L'adaptation est possible en redéfinissant les modèles d'analyse des traits sur le clone (le modèle 3D), et en réinterprétant l'information analysée en relation avec les paramètres 3D qui indiquent la pose du visage. Ce travail contient les résultats expérimentaux, les contributions principales et les références bibliographiques pertinentes sur l'ensemble des travaux de recherche.

Mots clés
Animation faciale, 3D, traitement d'images monoculaires, analyse basée sur les traits faciaux, filtrage de Kalman, couplage expression-pose, télécommunications, paramètres d'animation faciale, FAP streaming.


Resumen

Las técnicas de animación facial se han convertido en un tema candente de investigación en la comunidad científica de las telecomunicaciones. En este campo se ha propuesto sustituir los sistemas tradicionales de comunicación por soluciones más adaptadas a las necesidades humanas, utilizando la realidad virtual. Esta tesis doctoral se enmarca en el desarrollo de un sistema de análisis/síntesis que estudia las expresiones y la pose de las caras que aparecen en secuencias de vídeo monoculares. El movimiento analizado se utiliza para animar un clon de la cara del usuario, a medida que se generan parámetros de animación facial. El nodo central del sistema mentado es el algoritmo de seguimiento de la cara, que es capaz de generar los parámetros que determinan la pose de la cabeza. El filtro de Kalman que se utiliza durante el seguimiento predice los ángulos de rotación y translación que se aplican seguidamente al clon del locutor. Estos datos nos permiten aprovechar la imagen virtual de la animación del clon obtenida gracias a la retroalimentación del análisis. Este informe expone minuciosamente una nueva técnica de estudio de expresiones acoplada al seguimiento de la cara. Hemos desarrollado métodos de análisis específicos para cada rasgo de la cara que hemos considerado más importante para la comunicación humana: los ojos, las cejas y la boca. Hemos concebido algoritmos basados en la fisonomía del locutor y que utilizan modelos de movimiento individuales para cada uno de los rasgos faciales. Los algoritmos verifican la coherencia de los resultados utilizando la correlación existente entre los rasgos analizados. Primero, estos algoritmos han sido desarrollados y testados para que funcionen sobre caras analizadas desde un punto de vista frontal. Después, han sido adaptados para trabajar con cualquier tipo de pose, utilizando los parámetros de la localización y los datos 3D del clon. Esta solución permite más libertad de movimiento al locutor que se encuentra delante de la cámara. La adaptación es posible gracias a que los modelos de análisis son redefinidos sobre el clon (en 3D), y a que se interpreta la información analizada en relación con los parámetros 3D que indican la pose de la cara. Este trabajo contiene los resultados experimentales, las contribuciones principales y las referencias bibliográficas relevantes a la totalidad de la investigación llevada a cabo.

Palabras clave
Animación facial, 3D, procesado de imágenes monoculares, análisis basado en rasgos faciales, filtrado de Kalman, acoplamiento expresión-pose, telecomunicaciones, parámetros de animación facial, FAP streaming.


Acknowledgements

I would like to show my appreciation to each one of the jury members for granting part of their time to read and evaluate the research work we have developed for this thesis. I want to thank my supervisor, Professor Jean-Luc Dugelay. He has provided the means for the development of high quality work. I thank Professor Francisco Perales for his cooperation during my brief stay at the UIB. I also thank Institut Eurécom for giving me the opportunity and resources to do my Ph.D. and I would like to express my gratitude towards France Telecom R&D for partially supporting my grant. I say thanks to colleagues and friends from the institute who have shared these four years with me (Philippe de Cuetos, Caroline Mallauran, Carine Simon, Christian Rey, Gwennaël Doerr, Navid Nikaein, and so many others). I specially mention Adriano Brunetti, Vahid Khami, Julien Mouille and Fabrice Souvannoung, the students who have helped me in developing some of the programs for the test platform.

No quiero dejar de agradecer a mis padres y más queridos amigos su continuo apoyo. Ellos son la sal de mi vida.

Ευχαριστώ, Σωκράτη. Η εμπειρία σου κι η υπομονή σου ήταν οι καλύτερες σύμβουλες κατά τη διάρκεια αυτών των χρόνων του δοκτοράτου.

Muchísimas gracias Isaac: por existir, por estar a mi lado y por comprender. Tú has compartido mis momentos de tensión, el final de esta tesis es en parte fruto tuyo. TQM


Table of Contents

Abstract iii
Résumé v
Resumen vii

Introduction 1
1 Motivation 1
2 Contributions 2
3 Outline of the thesis report 5

I Facial Image Analysis Techniques & Related Processing Fundamentals 7
I.1 Introduction 9
I.2 Processing Fundamentals 12
I.2.1 Pre-processing techniques 12
I.2.2 Image processing algorithms 15
I.2.3 Post-processing techniques and their related mathematical tools 19
I.3 Face Motion and Expression Analysis Techniques: a State of the Art 24
I.3.1 Methods that retrieve emotion information 24
I.3.2 Methods that obtain parameters related to the Facial Animation synthesis used 27
I.3.3 Methods that use explicit face synthesis during the image analysis 30

II Realistic Facial Animation & Face Cloning 35
II.1 Understanding the Concept of Realism in Facial Animation 37
II.2 The Semantics of Facial Animation 40
II.3 Animating Realism 45
II.4 Privacy and Security Issues about Face Cloning: Watermarking Possibilities 47

III Investigated FA Framework for Telecommunications 49
III.1 Introduction 51
III.2 Framework Overview 53
III.3 Our FA Framework from a Telecom Perspective 55
III.3.1 Coding face models and facial animation parameters: an MPEG-4 perspective 58
III.3.2 Facial animation parameter transmission 60
III.4 Facial Motion Analysis: Coupling Expression and Pose 69

IV Facial Non-rigid Motion Analysis from a Frontal Perspective 73
IV.1 Introduction 75
IV.2 Eye State Analysis Algorithm 77
IV.2.1 Analysis description 78
IV.2.2 Experimental evaluation and conclusions 81
IV.3 Introducing Color Information for Eye Motion Analysis 87
IV.3.1 Eye opening detection 87
IV.3.2 Gaze detection simplification 88
IV.3.3 Analysis interpretation for parametric description 88
IV.3.4 Experimental evaluation and conclusions 91
IV.4 Eyebrow Motion Analysis Algorithm 94
IV.4.1 Anatomical-mathematical eyebrow movement modeling 95
IV.4.2 Image analysis algorithm: deducing model parameters 97
IV.4.3 Experimental evaluation and conclusions 101
IV.5 Eye-Eyebrow Spatial Correlation: Studying Extreme Expressions 106
IV.5.1 Experimental evaluation and conclusions 108
IV.6 Analysis of mouth and lip motion 110
IV.6.1 Introduction 110
IV.6.2 Modeling lip motion with complete mouth actions 114
IV.6.3 Image analysis of the mouth area: Color and intensity-based segmentation 121

V Extending the Use of Frontal Motion Templates to any other Pose 131
V.1 Introduction 133
V.2 Feature Template Adaptation 134
V.3 Observation Model 136
V.3.1 Mathematical description of the model 137
V.4 Model Inversion 140
V.4.1 Feature surface approximation 141
V.5 3D Definition of Feature ROIs 143
V.6 3D Template Modeling for Eyes, Eyebrows 145
V.6.1 Eyes 145
V.6.2 Eyebrows 147
V.7 Accuracy of the Adaptation: A Theoretical Study 148
V.7.1 Influence of the surface modeling 148
V.7.2 Error propagation 150
V.8 Using other surfaces for the algorithmic 3D-extension 160

VI Technical Evaluation of Coupling Facial Expression Analysis and Head Pose Tracking 163
VI.1 Introduction 165
VI.2 Description of the System 167
VI.2.1 Characteristics of video input 168
VI.2.2 Head model synthesis 170
VI.2.3 Description of the visual test-bed used and how its real implementation would be 183
VI.3 Head-Pose Tracking Based on an Extended Kalman Filter 185
VI.3.1 Theoretical review 185
VI.3.2 Use of the extended Kalman filter in our context 186
VI.3.3 Influence of the tracking dynamics on the expression analysis 189
VI.4 Evaluating the Motion Template Extension 192
VI.4.1 Interference in the image-processing: Deformation of the ROI and introduction of artifacts from other features 193
VI.4.2 Influence of the surface linear approximation 194
VI.4.3 Qualitative evaluation of the motion template algorithmic extension to 3D 202
VI.5 Analyzing Real-time Capabilities 206
VI.5.1 Time performance evaluation of the algorithms 207
VI.6 Conclusion 211

Conclusion and Future Work
Conclusions and main contributions
Future work
Publications derived from this research 219

Appendices
Bibliographical References a
I Résumé étendu en français résumé-1


Table of Figures

Figure I-1. Image input is analyzed in the search for the face general characteristics: global motion, lighting, etc. At that point some image processing is performed to obtain useful data that can afterwards be interpreted to obtain face animation synthesis 9
Figure I-2. Top: A typical illustration of a two-state HMM. Circles represent states with associated observation probabilities, and arrows represent non-zero transition arcs, with associated probability. Bottom: This is an illustration of a five-state HMM. The arcs under the state circles model the possibility that some states may be skipped
Figure I-3. Face features (eyes, mouth, brows, ...) are extracted from the input image; then, after analyzing them, the parameters of their deformable models are introduced into the NN which finally generates the AUs corresponding to the face expression. Image courtesy of The Robotics Institute at Carnegie Mellon University
Figure I-4. Tracking example of Pighin's system. The bottom row shows the results of fitting their model to the target images on the top row. Image courtesy of the Computer Science Department at the University of Washington
Figure II-1. It is simple to understand how we can generate cloned actions in a realistic way if we ensure that the groups of actions C S R
Figure II-2. The freedom of action permitted on avatars V is greater than the one allowed on realistic head models R. Avatars are meant to just be a rough representation and therefore they are limited by the nature of their model. V R = R V is the group of actions performed on an avatar that will make it behave like a human being
Figure II-3. A specific FA system has its own animation designed once: A_i = {a_n^i}. Afterwards, minimal actions, a_n^i, are gathered in more or less big groups associated by the semantics of their movement
Figure II-4. Actions designed to animate face models in a specific FA system (A_j^i) are grouped following the semantics of the motion. In (a) we illustrate how the semantics of the generated animation parameters, A_j^1, and those of the animation system, A_j^2, are the same. In (b) the general action generated is expressed by means of several actions of the FA system; and in (c) we need to generate diverse general actions to animate just one
Figure II-5. When generation and synthesis of animation are in resonance, all generated movements are completely understood and reproduced. If the system of animation generation does not follow the semantics of the face motion synthesis, there is misunderstanding and we need to adapt motion parameters to have some understanding. This is possible if both sides, generation and synthetic animation, share at least some minimal motion actions
Figure III-1. When using clone animation for communication, there exist two main active parts. The facial animation parameter generator (green print), which is included in the encoding/transmission part and does heavy image processing; and the facial animation engine (orange print), which is located in the receiver and whose task is to regenerate the facial motion on the speaker's clone by interpreting faps. The framework herein presented uses synthetic clone image feedback to improve the image analysis and generate more accurate motion information
Figure III-2. Face animation parameters (FAPs) are the generated numerical values associated to the facial motion we want to synthesize. We are interested in building encoding and decoding techniques to efficiently compress and transport animation data over different telecom networks. Proprietary solutions (b) ensure perfect communication between motion generation and synthesis. Using standardized solutions, for instance, MPEG-4 coding (a), enables interoperability amongst different systems, at the expense of readapting animation to the encoding requirements and maybe losing animation precision. Teleconferencing systems (c) are an example of applications that would profit from the introduction of facial animation analysis and synthesis
Figure III-3. Packet Descriptor of the PFAP Payload 63
Figure III-4. High Level Networking Architecture 64
Figure III-5. Detailed description of the complete networking capabilities of the Server (analysis) and the Client (synthesis) 65
Figure III-6. Comparing buffering strategies 71
Figure III-7. General diagram of the proposed analysis framework 75
Figure IV-1. General diagram of the proposed analysis framework. The parts related to the facial expression analysis have been highlighted 80
Figure IV-2. The analyzed results of each eye feature are filtered through this Temporal Diagram. Current eye states SLt and SRt are contrasted to obtain a common state for both eyes: St. Since the state Sclose does not have any physical information about the pupil location, it is ignored for future analysis in the temporal chain. The starting state is fixed with the X, Y of the eyes in their neutral position 81
Figure IV-3. This diagram shows the simplest subdivision possible for the eye ROI so as to extract meaningful motion information regarding the eye action 81
Figure IV-4. Four frames extracted from each of the analyzed sequences: FRONTAL, NEON, RIGHT SIDE and LEFT SIDE, respectively 85
Figure IV-5. The upper graph depicts the evolution of the extracted data regarding the pupil X location for both eyes. The lower graph represents the resulting X location after applying the Temporal State Diagram. It shows the results for an f(X, Y, W, H) that quantizes with an accuracy of 3%, 5% and 10% of WROI (example sequence: NEON) 86
Figure IV-6. The upper graph depicts the evolution of the extracted data regarding the pupil Y location for both eyes. The lower graph represents the resulting Y location after applying the Temporal State Diagram. It shows the results for an f(X, Y, W, H) that quantizes with an accuracy of 3%, 5% and 10% of HROI (example sequence: NEON) 91
Figure IV-7. Quantization of HPOS looking for the left eye
Figure IV-8. Temporal State Diagram for the eye action tracking with simplified gaze analysis. Sit(R/L) represents a determined state i at time t for either the right or left eye and St the final result. Check Table IV-3 for the state combinations 91
Figure IV-9. Analysis graphs for a tested sequence. (a) EyeOpening (two quantization levels). (b) GazeDetection (three quantization levels)
Figure IV-10. Several muscles generate the eyebrow movement. Upward motion is mainly due to the Frontalis muscle and downward motion is due to the Corrugator, the Procerus and the Orbicularis Oculi muscles 95
Figure IV-11. Eyebrow model arch for the left eye and its coordinate reference. The origin for the analysis algorithm is always situated at the inner extreme of the eyebrow (close to the nose) and defined for the eyebrow in its neutral state 96
Figure IV-12. The action of the eyebrow behavior model applied over the neutral arch results in a smooth deformation. The graph on the left depicts eyebrow rising motion (upward) for positive values of Ffx, Ffy and Ffy. The graph on the right represents eyebrow frowning (downward) for positive values of Fx, Foox, Fy and Fooy 97
Figure IV-13. The eyebrow changes its hair density as it goes away from the inner extreme. The bone structure of the skull determines the shading differences along the eyebrow. We set two different binarization thresholds: Th1 for the InZygomatic zone and Th2 for the ExtZygomatic
Figure IV-14. The eyebrow two-part binarization leads to a good determination of the eyebrow area but it also may introduce artifacts by labeling eyes or hair as part of the eyebrow. In the current eyebrow binary image we see how the eye has also been detected
Figure IV-15. Our tests were performed over video sequences where the lighting over the face was not uniform. No environmental conditions were known besides the exact location of the ROI including the eyebrow feature, which remained unchanged through the sequence. Here we present the frames analyzed to obtain the results presented in Figure IV-16
Figure IV-16. Correct binarization and thinning clearly give the data from which to extract the model parameters. Graph (b) plots the mixed results from the analysis of two different video sequences. Neut. Seq.2 is the analysis of a frame where the eyebrow was relaxed, taken from a sequence different from the Fr sequence. This comparison simulates what would happen if the pose of the speaker changed during the analysis. The pose motion would cause the movement of the eyebrow but the algorithm would interpret it as a local eyebrow expression (being upward when in reality it is neutral). We must control the pose of the user to completely exploit the algorithm in practical applications
Figure IV-17. The anatomical-mathematical motion model nicely represents the eyebrow deformation. We see on frame 28 how the strange thinning results obtained at the beginning of the arch, probably due to the eyebrow-eye blending during binarization, worsen the algorithm accuracy. Although the obtained parameters still correctly interpret the general downward movement, showing fair robustness, they are no longer able to express the exact motion intensity
Figure IV-18. These plotted results from three different sequences: Ana2, Caroline and Jean-Luc illustrate the analysis behavior of the algorithm under different conditions. The algorithm proves to detect the right movement (the mean difference decreases) and to estimate the motion parameters correctly (the area decreases). We observe the best behavior for extreme eyebrow expressions. Ana2 sequence use rate: 90.71%, Caroline sequence use rate: 78.38% and Jean-Luc sequence use rate: 82.26%
Figure IV-19. The basic Temporal State Diagram applied to eye analysis and built on only inter-eye constraints (Figure IV-2) can be complemented to take into account the data obtained from the eyebrow analysis
Figure IV-20. When the eye is closed (lower row), the eyelid changes due to eyebrow action can be taken as some specific animation. When the eye is open (upper row) it must be taken into account to alter the standard y motion of the eyelid
Figure IV-21. Eyebrow fap magnitude evolution (0-fapMAX) taken from sequence NEON
Figure IV-22. Eyelid standard analysis results compared to the corrected results after correcting the former with the eyebrow data. Analysis made on sequence NEON
Figure IV-23. These illustrations present the bones and the muscles involved in the generation of mouth actions (from Images of muscles and bones of the head, 2002)
Figure IV-24. The chosen control points coincide with the ending extremes of the major muscles that intervene in mouth motion
Figure IV-25. Schematic representation of the mouth lips and the control points acting on their left side
Figure IV-26. These images show the results (red color) of applying the displacements shown in the Table presented below onto the control points of a mouth in its neutral state (grey color). The global deformation of the mouth is obtained using the linear approximation proposed. Mouth proportions in neutral state: L=8 & H=
Figure IV-27. (Figure 6) Schematic representation of how jaw motion influences the teeth-lips location, plotted for some key mouth movements
Figure IV-28. Intensity histogram
Figure IV-29. Hue histogram
Figure IV-30. Areas delimited for the histogram study and for the mouth motion analysis
Figure IV-31. Screen shots of some of the 60 videos analyzed for the tests. In the images the lip areas are surrounded by red and the lip separation and darker inner part of the mouth detected in black. The second approach was used for the analysis of the presented shots
Figure V-1. This diagram illustrates the general adaptation process applied to the eye analysis algorithm. First, the vertices that define the 3D ROI on the linear surface model are projected onto the image plane. Then the image-processing algorithm retrieves the desired information analyzing inside the delimited area. To understand the motion of the feature, data are interpreted in 3D space, over the motion model that has been defined on the linear surface approximation of the eye feature viewed from a frontal perspective. Once the motion is interpreted it can be reproduced on a synthetic head model. The projection and the understanding of the image information are possible because the system controls the 3D pose of the head with respect to the camera
Figure V-2. Schema of the reference systems and camera model used (of focal length F) for the adaptation process. It establishes the relationship of a point in the Euclidean space x_n^T = (x_n, y_n, z_n) and its projected counterpart on the camera image plane x_p^T = (x_p, y_p) = (-F x_n / z_n, -F y_n / z_n). The axis orientation is such that the camera only sees the negative part of the Z-axis
Figure V-3. Example of deformation and framing of one feature ROI
Figure V-4. These graphs depict the evolution of the feature projected ROI depending on the pose of the head. We observe the influence of each of the pose parameters independently and the angles jointly. The studied observation model simulates the ROI of an eye. It has F = 15 units, the major axis (A) and the minor axis (B) are defined from: 1n = (20, 0, 4.8); 2n = (25, 7.5, 4.8); 3n = (30, 0, 4.8) and 4n = (25, -7.5, 4.8); the area of the ROI surface computed in 3D is 150 units
Figure V-5. The eye ROI on the image must follow and reshape according to the view that we have of the feature for a given head pose. Figure (a) schematically shows how the originally designed eye state analysis algorithm cannot be applied directly on the eye as soon as there exists a rigid motion component in the final movement. In Figure (b) the eye model and its linear surface approximation are presented
Figure V-6. The eyebrow could be approximated by the surface that tangentially follows the eyebrow movement along the forehead. Its plane approximation is the average z value of the points that delimit the eyebrow ROI
Figure V-7. There exists a solution to the inversion as long as the plane that approximates the feature surface does not take, after the rigid motion of the head has been applied, an orientation parallel to vector r
Figure V-8. With the angle equal to π/2 or -π/2, the plane, as it is located in the presented example, generates an undetermined solution that does not permit the inversion of the system around the observed point
Figure VI-1. Two screen shots of the test setting. In the left-most window we present the video input and the projection of the analysis results and the evolution of the ROIs, suitable for visual inspection of the feature analysis performance; in the right-most window the synthetic reproduction (projected using OpenGL) of the user's clone is represented, allowing us to control the evolution of the head tracking algorithm. Since we use a highly realistic model to perform the tracking we utilize its 3D data to do the algorithmic adaptation: we redefine the motion animation models and their ROIs on it 166
Figure VI-2. Setting for the technical evaluation: just one computer and one camera are used
Figure VI-3. Synthetic image and video input are blended to be able to initialize the analysis system
Figure VI-4. Image (a) was recorded with a standard camera and acquisition card. Image (b) was obtained from a typical web camera
Figure VI-5. Reference system of the video image in memory
Figure VI-6. Different models were used for the practical implementation of the algorithms. Very dense wireframe models were utilized to extend the use of our expression analysis algorithms (a-c); a less heavy version of these models was rendered during the coupling of head tracking and expression analysis. An animated avatar substituted the realistic head model of the speaker to evaluate the naturalness of the animation synthesis created from the parameters obtained from the analysis
Figure VI-7. Tree structure of the MPEG-4 coded head model
Figure VI-8. A face model in its neutral state and the feature points used to define FAP units (FAPU). Fractions of distances between the marked key features are used to define FAPU. MPEG-4
Figure VI-9. At least the Face Definition Points must be specified on the head model wireframe to define it, to allow all animation systems to customize their own model. Our models include these points in their mesh, ensuring the correct understanding of MPEG-4 animation parameters. MPEG-4
Figure VI-10. Kalman's transform (a) and MPEG-4 transform actions (b)
Figure VI-11. Perspective projection model and reference system in the synthetic world generated by OpenGL. The objects are focused on the znear plane and they are rendered on the viewport. The viewport is determined by t=top, r=right, b=bottom and l=left, and takes the size that will be presented on the screen window, which must match the size characteristics of the video data
Figure VI-12. Schema of the reference systems and camera model used (of focal length F) for the adaptation process. It establishes the relationship of a point in the Euclidean space x_n^T = (x_n, y_n, z_n) and its projected counterpart on the camera image plane x_p^T = (x_p, y_p) = (-F x_n / z_n, -F y_n / z_n). The axis orientation is such that the camera only sees the negative part of the Z-axis
Figure VI-13. Side view of the proposed observation model and its OpenGL practical implementation. In both systems, reference systems and components slightly differ but pose and motion descriptions stay the same
Figure VI-14. Real and recovered Y position of a sample sequence
Figure VI-15. These two graphs show the fluctuations that the Kalman filter introduces in the pose values utilized for the head tracking and the expression analysis algorithmic extension. Head model dimensions: WIDTH = ; HEIGHT = 2871 m_u & DEPTH = m_u
Figure VI-16. The eyebrow-motion analysis algorithm has been able to avoid the influence of the eye feature that is also covered by the eyebrow ROI when analyzing the right eyebrow. For the analysis of the left eyebrow, the inaccuracy of the ROI determination prevents the algorithm from properly detecting the eyebrow and it detects the eye instead
Figure VI-17. Data extracted during the eye-state tracking-algorithm study
Figure VI-18. Coordinates extracted from the 3D head model of the speaker used to conform the ROI for the eye-state tracking-algorithm adaptation
Figure VI-19. Maximum error in the average X and Y components found during the study. We have also indicated the FAP magnitude at which these values occurred
Figure VI-20. Facial animation parameters for the eyes were extracted using the eye-state tracking algorithm and immediately applied and rendered on the Olivier avatar
Figure VI-21. Sequence of shots extracted from eye&eyebrowcoupled.avi. The right eyebrow has been extracted. We observe the evolution of the head rotation at the same time as the eyebrow moves upward. The pupil tracking from the right eye is also plotted
Figure VI-22. Evolution of the system speed versus the complexity of the head model being rendered. Kalman pose tracking was done following 10 features
Figure VI-23. Evolution of the system speed versus the quantity of features utilized during the head tracking. The model used had 2574 vertices. No expression analysis was made
Figure VI-24. Evaluation of the computing speed cost of the expression analysis. Kalman pose tracking was done following 10 features and the model used had 2574 vertices 209

Table of Tables

Table I-1
Tables IV-1 to IV-5
Tables V-1 to V-8
Tables VI-1 to VI-9


Notation Conventions

Lower case letters: coordinates and vector components
Capital letters: represent matrices and vectors
Italics: words or expressions with special stress or in a foreign language; variables in mathematical expressions
n: neutral 3D-coordinates
p: projected coordinates
tϕ: tan(ϕ)
sϕ: sin(ϕ)
cϕ: cos(ϕ)
2D: two-dimensional
3D: three-dimensional
ANN: Artificial Neural Network
AU: Action Unit
BIFS: Binary Format for Scenes
deg: degrees
extract: specific data extracted from references of documents
F: focal distance
f/s: frames per second
FA: Facial Animation
FAP: Facial Animation Parameters (MPEG-4 compliant)
fap: general facial animation parameters
FDP: Facial Definition Points
H: Hue
HCI: Human Computer Interface
HMM: Hidden Markov Model
HS: Hue and Saturation pixel components
I: Intensity
ICA: Independent Component Analysis
MPEG-4: Motion Picture Experts Group 4 standard
OGL: OpenGL
PCA: Principal Component Analysis
PDF: Probability Density Function
pel: pixel
pel/s: pixels per second
ROI: Region of Interest
VRML: Virtual Reality Modeling Language


Introduction

1 Motivation

Face cloning has become a need for many multimedia applications in which human interaction with virtual and augmented environments enhances the interface. Its promising future in areas such as mobile telephony and the Internet has turned it into an important subject of research. Proof of this interest is the increasing appearance of companies offering their customers the creation of customized synthetic faces, and the government support through public grants like the European Project INTERFACE (1999).

We can classify synthetic faces into two major groups: avatars and clones. Avatars are generally a rough or symbolic representation of the person. Their appearance is not very accurate. They are speaker-independent because their animation follows general rules, independently of the person they are assumed to represent. Most current commercial synthetic faces fall into this category. In some applications, avatars do not completely please people because they may create a feeling of mistrust. In (Ostermann & Millen, 2000), the authors show how a simple avatar pleases more than an avatar customized by texturing it with a real human face. Indeed, people prefer a good avatar animation to a bad clone synthesis. Clones must be very realistic and their animation must take into account the nature of the person: they need to be speaker-dependent.

Motivated by the multiple advantages and improvements that using realistic virtual characters could bring to telecommunications, we want to investigate the feasibility of using them in traditional videoconference systems, with just one single camera. This dissertation covers the research developed on the creation of new facial motion and expression analysis algorithms in order to replicate human motion on realistic head models that will be used in telecommunication applications. The complete development of our analysis framework is based on the hypothesis that a realistic 3D head model of the speaker in front of the camera is available. We believe that realistic motion can only be reproduced on realistic head models and, in such a case, the model is already available to the system. The most accurate information obtainable from monocular video sequences taken in standard environments (with unknown lighting, no markers, ...) can only be retrieved if some data about the user's geometry is known a priori, for example, by using his realistic clone, as we do.

2 Contributions

We propose new image analysis algorithms for specific features of the face (eyes, eyebrows and mouth) that try to take as much advantage as possible of the physiognomy and the anatomy of the speaker's head. First, these techniques have been defined and tested for a frontal position:

Eye State Tracking: We have developed lighting-independent eye motion estimation algorithms that use natural anatomical intra-feature constraints to obtain gaze and eyelid behavior from the analysis of the energy distribution over the eye area. We have also tested the possibility of using color information during the analysis. We interpret the analysis results in terms of some specific action units that we associate to temporal states. Following a Temporal State Diagram that uses inter-feature constraints to set the coherence between both eyes, we relate our analysis results to the final parameters that describe the eye movement.

Eyebrow Movement Analysis: To study eyebrow behavior from video sequences, we utilize a new image analysis technique based on an anatomical-mathematical motion model. This technique conceives the eyebrow as a single curved object (arch) that is subject to deformations due to muscular interaction. The action model defines the simplified 2D (vertical and horizontal) displacements of the arch. Our video analysis algorithm recovers the needed data from the arch representation to deduce the parameters that deformed the proposed model.

The complete ocular expression analysis is obtained after applying some inter-feature constraints among eyes and eyebrows. This allows us to enrich the amount of motion information obtained from each feature, by complementing it with the information coming from another one.

Mouth: It is the most difficult feature to analyze; therefore we believe that a hybrid strategy should be utilized to derive its motion: voice and image conjointly. Our analysis is based on the following facts: mouth motion may exist even if no words are spoken, and voiceless mouth actions are important to express emotions in communication. This thesis presents some early results obtained from the analysis techniques designed to study the visual aspects of mouth behavior. We deduce which mouth characteristics, available from the face, may be the most useful when lighting conditions are not known, and how these characteristics may be analyzed together to extract the information that will control the muscular-based motion-model template proposed for its analysis.

The main contribution of our work comes from the study of the coupling of these algorithms with the pose information extracted from the rigid head motion tracking system. The presented techniques allow the user more freedom of movement because we are able to use these algorithms as independently of the speaker's location as possible.

Facial Expression Analysis Robust to 3D Head Pose Motion: Kalman filters are often used in head tracking systems for two different purposes: the first one is to temporally smooth out the estimated head global parameters; the second one is to convert the positions of the 2D facial feature observations on the video image into 3D estimates and predictions of the head position and orientation. In our application, the Kalman filter is the central node of our face-tracking system: it recovers the head global position and orientation, it predicts the 2D positions of the feature points for the matching algorithm, and (this is the point exploited for telecom applications) it makes the synthesized model have the same scale, position, and orientation as the speaker's face in the real view, despite the acquisition by a non-calibrated camera.

Having already developed and positively tested face feature analysis algorithms for heads studied from a frontal perspective, we need to adapt these algorithms to any pose. In fact, all developed analysis algorithms count on the beforehand definition of the Region of Interest to be analyzed and the automatic location of the interesting features (eyes, eyebrows and mouth). The solution we propose defines the feature regions to be analyzed and the parameters of the motion templates of each feature in 3D, over the head model in its neutral position, so as to automatically obtain them on the image plane thanks to the pose data extracted from the tracking. The complete procedure goes as follows:

(i) We define and shape the area to be analyzed on the video frame. To do so, we project the 3D ROI defined over the head model onto the video image by using the predicted pose parameters of the synthesized clone, thus getting the 2D ROI.

(ii) We apply the feature image analysis algorithms on this area, extracting the data required.

(iii) We interpret these data from a three-dimensional perspective by inverting the projection and the transformation due to the pose (data pass from 2D to 3D). At this point, we can compare the results with the feature analysis parameters already predefined on the neutral clone and decide which action has been made.
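As a rough illustration of steps (i) and (iii), the sketch below (Python with NumPy; the function names, the simple pinhole convention and the planar-surface assumption are ours for illustration, not the exact implementation used in the thesis) projects ROI vertices defined on the neutral 3D head model onto the image plane using predicted pose parameters, and back-projects an observed image point onto the planar approximation of the feature surface so that it can be interpreted in the neutral model frame.

```python
import numpy as np

def project(points_3d, R, t, F):
    """Project 3D ROI vertices (Nx3, neutral model frame) to 2D image points.

    Applies the predicted head pose (rotation R, translation t) and a pinhole
    model in which the camera looks down the negative Z axis, so that
    x_p = -F * x / z and y_p = -F * y / z (assumed convention).
    """
    cam = points_3d @ R.T + t              # rigid motion into camera coordinates
    return -F * cam[:, :2] / cam[:, 2:3]

def back_project(point_2d, plane_point, plane_normal, R, t, F):
    """Invert the projection for one observed 2D point.

    The inversion is only determined because the feature is constrained to lie
    on a plane (the linear surface approximation of the eye/eyebrow region):
    the viewing ray through the pixel is intersected with that plane, then the
    intersection is mapped back to the neutral model frame.
    """
    d = np.array([-point_2d[0] / F, -point_2d[1] / F, 1.0])  # ray direction
    p0 = plane_point @ R.T + t             # plane support point, camera frame
    n = plane_normal @ R.T                 # plane normal, camera frame
    denom = d @ n
    if abs(denom) < 1e-9:                  # plane parallel to the ray: no solution
        raise ValueError("degenerate pose: inversion undefined")
    X_cam = (p0 @ n) / denom * d           # ray/plane intersection
    return (X_cam - t) @ R                 # back to the neutral model frame
```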

The techniques we use differ from other previous approaches in that we explicitly use the clone's 3D data, the locations of the model vertices, to define the analysis algorithms in 3D. The main advantages of our solution are the complete control of the location and shape of the regions of interest (ROIs), and the reutilization of robust image analysis algorithms already tested over faces frontally looking towards the camera.

Other contributions: The thesis contains analyses and discussions about the role of facial animation in telecommunications. We have also given a formal description of facial animation using synthetic models in terms of the generation and the understanding of motion parameters. This theoretical explanation enables the classification of complete facial animation systems by comparing their performance regarding the degree of realism they permit. It also describes a framework to understand the level of interoperability among different facial animation systems.

3 Outline of the Thesis Report

This thesis is organized as follows. In Chapter I, we extensively review some state-of-the-art analysis techniques for expression analysis on monocular images and their related processing algorithms. Some of the ideas that help to understand realistic facial animation in the context of communications are explained in Chapter II. We have developed the notions of realism and motion semantics to situate the concept of face cloning in our research. In Chapter III, we describe the proposed facial animation framework for telecom applications. The requirements of the facial motion analysis techniques needed for the studied framework are also detailed. Chapter IV includes the development and performance tests of the proposed novel analysis techniques to study facial feature motion on monocular images. It contains the image processing algorithms and related analysis motion templates for eyes, eyebrows and mouth. Experiments show the convenience and robustness of utilizing anatomical knowledge to set intra-feature and inter-feature constraints during the analysis. In Chapter V, we detail the procedure to couple the use of the developed feature analysis techniques with the knowledge of the head pose. The chapter also includes the theoretical study of the influence of the pose prediction on the analyzed results. Chapter VI contains the performance analysis of the pose-expression coupling in our face animation framework. To experimentally evaluate the proposed coupling approach, the techniques developed in Chapter IV are adapted following the procedure detailed in Chapter V in order to use the pose tracking technique based on Kalman filtering proposed by Valente and Dugelay (2001). The thesis concludes with a summary and some comments on future perspectives.


I Facial Image Analysis Techniques & Related Processing Fundamentals

Researchers from the Computer Vision, Computer Graphics and Image Processing communities have been studying the problems associated with the analysis and synthesis of faces in motion for more than 20 years. The analysis and synthesis techniques developed can be useful for the definition of low bit-rate image compression algorithms (model-based coding) and new cinema technologies, as well as for the deployment of virtual reality applications, videoconferencing, etc. As computers evolve towards becoming more human-oriented machines, human-computer interfaces, behavior-learning robots, and computer environments adapted to disabled users will use face expression analysis to be able to react to human actions. The analysis of motion and expressions from monocular (single) images is widely investigated, first, because image analysis is the least invasive method to study natural human behavior and, second, because non-stereoscopic static images and video are the most affordable and most extensively used visual media.


I.1 Introduction

Many video encoders perform motion analysis over video sequences to search for motion information that will help compression. The concept of motion vectors, first conceived at the time of the development of the first video coding techniques, is intimately related to motion analysis. These first analysis techniques help to regenerate video sequences as the exact or approximate reproduction of the original frames, by using motion compensation from neighboring pictures. They are able to compensate but not to understand the actions of the objects moving in the video and therefore they cannot restore the object movements from a different point of view, or immersed in a three-dimensional scenario. Current trends in research focus on the development of new ways of communicating through the use of visual tools that would permit more human interaction while communicating. For instance, this interaction is sought when using 3D to create virtual teleconference rooms. As said before, traditional motion analysis techniques are not sufficient to provide the information needed for these applications.

Faces play an essential role in human communication. Consequently, they have been the first objects whose motion has been studied in order to recreate animation on synthesized models or to interpret motion for an a posteriori use. Figure I-1 illustrates the basic flowchart for systems dedicated to facial expression and motion analysis on monocular images. Video or still images are first analyzed to detect, control and deduce the face location on the image and the environmental conditions under which the analysis will be made (head pose, lighting conditions, face occlusions, etc.). Then, some image motion and expression analysis algorithms extract specific data that are finally interpreted to generate face motion synthesis.

[Figure I-1: block diagram. Video/images feed three successive modules: PRE-MOTION ANALYSIS (camera calibration, illumination analysis, head detection, pose determination), FACE MOTION IMAGE ANALYSIS (optical flow, PCA, snakes, segmentation, deformable models) and MOTION INTERPRETATION (face feature modeling, parameter estimation), leading to face synthesis.]

Figure I-1. Image input is analyzed in the search for the face general characteristics: global motion, lighting, etc. At that point some image processing is performed to obtain useful data that can afterwards be interpreted to obtain face animation synthesis.

Each of the modules may be more or less complex depending on the purpose of the analysis (i.e., from the understanding of general behavior to exact 3D-motion extraction). If the analysis is intended for later face expression animation, the type of Facial Animation synthesis often determines the methodology used during expression analysis. Some systems may not go through either the first or the last stage, and some others may blend these stages into the main motion & expression image analysis. Systems lacking the pre-motion analysis step are most likely to be limited by environmental constraints like special lighting conditions or a pre-determined head pose. Those systems that do not perform motion interpretation do not focus on delivering any specific information to perform face animation synthesis afterwards. A system that is supposed to analyze video to generate face animation data in a robust and efficient way consists of the three modules. The approaches currently under research, which are exposed in this section, clearly perform the facial motion and expression image analysis and, to some extent, the motion interpretation needed to be able to animate. Nevertheless, many of them fail to have a strong pre-motion analysis step to ensure some robustness during the subsequent analysis.

This chapter reviews current techniques for the analysis of single images to derive animation. These methods can be classified based upon different criteria:

1. the nature of the analysis: global versus feature-based, real-time oriented, ...;
2. the complexity of the information retrieved: general expression generation versus specific face motion;
3. the tools utilized during the analysis, for instance, the cooperation of a 3D head model;
4. the degree of realism obtained from the Face Animation (FA) synthesis; and
5. the environmental conditions during the analysis: controlled or uniform lighting, head-pose independence.

Table I-1, on page 33, depicts a rough evaluation of the techniques that we review by comparing these criteria, considering the data provided by the referenced articles, books and other bibliographical material and the appreciation of the author. In this section, the systems will be presented in three major categories, grouped following the existing relationship between the image analysis and the expected FA synthesis, namely:

Methods that retrieve emotion information: these are the systems whose motion & expression analysis aims at understanding face motion in a general manner. These techniques evaluate the actions in terms of expressions: sadness, happiness, fear, joy, etc. These expressions are sometimes quantified and then interpretable by FA systems, but the analysis techniques are not concerned about the FA itself.

Methods that obtain parameters related to the FA synthesis used: this includes the methods that apply image analysis techniques over the image in the search for specific measurements directly related to the animation synthesis.

Methods that use explicit face synthesis during the image analysis: some techniques use the explicit synthesis of the generated animated 3D head model to compute mesh displacements, generally via a feedback loop.

Regardless of the category they belong to, many of the methods that perform facial analysis on monocular images to generate animation share some image processing and mathematical tools.

I.2 Processing Fundamentals

I.2.1 Pre-processing techniques

The conditions under which the user may be recorded are susceptible to change from one system to another, and from one determined moment to the next. Some changes may come from the hardware, for instance, the camera, the lighting environment, etc. Furthermore, even though only one camera is used, we cannot presuppose that the speaker's head will remain motionless and looking straight at that camera at all instants. Pre-processing techniques must therefore help to homogenize the analysis conditions before studying non-rigid face motion; for this reason we also include head detection and pose determination techniques in this group.

Camera calibration

Accurate motion retrieval is highly dependent on the precision of the image data we analyze. Images recorded by a camera undergo different visual deformations due to the nature of the acquisition material. Camera calibration can be seen as the starting point of a precise analysis. If we want to express motion in real space we must relate the motion measured in terms of pixel coordinates to the real/virtual world coordinates, that is, we need to relate the image reference frame to the world reference frame. Simply knowing the pixel separation in an image does not allow us to determine the distance between those points in the real world. We must derive some equations to link the world reference frame to the image reference frame in order to find the relationship between the coordinates of points in 3D space and the coordinates of the points in the image. In Appendix I-A we describe the basics of camera calibration. The developed methods can be classified into two groups: photogrammetric calibration and self-calibration. We refer the reader to (Zhang, 2000) and (Luong & Faugeras, 1997) for some examples and more details about these approaches. Although camera calibration is basically used in Shape From Motion systems, above all when accurate 3D data is used to generate 3D mesh models from video sequences of static objects, it is a desired step for face analysis techniques that aim at providing motion accuracy.
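As an aside, the following is a minimal sketch of the kind of relationship that calibration establishes: a standard pinhole model factored into intrinsic and extrinsic parameters. The numerical values are placeholders, not parameters from the thesis, and the +Z-forward convention used here is the common computer-vision one, which differs in sign from the camera model adopted later in this work.

```python
import numpy as np

# Intrinsics: focal lengths (in pixels) and principal point. Placeholder values.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: rotation and translation from the world frame to the camera frame.
R = np.eye(3)
t = np.array([0.0, 0.0, 1000.0])

def world_to_pixel(X_world):
    """Map a 3D world point to pixel coordinates through the calibrated camera."""
    X_cam = R @ X_world + t             # world -> camera frame
    u, v, w = K @ X_cam                 # homogeneous perspective projection
    return np.array([u / w, v / w])     # pixel coordinates

print(world_to_pixel(np.array([100.0, 50.0, 2000.0])))
```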

Illumination analysis and compensation

Other unknown parameters during face analysis are the lighting characteristics of the environment in which the user is being filmed. The number, origin, nature and intensity of the light sources of the scene can easily transform the appearance of a face. Face reflectance is not uniform all over the face and is thus very difficult to model. Appendix I-B contains information about the characteristics of the nature of light and one of the most commonly used models for surfaces. Due to the difficulty of deducing the large number of parameters and variables that the light model computes, some assumptions need to be made. One common hypothesis is to consider faces as Lambertian surfaces (only reflecting diffuse light), so as to reduce the complexity of the illumination model. Using this hypothesis, Luong, Fua and Leclerc (2002) study the light conditions of faces to be able to obtain texture images for realistic head synthesis from video sequences. Other reflectance models are also used (Debevec et al., 2000), although they focus more on reproducing natural lighting on synthetic surfaces than on understanding the consequences of the lighting on the surface itself. In most cases, the analysis of motion and expressions on faces is more concerned with the effects of illumination on the facial surface studied than with the overall understanding of the lighting characteristics. A fairly extended approach to appreciate the result of lighting on faces is to analyze it by trying to synthetically reproduce it on a realistic 3D model of the user's head. Whether it is used to compensate the 3D model texture (Eisert & Girod, 2002) or to light the 3D model used to help the analysis (Valente & Dugelay, 2001), it proves reasonable to control how the lighting modifies the aspect of the face on the image.

Head detection and pose determination

If we intend to perform robust expression and face motion analysis, it is important to control the location of the face on the image plane and it is also crucial to determine the orientation of the face with regard to the camera. The find-a-face problem is generally reduced to the detection of its skin on the image. The most generalized methods for skin detection use a probabilistic approach where the colorimetric characteristics of human skin are taken into account. First, a probability density function, P(rgb|skin), is usually generated for a given color space (RGB, YUV, HSV, or others). P(rgb|skin) indicates the probability of a color belonging to the skin surface. It is difficult to create this function, as well as to decide which threshold to use to determine whether the current pixel belongs to the skin or not. Some approaches (Jones & Rehg, 1999) study in detail the color model used and also give a probability function for the pixels that do not belong to the skin, P(rgb|non-skin). Others, like the one presented by Sahbi, Geman and Boujemaa (2002), perform their detection in different stages, giving more refinement at each step of the detection. More complex algorithms (Garcia & Tziritas, 1999) allow regions with non-homogeneous skin color characteristics to be found.

Determining the exact orientation of the head becomes a more complicated task. In general, we find two different ways to derive the head pose: using static methods and using dynamic approaches. Static methods search for specific features of the face (eyes, lip corners, nostrils, etc.) on a frame-by-frame basis, and determine the user's head orientation by finding the correspondence between the projected coordinates of these features and the real world coordinates. They may use template-matching techniques to find the specific features, as Nikolaidis and Pitas (2000) do. This method works fine although it requires very accurate spotting of the relevant features; unfortunately, this action has to be redone at each frame and it is somewhat tedious and imprecise. Another possibility is to use 3D data, for instance from a generic 3D head model, to accurately determine the pose of the head on the image. This is the solution given by Shimizu, Zhang, Akamatsu and Deguchi (1998). To introduce time considerations and to take advantage of previous results, dynamic methods have been developed. These methods perform face tracking by analyzing video sequences as a more or less smooth succession of frames, and they use the pose information retrieved from one frame to analyze and derive the pose information of the next one. One of the most extended techniques involves the use of Kalman filters to predict some analytical data as well as the pose parameters themselves. We refer the reader to (Ström, Jebara, Basu & Pentland, 1999; Valente & Dugelay, 2001; Cordea, E. M. Petriu, Georganas, D. C. Petriu & Whalen, 2001) for related algorithmic details. Other approaches, like the one presented by Huang and Chen (2000), are able to find and track more than just one face in a video sequence but they do not provide any head pose information. Other techniques (Zhenyun, Wei, Luhong, Guangyou & Hongjian, 2001; Spors & Rabenstein, 2001) simply look for the features they are interested in. They find the features' rough location but they do not deduce any pose from this information because their procedure is not accurate enough.
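A minimal sketch of the histogram-based likelihood-ratio test implied by the skin-detection formulation above follows. The training histograms and the decision threshold are assumed to be available; the bin size and names are illustrative choices, not values taken from the cited works.

```python
import numpy as np

BINS = 32  # histogram bins per RGB channel (illustrative choice)

def rgb_bin(pixels):
    """Map uint8 RGB pixels (Nx3) to joint-histogram bin indices."""
    q = (pixels.astype(np.int64) * BINS) // 256
    return q[:, 0] * BINS * BINS + q[:, 1] * BINS + q[:, 2]

def skin_mask(pixels, hist_skin, hist_nonskin, theta=1.0):
    """Label pixels as skin when P(rgb|skin) / P(rgb|non-skin) > theta.

    hist_skin and hist_nonskin are normalized joint RGB histograms (length
    BINS**3) estimated beforehand from labeled skin / non-skin training pixels.
    """
    idx = rgb_bin(pixels)
    p_skin = hist_skin[idx]
    p_nonskin = hist_nonskin[idx] + 1e-12   # avoid division by zero
    return p_skin / p_nonskin > theta
```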

I.2.2 Image processing algorithms

The complexity of expression analysis is usually simplified by trying to understand either the shape of some parts of the face, the location of very specific points, or the change in magnitude of some characteristics of the analyzed area, as for example its color. In order to do this, several image-processing techniques are used and tuned to work on human faces. In this section, we try to summarize the basics of the most commonly used algorithms.

Optical flow

There are two major methods to perform motion estimation: either we match objects with no ambiguity from image to image, or we calculate the image gradient between frames. In the first case, the main goal consists in determining, in one of the studied images, the group of points that can be related to their homologues in the second image, thus giving out the displacement vectors. The most difficult part of this approach is the selection of the points, or regions, to be matched. For practical purposes many applications use an artificial division of the image into blocks. Block matching algorithms have long been used in video coding. In general, the most important disadvantage of this kind of method is that it determines motion in a discrete manner, and motion information is only precise for some of the pixels on the image.

In the second case, the field of displacement vectors of the objects that compose a scene cannot be computed directly: we can just find the apparent local motion, also called optical flow (OF), between two images. Its computation is also restricted by the aperture problem, explained in detail later; consequently, only the component of the motion perpendicular to the contours of an image can be estimated from local differential data. The most used technique to compute OF, the gradient-descent method, generates a dense optical flow map, providing information at the pixel level. It is based on the supposition that the intensity of a pixel I(x, y, t) is constant from image to image, and that its displacement is relatively small. In these circumstances we verify

( I-1 )    (∂I/∂x) u + (∂I/∂y) v + ∂I/∂t = 0,

where u = dx/dt and v = dy/dt are the pixel displacements between two images. Each point on the image has one equation with two unknowns, u and v, which implies that the motion cannot be directly computed (this can also be seen as a consequence of the aperture problem¹). There exist different methods that try to solve ( I-1 ) iteratively.² A complete list of different optical flow analysis methods can be found in (Wiskott, 2001).

1 Equation ( I-1 ) is called the optical flow constraint equation since it expresses a constraint on the components u and v of the optical flow. Thus, the component of the image velocity in the direction of the image intensity gradient at the image of a scene point is -(∂I/∂t)/||∇I||. We cannot, however, determine the component of the optical flow at right angles to this direction. This ambiguity is known as the aperture problem.
2 Some of the best known algorithms are those of Lucas and Kanade, Uras, Fleet and Jepson, and Singh.
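A worked miniature of the window-based least-squares solution that such methods build on is given below (a Lucas-Kanade-style estimate written with NumPy; the finite-difference gradients and the window size are assumptions made for illustration, not the exact procedures used in the cited works).

```python
import numpy as np

def lucas_kanade_flow(prev, curr, y, x, half_win=7):
    """Estimate (u, v) at pixel (y, x) from two consecutive grayscale frames.

    The optical flow constraint Ix*u + Iy*v + It = 0 gives one equation per
    pixel; accumulating it over a small window and solving it in the
    least-squares sense resolves the aperture problem whenever the window
    contains enough gradient structure.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Iy, Ix = np.gradient(prev)             # spatial derivatives
    It = curr - prev                       # temporal derivative

    sl = (slice(y - half_win, y + half_win + 1),
          slice(x - half_win, x + half_win + 1))
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()

    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    u, v = np.linalg.solve(A, b)           # raises if the window is textureless
    return u, v
```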

cannot be directly computed (this can also be seen as a consequence of the aperture problem¹). There exist different methods that try to solve ( I-1 ) iteratively.² A complete list of different optical flow analysis methods can be found in (Wiskott, 2001).

Principal component analysis - Eigen-decomposition
Optical flow methods are extensively used in shape recognition, but they do not perform well in the presence of noise. If we want to identify a more general class of objects, it is convenient to take into account the probabilistic nature of the object's appearance and, thus, to work with the class distribution in a parametric and compact way. The Karhunen-Loève Transform meets the requirements needed to do so. Its basis functions are the eigenvectors of the covariance matrix of the class being modeled:

( I-2 )   $\Lambda = \Phi^{T}\Sigma\Phi$,

Σ being the covariance matrix, Λ the diagonal matrix of eigenvalues and Φ the matrix of eigenvectors. The vector basis obtained is optimal because it is compact (we can easily isolate vectors of low energy) and parametric (each eigenvector is orthogonal to the others, creating a parametric eigenspace). Elements of one class, that is, vectors whose dimension is M, can be represented by a linear combination of the M eigenvectors obtained for this class. The Principal Component Analysis (PCA) technique states that the same object can be reconstructed by combining only the N<M eigenvectors of greatest energy, also called principal components. It also states that we approximate with the minimum error if the linear coefficients for the combination are obtained by projecting the class vectors onto the sub-space of principal components.

1. Equation ( I-1 ) is called the optical flow constraint equation since it expresses a constraint on the components u and v of the optical flow. Thus, only the component of the image velocity in the direction of the image intensity gradient at the image of a scene point is determined; we cannot, however, determine the component of the optical flow at right angles to this direction. This ambiguity is known as the aperture problem.
2. Some of the best-known algorithms are those of Lucas and Kanade, Uras, Fleet and Jepson, and Singh.
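The following sketch illustrates how the gradient-based family solves the under-determined constraint ( I-1 ) by least squares over small neighbourhoods, in the spirit of the Lucas and Kanade algorithm mentioned in footnote 2. The window size, the finite-difference derivatives and the conditioning test are illustrative choices, not those of any cited implementation.

```python
# Dense Lucas-Kanade-style optical flow sketch: (I-1) per pixel, solved per window.
import numpy as np

def lucas_kanade(I0, I1, win=5):
    """Estimate (u, v) per pixel from two consecutive grey-level frames."""
    Ix = np.gradient(I0, axis=1)          # spatial derivatives
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0                          # temporal derivative
    h = win // 2
    u = np.zeros_like(I0, dtype=float)
    v = np.zeros_like(I0, dtype=float)
    for r in range(h, I0.shape[0] - h):
        for c in range(h, I0.shape[1] - h):
            ix = Ix[r-h:r+h+1, c-h:c+h+1].ravel()
            iy = Iy[r-h:r+h+1, c-h:c+h+1].ravel()
            it = It[r-h:r+h+1, c-h:c+h+1].ravel()
            A = np.stack([ix, iy], axis=1)
            ATA = A.T @ A
            if np.linalg.cond(ATA) < 1e6:   # skip areas dominated by the aperture problem
                u[r, c], v[r, c] = np.linalg.solve(ATA, -A.T @ it)
    return u, v
```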

This theory is only applicable to objects that can be represented by vectors, such as images. It is therefore easily extensible to image processing and is generally used to model the variability of 2D objects on images, like, for example, faces. Very often PCA is utilized to analyze and identify features of the face. However, it introduces some restrictions. One of them is the need for a stage previous to the analysis during which the basis of principal component vectors, in this case images, must be generated. It also forces all images being analyzed to be of the same size. Using PCA in face analysis has led to the appearance of concepts like Eigenfaces (Turk & Pentland, 1991), utilized for face recognition, or Eigenfeatures (Pentland, Moghaddam & Starner, 1994), used to study more concrete areas of faces robustly. The book "Face image analysis by unsupervised learning" (Bartlett, 2001) gives a complete study of the strengths and weaknesses of the methods based on Independent Component Analysis (ICA³) in contrast with PCA. It also includes a full explanation of concepts like Eigenactions, and describes the most recent approaches in face image analysis.

3. Using ICA means applying the factoring of probability distributions, and blind source separation, to image analysis. This technique is related to other fields: entropy and information maximization, maximum likelihood density estimation (MLE), EM (expectation maximization, which is MLE with hidden variables) and projection pursuit. It is basically a way of finding special linear (non-orthogonal) co-ordinate systems in multivariate data, using higher-order statistics in various ways; see (ICA, 2003).
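As a minimal illustration of ( I-2 ) and of the PCA reconstruction idea, the sketch below builds an eigen-basis from a set of flattened face images and rebuilds a sample from its N principal components. The data shapes and the random training set are placeholders; practical eigenface implementations usually diagonalise the much smaller Gram matrix when there are far fewer images than pixels.

```python
# Eigen-decomposition / PCA reconstruction sketch (placeholder data).
import numpy as np

X = np.random.rand(100, 32 * 32)          # 100 "face images", each 32x32, flattened
mean = X.mean(axis=0)
Xc = X - mean

Sigma = np.cov(Xc, rowvar=False)          # covariance matrix of the class (I-2)
eigval, Phi = np.linalg.eigh(Sigma)       # Lambda (eigenvalues) and Phi (eigenvectors)
order = np.argsort(eigval)[::-1]          # sort by decreasing energy
Phi = Phi[:, order]

N = 20                                    # keep only the N principal components
B = Phi[:, :N]

def reconstruct(face):
    """Project a face onto the principal sub-space and rebuild the approximation."""
    coeff = B.T @ (face - mean)           # linear combination coefficients
    return mean + B @ coeff

approx = reconstruct(X[0])
error = np.linalg.norm(approx - X[0])     # reconstruction error with N components
```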

Active contour models - snakes
Active contour models, generally called snakes, are geometric curves that approximate the contours of an image by minimizing an energy function. Snakes have been widely used to track moving contours within video sequences because they have the property of deforming themselves to stick onto a contour that evolves with time. In general, the energy function can be decomposed into two terms, an internal energy and an external energy:

( I-3 )   $E_{total} = E_{int} + E_{ext}$.

The role of the external energy is to attract the points of the snake towards the image contours. The internal energy tries to ensure a certain regularity of the snake while $E_{ext}$ acts, from a spatial as well as from a temporal perspective.
Once the energy function is defined, we use an iterative process to find its minimum. We can understand the minimum-energy point as the equilibrium position of a dynamic system subjected to the forces derived from the energy function. We find the minimum energy by solving a second-order dynamic equation whose form is similar to:

( I-4 )   $M\ddot{U} + C\dot{U} + KU = F(t)$.

This is why snakes are often represented as a group of weights (the sampling points of the contour) connected by springs (applying the internal forces among the points). U is the vector representing the coordinates of the contour points, and M, C and K are the mass, the elasticity and the stiffness of the dynamic system. F(t) is the force function derived from the energy constraints. At equilibrium, the system remains immobilized and follows the shape of the contour. The most difficult aspect of deploying snakes is their initialization: we need to place the contour close to the border that has to be tracked; otherwise, we may place it close to another contour that also minimizes the energy function (a local minimum).
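The following sketch shows one common discretization of the snake dynamics of ( I-3 )/( I-4 ): the internal energy yields a banded matrix built from an elasticity and a rigidity coefficient, and the contour is updated semi-implicitly under an external force derived from an edge map. The synthetic image, the coefficients and the circular initialization are assumptions for illustration only.

```python
# Minimal active-contour (snake) iteration sketch (assumed coefficients and test image).
import numpy as np

def snake(edge_map, x, y, alpha=0.1, beta=0.1, gamma=1.0, iters=200):
    n = len(x)
    # Banded internal-energy matrix for a closed (periodic) contour.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = -alpha - 4 * beta
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
    inv = np.linalg.inv(A + gamma * np.eye(n))
    # External force: gradient of the edge map pulls points towards strong contours.
    fy, fx = np.gradient(edge_map.astype(float))
    for _ in range(iters):
        xi = np.clip(x.round().astype(int), 0, edge_map.shape[1] - 1)
        yi = np.clip(y.round().astype(int), 0, edge_map.shape[0] - 1)
        x = inv @ (gamma * x + fx[yi, xi])
        y = inv @ (gamma * y + fy[yi, xi])
    return x, y

# Synthetic test: edge magnitude of a disk; the snake starts on a nearby circle,
# because initialization must be close to the target contour (see above).
yy, xx = np.mgrid[0:128, 0:128]
disk = (((xx - 64) ** 2 + (yy - 64) ** 2) < 20 ** 2).astype(float)
edge = np.hypot(*np.gradient(disk))
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
cx, cy = snake(edge, 64 + 30 * np.cos(t), 64 + 30 * np.sin(t))
```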

Mathematical morphology - edge detection & segmentation
When analyzing images of faces under unconstrained conditions, classical image filtering techniques may not be robust enough to extract all the information. Mathematical morphology appeared as an alternative mathematical tool to process images from a visual perspective instead of from a numerical perspective. The techniques of mathematical morphology are based on set-theoretic concepts and the non-linear superposition of signals and images. Morphological operations have been applied successfully to a wide range of problems, including image processing and analysis tasks, noise suppression, feature extraction and pattern recognition. In (Serra, 1982, 1988), the author explains in depth how to take advantage of these techniques for the processing of images. This set of tools gives the means to develop algorithms that efficiently detect edges and specific areas of the face. One of the most used morphological algorithms, the watershed transformation, is described in Appendix I-C.

Deformable models (templates)
A deformable model is a group of parametric curves with which we try to approximate the contours of an image and the behavior of its objects. The advantages of a deformable template are its computational simplicity and the small number of parameters needed to describe different shapes. Unfortunately, since a template is generally made specifically for a given shape, we need to redefine the rules of parameter variation so that the model follows and behaves like the right contour. Another drawback is its difficult adaptation to unexpected shapes, which may become a disadvantage when dealing with noisy images. The diversity of solutions is well reflected in the literature, where we can find as many different models as articles treating the subject. Some of the most common models are:
Elliptical: circles and ellipsoids can model the eyes (Holbert & Dugelay, 1995).
Quadratic: parabolic curves are often used to model the lips (Leroy & Herlin, 1995).
Splines: splines are an option to develop more complex models. They have already been used to characterize mouth expressions (Moses, Reynard & Blake, 1995).
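As an illustration of the quadratic (parabolic) templates cited above, the sketch below fits two parabolas sharing the mouth corners to a handful of hypothetical lip-edge points; the parameterization, the point set and the optimizer settings are invented for the example and do not reproduce the cited models.

```python
# Parabolic lip-template fitting sketch (hypothetical data and parameterization).
import numpy as np
from scipy.optimize import least_squares

def residuals(params, up_pts, lo_pts):
    """Vertical distances between the template and the extracted lip-edge points."""
    xc, yc, w, hu, hl = params           # mouth centre, half-width, upper/lower heights
    tu = np.clip((up_pts[:, 0] - xc) / w, -1, 1)   # corners at t = +/-1
    tl = np.clip((lo_pts[:, 0] - xc) / w, -1, 1)
    ru = up_pts[:, 1] - (yc - hu * (1 - tu ** 2))  # upper parabola
    rl = lo_pts[:, 1] - (yc + hl * (1 - tl ** 2))  # lower parabola
    return np.concatenate([ru, rl])

# Hypothetical lip-edge points extracted by a previous image-processing step.
up_pts = np.array([[40., 60.], [50., 55.], [60., 54.], [70., 56.], [80., 61.]])
lo_pts = np.array([[40., 62.], [50., 68.], [60., 70.], [70., 67.], [80., 62.]])

fit = least_squares(residuals, x0=[60., 60., 22., 6., 8.], args=(up_pts, lo_pts))
mouth_opening = fit.x[3] + fit.x[4]      # a template magnitude usable as an animation cue
```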

I.2.3 Post-processing techniques and their related mathematical tools

To recreate motion on synthesized 3D models, it is necessary to relate the analyzed information to facial Action Units (AU)⁴ or Facial Animation Parameters (FAP)⁵. If motion is not derived heuristically from the image-processing results themselves (perhaps sometimes helped by the iterative feedback synthesis of the motion actions on the model, as explained by Eisert and Girod, 1998), we must find some way to tie analysis to synthesis. There are two major approaches to do so: modeling the motion with a direct relationship between the analyzed results and the physical deformation that the parameters exert on the model or technique utilized for the analysis; or relating the analysis results to the motion parameters blindly, not knowing how the parameters influence the analysis but building the relationship on previously seen results, most of the time through the use of mathematical estimators. This latter approach needs a training pre-processing stage to tune the estimator.

Motion modeling of facial features
To extract motion information from specific features of the face (eyes, eyebrows, lips, etc.), we must know the animation semantics of the FA system that will synthesize the motion. Deformable models, snakes, etc. deliver information about the features in the form of the magnitudes of the parameters that control the analysis. It is then necessary to relate these parameters to the actions that we must apply to the 3D model to recreate motion and expressions. If there exist many different image-processing techniques to analyze face features, there are, at least, as many corresponding feature motion models. These motion models translate the results into face animation parameters. Malciu and Prêteux (2001) track face features using deformable prototypes compatible with the Facial Definition Parameters (FDP) defined in the MPEG-4 standard. This allows them to deduce the FAPs related to the eyes and mouth; they code them into an MPEG-4 stream, and finally animate a face clone with them. Chou, Chang and Chen (2001) present an analysis technique that searches for the points belonging to the projection of a simple 3D model of the lips, also containing the FDPs.

4. AUs are the minimal measurements of action conceived within the Facial Action Coding System (a concept explained in the next subsection) to describe facial motion.
5. FAPs are the minimal actions conceived within the MPEG-4 standard (explained in detail in Chapter VI) to describe facial motion.

From their projected locations they derive the FAPs that operate on the FDPs to generate the studied motion. Since one FAP may act on more than one point belonging to the lip model, they use a least-squares solution to find the magnitudes of the FAPs involved. Goto, Kshirsagar and Magnenat-Thalmann (1999) use a simpler approach where the image processing is reduced to the search for edges, and where the mapping of the obtained data is done in terms of motion interpretation: open mouth, closed mouth, half-opened mouth, etc. The magnitude of the motion is related to the location of the edges. They extend this technique to the eyes, developing their own eye motion model. Similarly, eyebrows are tracked on the image and associated to model actions.

Estimators
Once facial expressions are visually modeled by some image-processing technique, we obtain a set of parameters. The mapping of these parameters onto the corresponding face animation parameters can be seen as finding the estimator that relates face motion parameters to analysis parameters. Establishing the mapping relationship requires a training process. Among others, there exist the following estimators: linear, neural networks (NN), Radial Basis Function (RBF) networks, etc. We describe the first two in Appendix I-D. Valente, Andrés del Valle and Dugelay (2001) compare the use of a linear estimator against an RBF network estimator. Artificial neural networks perfectly complement image-processing techniques that need to understand images, in analysis scenarios where some previous training is permitted. This is why they have been used in face recognition for many years, and in recent times their use has been extended to the analysis of face motion and expressions. In (Tian, Kanade & Cohn, 2001), we find one fine example of the help neural networks can provide. In their article, the authors explain how they have developed the Automatic Face Analysis system to analyze facial expressions. Their system takes as input the detailed parametric description of the face features that they analyze. They use neural networks to convert these data into AUs following the motion semantics of the FACS⁶.

6. The Facial Action Coding System (Ekman and Friesen, 1978), FACS, objectively describes and measures facial expressions and movements. Based on an anatomical analysis of facial actions, it offers a comprehensive method for describing all facial movements, those related to emotion and those that are not, in terms of Action Units. EMFACS focuses only on movements known to be related to emotion. A new version of the Facial Action Coding System by Paul Ekman, Wallace V. Friesen and Joseph C. Hager has been completed; its changes are significant for the future use of FACS and enable much more efficient training of coders.
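A minimal sketch of the estimator idea introduced above follows: a linear mapping from analysis parameters to animation parameters is learned by least squares from training pairs and then applied frame by frame. The dimensions and the synthetic training data are assumptions; the cited systems use their own parameterizations and, in some cases, RBF networks or neural networks instead of a linear map.

```python
# Linear estimator sketch: analysis parameters -> animation parameters (assumed data).
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(200, 8))            # 200 training frames, 8 analysis parameters each
F = rng.normal(size=(200, 6))            # corresponding 6 animation parameters per frame

# Least-squares estimator W such that F ~ P @ W (a bias column could be appended to P).
W, *_ = np.linalg.lstsq(P, F, rcond=None)

def estimate_animation_params(analysis_params):
    """Map the parameters measured on the current frame to animation parameters."""
    return analysis_params @ W

faps = estimate_animation_params(rng.normal(size=8))
```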

A similar approach, aiming at analyzing spontaneous facial behavior, is described by Bartlett et al. (2001). Their system also uses neural networks to describe face expressions in terms of AUs. These two approaches differ in the image-processing techniques and the parameters they use to describe the image characteristics introduced as input to the neural network.

Fuzzy systems
Fuzzy systems are an alternative to the traditional notions of set membership and logic. The notion central to fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 representing absolute truth. This contrasts with the binary set {0 (false), 1 (true)} used by classical logic. Fuzzy systems gather mathematical tools to represent natural language, where the concepts of true or false are too extreme and intermediate, or vaguer, interpretations are needed. Appendix I-E shows the mathematical basis of fuzzy logic. (Information partially taken from "Fuzzy Systems: A Tutorial" by Brulé, 1985.)
The first applications that benefited from the use of fuzzy system theory were information retrieval systems, navigation systems for automatic cars, feature-definition controllers for robot vision, etc. In many of them, fuzzy logic appears as a complement to the image processing involved and helps in the decision-making process needed to evaluate the results obtained from the analyzed images. Huntsberger, Rose and Ramaka (1998) have developed a face processing system called Fuzzy-Face that combines wavelet pre-processing of the input with a fuzzy self-organizing feature map algorithm. The wavelet-derived face space is partitioned into fuzzy sets, which are characterized by face exemplars and membership values to those exemplars. The most interesting properties of this system for face motion analysis are that it improves the training stage, because it uses relatively few training epochs, and that it generalizes to face images acquired under different lighting conditions. Fellenz et al. (2000) propose a framework for the processing of face image sequences and speech, using different dynamic techniques to extract appropriate features for emotion recognition. The features are used by a hybrid classification procedure, employing neural network techniques and fuzzy logic, to accumulate the evidence for the presence of an emotional expression in the face and the speaker's voice.
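The sketch below illustrates the fuzzy-set idea on a single face measurement: a normalized mouth aperture receives graded membership values for overlapping linguistic labels instead of a hard open/closed decision. The label names and the set breakpoints are illustrative assumptions.

```python
# Fuzzy membership sketch for a mouth-aperture measurement (assumed breakpoints).
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy set rising on [a, b] and falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def mouth_memberships(aperture):
    """Aperture normalized to [0, 1]; returns a degree of truth for each label."""
    return {
        "closed":    trapezoid(aperture, -0.1, 0.0, 0.1, 0.3),
        "half_open": trapezoid(aperture, 0.1, 0.3, 0.5, 0.7),
        "open":      trapezoid(aperture, 0.5, 0.7, 1.0, 1.1),
    }

print(mouth_memberships(0.6))   # partly "half_open" (0.5), partly "open" (0.5)
```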

Hidden Markov models
Hidden Markov models (HMM) are a powerful modern statistical technique that has been applied to many subject areas. A Markov process not only involves probabilities but also depends on the memory of the system being modeled. An HMM consists of several states. In the formulation of an HMM, each state is referred to individually, and thus practical and feasible examples of these models have a small number of states. In an HMM, a system has a number of states S₁ … Sₙ. The probability that the system goes from state i to state j is called P(i, j). The states of the system are not known, but the system does have one observable parameter on output, which has m possible values, from 1 to m. For the system in state i, the probability that output value v will be produced is called O(i, v). We must point out that the transition probabilities are required to depend on the states, not on the outputs. Appendix I-F presents techniques for modeling HMMs.

Figure I-2. Top: a typical illustration of a two-state HMM. Circles represent states with associated observation probabilities, and arrows represent non-zero transition arcs with associated probabilities (for example, P(1, 2) between S₁ and S₂). Bottom: an illustration of a five-state HMM (S₁ … S₅); the arcs under the state circles model the possibility that some states may be skipped.

We refer the reader to the tutorial on HMMs by Rabiner (1989), where the theoretical bases are further discussed and examples of the most common applications can be found.

A model for motion
We can model behavior patterns as statistical densities over a configuration space by collecting data from real human motion. Different configurations have different observation probabilities. One very simple behavior model is the mixture model, in which the distribution is modeled as a collection of Gaussians. In this case the composite density is described by

( I-5 )   $\Pr(O) = \sum_{k=1}^{N} P_k \Pr(O \mid \lambda = k)$,

where P_k is the observed prior probability of sub-model k. The mixture model represents a clustering of the data into regions within the observation space. Since human motion evolves over time in a complex way, it is advantageous to explicitly model temporal dependence and internal states. An HMM is one way to do this, and it has been shown to perform quite well at recognizing human motion. In (Metaxas, 1999), the author presents a framework to estimate human motion (including facial movements) where the traditional use of HMMs is modified to ensure reliable recognition of gestures. More specifically, Pardàs and Bonafonte (2002) use an HMM to deduce the expressions of faces in video sequences. They introduce the concept of high-level/low-level analysis. In their approach, the high-level analysis structure takes as input the FAPs produced by the low-level analysis tools and, by means of an HMM classifier, detects the facial expression on the frame.
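To make the HMM notation concrete, the following sketch scores an observation sequence with the forward algorithm for a two-state, discrete-output model written with the P(i, j) and O(i, v) notation used above; the toy probabilities are assumptions. An expression classifier in the spirit of the approaches just discussed would keep one such model per expression and select the best-scoring one.

```python
# Discrete HMM forward-algorithm sketch with assumed toy probabilities.
import numpy as np

P = np.array([[0.7, 0.3],        # transition probabilities P(i, j)
              [0.4, 0.6]])
O = np.array([[0.9, 0.1],        # observation probabilities O(i, v)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

def likelihood(obs):
    """Probability of an observation sequence given the model (forward algorithm)."""
    alpha = pi * O[:, obs[0]]
    for v in obs[1:]:
        alpha = (alpha @ P) * O[:, v]
    return alpha.sum()

score = likelihood([0, 0, 1, 1, 1])   # one score per candidate expression model
```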

I.3 Face Motion and Expression Analysis Techniques: State of the Art

Systems analyzing faces from monocular images are designed to give motion information with the level of detail most suitable for their final application. In this sense, some of the most significant differences among the techniques found in the literature come from the animation semantics they utilize to describe face actions. Some systems aim at providing very high-level face motion and expression data in the form of emotion semantics, for instance, detecting joy, fear or happiness on faces. Others provide generic motion data determining what the action of the facial feature is, for example, detecting open/closed eyes. And others can even estimate, more or less accurately, the 3D motion of the overall face, giving out very low-level face animation parameters. In an analysis-synthesis scheme for generating face animation, both the analysis and synthesis parts must share the same level of semantics. The higher the level of detail of the motion information given by the analysis, the fewer standard motion interpretations the FA system will have to make. To replicate the exact motion of the person being analyzed, it is necessary to generate very detailed action information. Otherwise, if we only generate rough data about the face actions, we will only obtain customized face motion if the person's expression behavior has previously been studied and the FA system already has the specific details of the individual.
It is quite difficult to classify face motion and expression analysis methods because of the common processing characteristics that many of them share. Despite this fact, we have tried to group them based on the precision of the motion information generated and on the importance of the role that synthesis plays during the analysis.

I.3.1 Methods that retrieve emotion information
Humans detect and interpret faces and facial expressions in a scene with little or no effort. The systems we present in this section accomplish this task automatically. The main concern of these techniques is to classify the observed facial expressions in terms of generic facial actions or in terms of emotion categories, and not to attempt to understand the face animation that could be involved to synthetically reproduce them.
Y. Yacoob has explored the use of local parameterized models of image motion for recognizing the non-rigid and articulated motion of human faces. These models provide a description of the motion in terms of a small number of parameters that are intuitively related to the motion of some facial features under the influence of

expressions. The expression description is obtained after analyzing the spatial distribution of the motion direction fields obtained from the optical flow analysis, which are computed at points of high gradient values of the facial image. This technique gives fairly good results, although the use of optical flow requires very stable lighting conditions and very smooth head motion during the analysis. Computationally, it is also quite demanding. From the early research (Yacoob & Davis, 1994) to the last published results about the performance of the system (Black & Yacoob, 1997), improvements in the tuning of the processing have been added to make it more robust to head rotations.
C.-L. Huang and Y.-M. Huang (1997) introduce a system developed in two parts: facial feature extraction (for the training-learning of expressions) and facial expression recognition. The system applies a point distribution model and a gray-level model to find the facial features. Then, the position variations are described by 10 action parameters (AP). During the training phase, given 90 different expressions, the system classifies the principal components of the APs into 6 different clusters. In the recognition phase, given a facial image sequence, it identifies the facial expression by extracting the 10 APs, analyzes the principal components, and finally calculates the AP profile correlation for a higher recognition rate. To perform the image analysis, deformable models of the face features are fitted onto the image. The system is only trained for faces in a frontal view; apparently it is more robust to illumination conditions than the previous approach, but no details about the image-processing techniques are given, which makes this point difficult to evaluate.
Pantic and Rothkrantz (2000) describe another approach, which is the core of the Integrated System for Facial Expression Recognition (ISFER). The system finds the contours of the features with several methods suited to each feature: snakes, binarization with thresholds, deformable models, etc., making it more efficient under uncontrolled conditions: irregular lighting, glasses, facial hair, etc. It is worth mentioning the NN architecture of the fuzzy classifier, which is designed to analyze the complex mouth movements. In this article, the authors do not present a robust solution to non-frontal view positions.
To some extent, all the systems discussed have based their description of face actions on the Facial Action Coding System proposed by Ekman and Friesen (1978). The importance granted to FACS is such that two research teams, one at the University of California San Diego (UCSD) and the Salk Institute, and another at the University of Pittsburgh and Carnegie Mellon University (CMU), were challenged to develop prototype systems for the automatic recognition of spontaneous facial expressions.

The system developed by the UCSD team, described in (Bartlett et al., 2001), analyzes face features after having determined the pose of the individual facing the camera, although tests of their expression analysis system are only performed on frontal-view faces. Features are studied using Gabor filters and subsequently classified using a previously trained HMM. HMMs are applied in two ways: taking Gabor representations as input, and taking support vector machine (SVM) outputs as input. SVMs are used as classifiers. They are a way to achieve good generalization rates when compared to other classifiers, because they focus on maximally informative exemplars, i.e., the support vectors. To match face features, they first convolve them with a set of kernels (from the Gabor analysis) to make a jet. Then, that jet is compared with a collection of jets taken from training images, and the similarity value for the closest one is taken. In their study, Bartlett et al. claim an AU detection accuracy from 80% for eyebrow motion to around 98% for eye blinks. They do not give any results on mouth analysis.
CMU has opted for another approach, where face features are modeled as multi-state facial components of analysis. They use neural networks to derive the AUs associated with the motion observed. They have developed facial models for the lips, eyes, brows, cheeks and furrows. In their article, Tian, Kanade and Cohn (2001) describe this technique, giving details about the models and the double use of NNs, one for the upper part of the face and a different one for the lower part (see Figure I-3). They do not discuss the image processing involved in the derivation of the feature models from the images. Tests are performed on a database of faces recorded under controlled light conditions. Their system allows the analysis of faces that are not completely in a frontal position; however, most tests are only performed on frontal-view faces. The average recognition rates achieved are around 95.4% for upper-face AUs and 95.6% for lower-face AUs.
Piat and Tsapatsoulis (2000) take on the challenge of deducing face expressions from images from another perspective, no longer based on FACS. Their technique first finds the action parameters (MPEG-4 FAPs) related to the expression being analyzed and then formulates this expression with high-level semantics. To do so, they have related the intensity of the most used expressions with their associated FAPs. Other approaches (Chen & Huang, 2000) complement the image analysis with the study of the human voice to extract more emotional information. These studies are oriented towards developing the means to create a human-computer interface (HCI) in a completely bimodal way.
The reader can find in (Pantic & Rothkrantz, 2000) overviews and comparative studies of many techniques, including the ones we have discussed. These techniques are

analyzed from the HCI perspective, which contrasts with our considerations about face animation.

Figure I-3. Face features (eyes, mouth, brows, ...) are extracted from the input image; then, after analyzing them, the parameters of their deformable models are introduced into the NN, which finally generates the AUs corresponding to the face expression. Image courtesy of The Robotics Institute at Carnegie Mellon University.

I.3.2 Methods that obtain parameters related to the Facial Animation synthesis used
Some face animation systems need, as input, action parameters that specify how much to open the mouth, the position of the eyelids, the orientation of the eyes, etc., in terms of parameter magnitudes associated to physical displacements. The analysis methods studied in this section try to measure displacements and feature magnitudes on the image to derive the actions to be performed on the head model. These methods do not evaluate the expression of the person's face, but extract from the images those measurements that will permit its synthesis on a model.
Terzopoulos and Waters (1993) developed one of the first solutions of this nature. Their method tracks linear facial features to estimate the corresponding parameters of a three-dimensional wireframe face model, allowing them to reproduce facial expressions. A significant limitation of this system is that it requires facial features to be highlighted with make-up for successful tracking. Although active contour models are used, the

system is still passive; the tracked contour features passively shape the facial structure without any active control based on observations.
Based on an animation system similar to that of Waters, that is, built on anatomically based muscle actions that animate a 3D face wireframe, Essa and Pentland define a suitable set of control parameters using vision-based observations. They call their solution FACS+ because it is an extension of the traditional FACS system. They use optical flow analysis over time on sequences of frontal-view faces to obtain the 2D velocity vectors, which are then mapped to the parameters. They point out in (Essa, Basu, Darrell & Pentland, 1996) that driving the physical system with the input from noisy motion estimates can result in divergence or a chaotic physical response. This is why they use a continuous-time Kalman filter (CTKF) to better estimate uncorrupted state vectors. In their work they develop the concept of motion templates, which are the corrected, or noise-free, 2D motion fields associated with each facial expression. These templates are used to improve the optical flow analysis.
Morishima has been developing a system that succeeds in animating a generic parametric muscle model after it has been customized to take the shape and texture of the person that the model represents. Motion data are generated by means of optical flow image analysis complemented with speech processing. These data are translated into motion parameters after passing through a previously trained neural network. In (Morishima, 2001) he explains the basics of this system as well as how to generate very realistic animation from electrical sensors placed on the face. The data obtained from this hardware-based study permit a perfect training for coupling the audio processing.
To control the optical flow data generated from the analysis of consecutive frames, Tang and Huang (1994) project the head-model wireframe vertices onto the image and search for the 2D motion vectors only around these vertices. The model they animate is very simple and the 2D motion vectors are directly translated into 2D vertex motion (no 3D actions are generated). Almost the same procedure is used by Sarris and Strintzis (2001) in their system for video phoning for the hearing impaired. The rigid head motion (pose) is obtained by fitting the projection of a 3D wireframe onto the image being analyzed. Then, non-rigid face movements (expressions) are estimated thanks to a feature-based approach adapted from the Kanade, Lucas and Tomasi algorithm. The KLT algorithm is based on minimizing the sum of squared intensity differences between a past and a current feature window, which is performed using a Newton-Raphson minimization method. The features to track are some of the projected points of the wireframe, the MPEG-4 FDPs. To derive MPEG-4 FAPs from this system, they add to the KLT algorithm

the information about the degrees of freedom of motion (one or several directions) that the combination of the possible FAPs allows on the studied feature FDPs.
Ahlberg (2002) also presents in his work a wireframe fitting technique to obtain the rigid head motion. He uses the new parameterized variant of the face model CANDIDE, named CANDIDE-3, which is MPEG-4 compliant. The image analysis techniques include PCA on eigentextures, which allows the analysis of more specific features that control the model deformation parameters. They derive 6 different FAPs for their wireframe model.
More detailed feature-point tracking is developed in the work of Chou et al. (2001). The authors track the projected points belonging to the mouth, eye and nostril models provided. These models are also based on the physical vertex distribution of MPEG-4 FDPs. They are able to obtain the combination of FAPs that regenerates the expression and motion of the analyzed face. Their complete system also deals with audio input, analyzing it and complementing the animation data for the lips. The main goal of their approach is to achieve real-time analysis so as to employ these techniques in teleconferencing applications. They do not directly obtain the pose parameters to also synthetically reproduce the pose of the head, but they experiment on how to extend their analysis to head poses other than the frontal-view face, by roughly estimating the head pose from the image analysis and rectifying the original input image.
The MIRALab research team, at the University of Geneva, has developed a complete system to animate avatars in a realistic way, so that they can be used for telecommunications. In (Goto et al., 2001), the authors review the entire process to generate customized realistic animation. The goal of their system is to clone face behavior. The first step of the overall process is to physically adapt a generic head mesh model (already susceptible of being animated) to the shape of the person to be represented. In essence, they follow the same procedure as Morishima presents in his work, but T. Goto et al. do it by using just a frontal- and a side-view picture of the individual, whereas Morishima also includes other views to recover texture for the self-occlusions. Models are animated using MPEG-4 FAPs, to allow for compatibility with other telecom systems. Animation parameters are extracted from the video input of the frontal-view face of the speaker and then synthesized, either on the cloned head model or on a different one. Speech processing is also utilized to generate more accurate mouth shapes. An interesting post-processing step is added: analysis results are double-checked before being synthesized and, if they are not coherent, they are rejected and the system searches a probability database for the most probable motion solution to the incoherence. In (Goto, Escher, Zanardi & Magnenat-Thalmann, 1999), the authors give a more detailed explanation of the image processing involved. Feature motion models for the eyes, eyebrows and mouth allow them to extract image parameters in the

form of 2D point displacements. These displacements represent the change of the feature from the neutral position to the instant of the analysis and are easily converted into FAPs. Although the system offers possibilities to achieve face cloning, the current level of animation analysis only permits instant motion replication with little precision. In general, we may consider that face cloning is not guaranteed, but realistic animation is.

I.3.3 Methods that use explicit face synthesis during the image analysis
Some face motion analysis techniques use the synthesized image of the head model to control or to refine the analysis procedure. In general, the systems that use synthesized feedback in their analysis need a very realistic head model of the speaker, a high control of the synthesis and knowledge of the conditions under which the face is being recorded.
Li, Roivainen and Forchheimer (1993) presented one of the first works to use resynthesized feedback. Using the 3D model Candide, their approach is characterized by a feedback loop connecting computer vision and computer graphics. They prove that embedding synthesis techniques into the analysis phase greatly improves the performance of motion estimation. A slightly different solution is given by Ezzat and Poggio (1996a, 1996b). In their articles, they describe image-based modeling techniques that make possible the creation of photo-realistic computer models of real human faces. The model they use is built from example views of the face, bypassing the need for any 3D computer graphics. To generate the motion for this model, they use an analysis-by-synthesis algorithm, which is capable of extracting a set of high-level parameters from an image sequence involving facial movements using embedded image-based models. The parameters of the model are perturbed in a local and independent manner for each image until a correspondence-based error metric is minimized. Their system is restricted to understanding a limited number of expressions.
More recent research works are able to achieve much more realistic results with three-dimensional models. Eisert and Girod (1998), for instance, present a system that estimates 3D motion from image sequences showing head-and-shoulder scenes typical of video telephony and teleconferencing applications. They use a very realistic 3D head model of the person in the video. The model constrains the motion and deformation of the face to a set of FAPs defined by the MPEG-4 standard. Using the model, they obtain a description of both global (head pose) and local 3D head motion as a function of unknown facial parameters. Combining the 3D information with the optical flow constraint leads to a linear algorithm that estimates the facial animation parameters. Each synthesized image reproducing the face motion of frame t is utilized to analyze the image

of frame t+1. Since natural and synthetic frames are compared at the image level, the lighting conditions of the video scene need to be controlled. This implies, for example, regular, well-distributed light.
Pighin, Szeliski and Salesin (1999) exploit this approach to the maximum by customizing animation and analysis on a person-by-person basis. They use new techniques to automatically recover the face position and the facial expression from each frame of a video sequence. For the construction of the model, several views of the person are used. For the animation, realism is ensured by studying how to linearly combine 3D face models, each corresponding to a particular facial expression of the individual. Their mesh morphing approach is detailed in (Pighin, Hecker, Lischinski, Szeliski & Salesin, 1998). Their face motion and expression analysis system fits the 3D model on each frame using a continuous optimization technique. During the fitting process, the parameters are tuned to achieve the most accurate model shape. Video images and synthesis are compared to find the degree of similarity of the animated model. They have developed an optimization method whose goal is to compute the model parameters yielding a rendering of the model that best resembles the target image. Although it is a very slow procedure, the animated results are impressive because they are highly realistic and very close to what we would expect from face cloning (see Figure I-4).

Figure I-4. Tracking example of Pighin's system. The bottom row shows the results of fitting their model to the target images on the top row. Image courtesy of the Computer Science Department at the University of Washington.

Table I-1. Comparative study of some analysis techniques reviewed.
Column legend, in the order used for each entry: (1) Training? (2) Controlled lighting? (3) Does it allow rotations (pose understanding)? (4) Markers? (5) Potential real time? (6) Does it use a 3D face model? (7) Possible synthesis in other head poses? (8) Realistic reproduction? (9) Time-line (video) analysis?

Methods that obtain emotion information
- Optical flow / parametric models of image motion [BY97], J. Black & Y. Yacoob: N, Y, Y, N, N, N, N.A., N.A., Y
- Deformable models / PCA [HH97], C.-L. Huang & Y.-M. Huang: Y, Y, N, N, N, N, N.A., N.A., Y
- Feature modeling / neural networks [TKC01], Y. Tian et al.: Y, Y, N, N, N, N, N.A., N.A., N
- NN / fuzzy logic / deformable models [PR00], M. Pantic & L. J. M. Rothkrantz: Y, N, N, N, N, N, N.A., N.A., N
- HMM / optical flow / Gabor filters / PCA / ICA [BBL01], M. S. Bartlett et al.: Y, Y, Y, Y, N, N, Y, Y, Y

Methods that obtain parameters related to the Face Animation synthesis used (I)
- Snakes [TW93], D. Terzopoulos & K. Waters: N, N, N, Y, N, N, Y, Y, Y
- Optical flow / motion templates [EBD96], I. Essa et al.: Y, Y, N, N, N, N, Y, Y, Y
- Optical flow / neural networks [Mor01], S. Morishima: Y, Y, N, N/Y, N, N, Y, Y, Y
- Model fitting / feature point tracking [SS01], N. Sarris & M. G. Strintzis: N, N, ~, N, Y, Y, Y, N/Y, Y
- Model fitting / PCA / active models / eigentextures [Ahl02], J. Ahlberg: Y, N, ~, N, Y, Y, Y, N/Y, Y
- Optical flow [TH94], Li-an Tang & T. S. Huang: N, Y, N, N, N, Y, N, N, Y

Notes to this part of the table (the original does not indicate which cells they qualify): author's appreciation; for the face tracking, which is based on point tracking.

Table I-1 (continued). Same column legend as above.

Methods that obtain parameters related to the Face Animation synthesis used (II)
- Feature models [CCC01], J. C. Chou, Y.-J. Chang & Y.-C. Chen: N, N, ~, N, Y, N, Y, N/Y, Y
- Feature model motion [GKMT01][GEZ99], Goto et al.: N, N, N, N, Y, N, Y, Y, Y

Methods that use explicit synthesis during the analysis
- Image-based techniques [EP96][EP96-2], T. Ezzat & T. Poggio: Y, N, N, N, Y, N, N, Y, Y
- Optical flow / spline-based 3D face model [EG98], P. Eisert & B. Girod: N, Y, Y, N, N, Y, Y, Y, Y
- 3D model fitting / image difference minimization [PSS99], F. Pighin et al.: Y, Y, Y, N, N, Y, Y, Y, Y

Notes: author's comment; for the face tracking, which is based on point tracking; "~" means that slight rotations are permitted although there is no direct use of the pose data during the image processing.

II Realistic Facial Animation & Face Cloning

Evaluating facial animation systems is an ambiguous task because predefined, generalized quality criteria do not exist. Most of the time, the degree of realism and naturalness of synthetic facial reproduction is determined from subjective judgment. This chapter contains the definition of some theoretical concepts related to facial animation. We have tried to formalize the notion of realism within the context of our research. We aim at providing a conceptual basis where the notions of avatar and clone are clearly and unambiguously stated. This formal framework allows us to describe the interaction existing between facial motion generation and its synthesis from a global perspective. We conclude the chapter with some considerations about face cloning viewed from an ethical perspective.


II.1 Understanding the Concept of Realism in Facial Animation

Realism is the latest and greatest challenge that human animation confronts. Computer graphics techniques have already been used to bring to life synthetic characters that behave in a more or less human way. In the past few years, we have been able to see the first results of three-dimensional realistic human animation in the entertainment industry. We find one of the most impressive results in the movie Final Fantasy (Lee, 2001). All the characters and scenarios of this movie have been rendered synthetically and, unlike other examples of computer-graphics-aided films, its creators have aimed at reproducing humans in a highly realistic manner. In general, the degree of realism sought depends on the application in which the synthetic human character will be involved.
Face actions and expressions are very important in human communication, and their absence is one of the weaknesses of traditional telecom systems (e-mail and telephone). Video-based communication seems to be the solution to this problem. Nowadays, telecom networks are not ready to carry all the video data needed to enable the extensive use of high-quality teleconferencing, and users do not like the constrained environment that one-to-one communication with very little motion flexibility imposes. Synthesized or virtual characters, above all animated faces, have appeared as a possible way out to create better communication environments at a lower network cost. Talking heads (2002) that represent speakers in conversations are already a reality. Unfortunately, we cannot easily create face models and animate them in such a way that they are able to substitute not only the physical presence, but also the trust we have in the real person we are speaking with. To achieve this, we need to build 3D head models that can be realistically animated; furthermore, we will have to make a virtual clone of the speaker, implying the customization of the 3D head motion to the actions of the speaker, if we want the person to be completely represented by his synthesized model.
It is difficult to determine at which point we can consider a realistic 3D head model to be animated realistically. In fact, realism depends on two issues: on the one hand, the physical appearance of the head model and, on the other hand, the motions and actions the model can generate. Ideally, a 3D head model is realistic when the surface or surfaces that compose the synthesized object exactly represent a human head. To obtain this, not only must the geometry of the model be a detailed human reproduction, but also the texture, color and light reflection characteristics of the surfaces must match those of the parts of a human head. We talk about realistic head animation when we refer to the number R of undetermined but limited motion actions that the human head model can

generate. One motion action is each of the 3D movements B·v, where v = (x, y, z), which are exerted on each of the points belonging to the surface of the model to create animation. The magnitude of these displacements, B, is limited to the maximum values permitted by real human motion. Due to the wide range of actions and the complexity of human nature, it is very difficult to completely render highly realistic virtual heads. It took 5 years to finish Final Fantasy, each of the actions of the human virtual characters involved thousands of computations, and nevertheless the result, although very astonishing, is still far from being humanly believable.

Note: We define motion actions as physical vector displacements of the surface points, but we do not imply the use of vertex displacement as the only way to generate synthesized animation, just that the final rendered result must imitate the natural head surface motion (deformation seen as the displacement of the infinite points that belong to the surface).

A head clone is a specific case of realistic head animation. We consider that a clone is a realistic 3D head representation of a living human. From the overall actions R that a realistic head model can perform, only S ⊂ R will be exerted. S are those actions that involve the specific and exact head motions and expressions of the person whom the clone represents. Obviously, this definition is the ideal that all FA systems should target when cloning people. It seems a priori very difficult, almost impossible, to generate the right and exact motion belonging to someone's actions, and just as difficult to generate the 3D-model representation of the person. That is why, in a more general way, we call a head clone the 3D head model together with the group C ⊂ S of actions that does not allow us to differentiate the rendered moving representation of the clone from a recorded video of the person. Figure II-1 represents graphically the relationship amongst the different action groups.

Figure II-1. It is simple to understand how we can generate cloned actions in a realistic way if we ensure that the groups of actions C ⊂ S ⊂ R. R represents the actions feasible on a realistic head model animation, C those actions that would customize the realistic animation to become a clone, and S the final actions that we are able to replicate from the study of the media input, in our case, monocular images.

We must point out that the concept of difference between a recorded human being and a rendered clone is subjective. Realistic human head actions, actions belonging to R, will always ensure a human-like clone behavior, but they may be just too general to

represent the person being cloned. Similarly, just using a subgroup S of all the possible actions susceptible of occurring when that person communicates may not be sufficient to represent the individual. Judging the realism will depend on the degree of acquaintance we have with the person himself. For instance, for those people who intimately know the person being cloned, it will be much easier to find out that they are dealing with his clone.
Head avatars are another class of animated head models. An avatar is a rough or symbolic representation of an entity, in this case, a person. When an avatar takes the form of a head/face, there is no pre-established rule that forces it to act in a realistic, human-like way. The number of permitted actions V is fairly more extensive than those allowed to realistic facial animation but, since avatars are just a rough 3D representation, they do not permit a completely realistic synthesized behavior. Only a limited group V_R = R ∩ V of actions will fall within the definition of realistic human head animation. Avatars intended to create a human-like feeling will only use actions belonging to V_R.

Figure II-2. The freedom of action permitted on avatars, V, is larger than the one allowed on realistic head models, R. Avatars are meant to be just a rough representation and are therefore limited by the nature of their model. V_R = R ∩ V is the group of actions performed on an avatar that will make it behave like a human being.

II.2 The Semantics of Facial Animation

As stated in the previous section, 3D head animation synthesis can be seen as the combination of motion actions to be applied on each of the points belonging to the head-model surface. Let us call A^i the group of all possible actions that a head model can undergo in the specific FA system i. Each time a new FA system is created, we need to set its A^i so as to specify how to generate the minimal actions on the head model, a^i_n = B·v. Most of the time, determining each of the single movements a^i_n that the model must undergo to generate a specific action is too complicated to do on a one-per-one basis; therefore, this operation is done once. Afterwards, actions are grouped following some sort of semantic motion criteria associated to the movements they involve. Each of these subgroups, A^i_j, represents a more or less complex combination of actions related amongst themselves by its motion.

Figure II-3. A specific FA system has its own animation designed once: A^i = {a^i_n}. Afterwards, minimal actions a^i_n are gathered into more or less large groups associated by the semantics of their movement.

We define the concept of semantic level as the amount of motion information given by each of the subgroups A^i_j, that is, it refers to the degree of detail that an FA general action A^i_j includes. For instance, the same overall action to be performed on a 3D head model, let us choose as an example how to close the right eye, may be expressed in many different ways by using more or less detailed semantic levels. Using very high-level semantics, an FA system could directly understand the action «close the right eye»: A^1_j. Another FA system, using more detailed semantics, may define the action as «move right eye upper lid down and move right eye lower lid up»: A^2_j. And if the FA system only understands very low-level actions, then we could get «move point p^1_right_eye y

units along the y-axis, move point p^2_right_eye z units along the z-axis, etc.»: A^3_j. Let j denote an overall action of the same nature.
Naturally, for a given FA technique, generation and animation are always in semantic resonance, that is, all subgroups A^i_j are completely understood and synthetically rendered by the model. This must occur regardless of the origin chosen for the generation of the movement actions. Most current FA systems animate motion derived from manually set actions. New ways of obtaining actions are currently being searched for, and there exist some encouraging results on automatically generating head/face actions from different media: text, audio and video.
Since the semantic meaning of motion may be very different from one FA technique to another, most of the time it is difficult for actions defined and designed for one concrete FA system to be understood by a different one. High-level actions are generally a compound of several low-level actions. Although the final motion animation of the 3D head model is always performed by applying minimal (lowest semantic level) actions, a^i_n, this does not mean that all general actions are universally understood. For a specific FA system to understand motion actions designed for a different FA technique, it must semantically transform, through T{·}, the incoming actions and translate them into understandable motion actions before applying them on the head model. The T{·} transform will only be possible if both FA systems share some minimal actions a^i_n for a given general movement j. In general, T{·} is difficult to find. The minimal actions grouped by an FA system 1 for a specific movement, A^1_j, could be included in several groups of the other FA system, {A^2_j, A^2_i, A^2_k, ...}; or, vice versa, several action groups from the origin FA system, {A^1_j, A^1_i, A^1_k, ...}, may be contained in just one group of the synthesis FA system, A^2_j. Figure II-4b and Figure II-4c illustrate these situations.
When generating motion information to clone human behavior, we must ensure that the semantic motion information obtained during the analysis of the person's actions is completely understandable by the FA system that will render the head model. Figure II-5 presents the three major situations that appear when connecting a system generating motion data with an FA system:
Resonance: the motion data completely suit the FA semantics and are directly understood.
Misunderstanding: the motion data generated are not directly understandable by the FA system. The semantics of the motion data are incompatible with those of the synthesis.

Understanding: after adapting the generated motion data, initially incomprehensible for the FA system, we are able to animate the model.
The optimal situation happens when both motion generation and synthesis are 100% compatible. In this case, there is no loss of information. Otherwise, the FA system will not render all the motion data generated.

Figure II-4. Actions designed to animate face models in a specific FA system (A^i_j) are grouped following the semantics of the motion. In (a) we illustrate how the semantics of the generated animation parameters, A^1_j, and those of the animation system, A^2_j, are the same. In (b) the general action generated is expressed by means of several actions of the FA system; and in (c) we need to generate diverse general actions to animate just one.

Figure II-5 (schematic content):

Resonance (same semantic level): the motion action generation produces {A^1_j} and the motion synthesis understands {A^2_j}, with A^1_j = A^2_j for every j (A^1_j ∩ A^2_j = A^2_j).
Example. Action: turn the eye pupil 10 degrees about the y-axis. Synthesis understanding: turn the eye pupil 10 degrees about the y-axis.

Misunderstanding (different semantic levels): A^1_j ∩ A^2_j = ∅ for some j.
Example. Action: turn the eye pupil 10 degrees about the y-axis. Synthesis understanding: nothing.

Understanding (adapting the semantic level): a semantic-level transformation T{·} is applied between generation and synthesis, with T{A^1_j} ∩ A^2_j = A^{1,2}_j for some j, where A^1_j = {a^1_{1,j}, a^1_{2,j}, a^1_{3,j}, ..., a^1_{N,j}}, A^2_j = {a^2_{1,j}, a^2_{2,j}, a^2_{3,j}, ..., a^2_{M,j}}, and a^1_{k,j} = a^2_{k,j} for at least one k of one general movement j. In resonance, T{·} = I, I being the identity function.
Example. Action: turn the eye pupil 10 degrees about the y-axis. Synthesis transform: understand the physical eye motion (10 degrees = look right). Synthesis understanding: look right.

Figure II-5. When generation and synthesis of animation are in resonance, all generated movements are completely understood and reproduced. If the system generating the animation does not follow the semantics of the face motion synthesis, there is misunderstanding and we need to adapt the motion parameters to obtain some understanding. This is possible if both sides, generation and synthetic animation, share at least some minimal motion actions.

In a complete FA system, analysis and synthesis must share the same semantics and the same syntax in their motion description. While by semantics we refer to the concept of generating the same set of possible actions or analyzed movements as the synthesis is able to render, by syntax we mean the way this motion is described. Given certain parameters and magnitudes involved in a specific movement, we should be able to express the same action both when the parameters are generated and when they are applied. Although accomplishing these requirements apparently seems a trivial task, current research pays little attention to how the motion semantics and syntax determine the way action parameters should be generated. Therefore, the proposed solutions to achieve the same goal, face motion analysis and synthesis, are sparse and have led to the development of algorithms to be used in very specific environments.
In the history of computer graphics, there have been several attempts to define groups of actions and the associated semantics for face animation. Researchers have generally focused on the generation of face expressions. In his Ph.D. dissertation "A parametric model for human faces", Parke (1974) defined a facial model that is rendered as a mesh of Gouraud-shaded polygons from a set of real-valued parameters. Parke's model allowed both the expression and the conformation, or shape, of the face to be defined by parameters. Ekman and Friesen (1978) defined a system (the Facial Action Coding System, or FACS) for encoding facial expressions in terms of action units (AU). Their system was designed to be used by noting down observed facial expressions, and it defines six categories for complete expressions (happiness, sadness, anger, fear, disgust and surprise). Waters (1987) based his facial animation system on FACS, mapping the action units to muscles in the face. Waters also rendered the face using finer meshes of polygons, displaced smoothly so as to model skin elasticity. Kalra, Mangili, Magnenat-Thalmann and Thalmann (1992) developed another technique for more realistically rendering facial images; they used rational free-form deformations, a method of distorting curved surfaces by manipulating grid points surrounding them. The facial model used is similar to FACS, in that it was made of perceivable actions; it consisted of 21 Minimum Perceptible Actions (MPA), each represented by a real value. Later works on face animation, above all those focused on very realistic animation (Eisert & Girod, 1998; Pighin, Szeliski & Salesin, 1999), define more concrete and precise actions for the model animation. Their semantics are suited only to their specific systems and cannot be easily generalized, thus making the use of their actions by other systems apparently more difficult.

II.3 Animating Realism

As stated in Section II.1, the actions exerted on a head model must belong to the group R of realistic human motion actions to create realistic facial animation, that is, all a^i_n ∈ R for a given FA system i. Realistic synthetic animation of 3D heads needs learning from the speaker's behavior. Motion capture (Sturman, 1994) is often utilized to obtain specific animation parameters that are studied to customize human animation for synthetic representation. Some motion capturing techniques use hardware-based methods like electric or electromagnetic sensors (Motion capture website, 2002); others utilize an image-based approach and generally analyze video image input of the person's face to study its movements and expressions. Hardware-based methods can give very precise results, but they are expensive and complicated to operate. Most of the image analysis algorithms that accurately retrieve motion information are developed to work on images recorded under known and stable conditions. Furthermore, some of them need face markers to track the person's movements. These algorithms are generally too computationally demanding to perform in real time. All these approaches are commonly used by the entertainment industry for the creation of virtual characters (Thalmann, 1996), but they suffer from too many constraints to be used in telecommunication applications.
If we are animating a clone, the analyzed parameters must correspond to the customized movements of a specific individual. Fast motion generation for the clone could give out global actions in the form of more or less general parameters (higher-level semantics) for a given FA system; if that is the case, it means that the utilized FA system already contains the customized animation information about the person being cloned. If we are capable of generating very low-level actions, then the FA system will need to keep less information.

Clone synthesis and animation from video analysis
In the literature, we find three main 3D-head animation techniques matching suitable face motion analyses:
(i) animation rules and feature-based techniques: these methods are based on parametric face models, which are animated by a few parameters directly controlling the properties of the facial features, like the mouth aperture and curvature, or the rotation of the eyeballs. Face analysis provides some measurement data, for instance the size of the mouth area,

and some animation rules translate these measurements into animation parameters (a small sketch of such a rule is given at the end of this section).

(ii) wireframe adaptation and motion-based techniques: motion information, computed on the user's face, is interpreted in terms of displacements of the face model wireframe, via a feedback loop. These techniques have proved to be very precise, especially when a realistic face model is used. However, they are generally slow because they use iterative methods.

(iii) key-frame interpolation and view-based techniques: face animation is done by interpolating the wireframe between several predefined configurations (key-frames) that represent some extreme facial expressions. The difficulty of this approach is to relate the performer's facial expressions to the key-frames and to find the right interpolation coefficients. This is generally done with view-based techniques, which use appearance models of the distribution of pixel intensities around the facial features to characterize the facial expression.

Regardless of the technique used to derive the animation actions, realism is never lost as long as minimal action units are shared amongst systems and, if they are transformed, they remain in the group R of realistic actions. Face cloning animation may completely lose the cloning specification if, when adapting the semantics of one FA system to another, the minimal actions concretely attached to the individual's behavior are lost during the adaptation. We can conclude that the degree of detail of the described motion plays a very important role in face cloning.
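To make technique (i) concrete, the following minimal sketch (Python) shows how a simple animation rule could translate a measured mouth aperture, in pixels, into a normalized animation parameter for the synthesis. The parameter names and calibration values are hypothetical, chosen only for illustration.

def mouth_aperture_rule(aperture_px, neutral_px, max_px):
    """Map a measured mouth aperture (in pixels) to a normalized animation
    parameter in [-1, 1], where 0 is the neutral pose. neutral_px and max_px
    are speaker-dependent calibration values (hypothetical; in a cloning
    system they would come from the speaker's 3D model)."""
    value = (aperture_px - neutral_px) / float(max_px - neutral_px)
    return max(-1.0, min(1.0, value))

# Example: a 14-pixel aperture with a 10-pixel neutral opening and a
# 30-pixel maximum opening yields a parameter of 0.2.
param = mouth_aperture_rule(14, 10, 30)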

II.4 Privacy and Security Issues about Face Cloning: Watermarking Possibilities

Synthetic objects can be used to create video and images. In this new scenario, watermarking techniques become useful for several purposes. In particular, they could help viewers to check the creation origin (synthetic or natural) of an image/video object, to determine whether the use of a given object is legal or not, and to access additional information concerning that object (e.g. copyright, date of creation, and so on). Privacy and security concerns will increase as soon as tools able to manipulate hybrid media (synthetic and natural) reach the general public. Highly realistic clones will add uncertainty to all visual recordings. Very realistic synthetic avatars are already a reality. For instance, people from Vir2elle (2003) are able to synthesize talking heads that cannot be distinguished from recorded video. The algorithmic techniques behind Vir2elle's technology can be found in (Cosatto & Graf, 2000). In a not so distant future, it could become possible for an anonymous person to usurp someone else's identity by making his clone act, say and behave in a certain way in a video or image.

Classical 3D watermarking algorithms (Benedens, 1999) focus on the protection of the computer representation of the 3D object (via its geometry data); they cannot protect its usage (i.e., the set of all possible synthetic images generated from the model). It seems more interesting to develop algorithms that aim at protecting the use of a 3D model by watermarking its associated texture. Dugelay, Garcia, and Mallauran (2002) propose to embed a watermark into the texture of the model, and then to recover it from the synthesized model image views. The recovery of the mark from synthesized views would be a more or less valid proof (depending on the current performance of the algorithm) that the scene is synthetic. This algorithm is resilient to any modification of the internal representation of the 3D model since it is based only on the views of the object and on some arbitrary original representation of the object used to embed the watermark (which needs to be known for the mark recovery).


III Investigated FA Framework for Telecommunications

In this chapter we develop the practical aspects of facial animation, and more specifically face cloning, viewed from the perspective of deploying telecommunication applications. We describe how the facial expression analysis techniques developed and studied in this thesis are framed inside an analysis/synthesis cooperation scheme whose main objective is to achieve face cloning for teleconferencing purposes. We start by exploring the practical scenario under which we have developed our algorithms, also reviewing current trends in how to deploy facial animation in telecommunications. Then, we discuss some technical issues related to the use of facial animation for telecommunications and we present how our framework allows some interesting networking performance evaluations. Finally, we introduce the facial motion and expression analysis procedures described inside the proposed framework.


III.1 Introduction

The range of possible and expected applications for systems aiming at deploying highly realistic facial animation is wide. Facial animation can be useful for video communication over newer and more flexible communication links such as the Internet or mobile telephony, which do not have high bit-rate capacities and cannot ensure good quality of service. Next-generation mobile communications already contemplate the possibility of face-to-face conversation. E-commerce using virtual sellers enhances the contact with customers through face-to-face human-computer interfaces. The game industry can also benefit from using clones of the players instead of simple avatars. Finally, some advanced communication systems involving several persons (video teleconferencing) could be designed to reduce the feeling of distance between participants by introducing some elements that exist in real meetings when recreating artificial but realistic work discussion environments. In this sense, our thesis research has been done in the spirit of developing more advanced teleconference sites. It is important to notice that until now, all the applications mentioned before have preferred using avatars rather than animating insufficiently realistic artificial faces. This justifies the great effort and resources put into face cloning research and the relevance of the thesis work herein exposed for current telecommunications.

To generate face cloning we need a highly realistic 3D model of the speaker, and we should be able to realistically animate it. In addition, the animation of the model is enslaved by, or reproduces, a certain reality, because the goal of cloning is to reproduce the behavior of a real speaker. Contrary to avatars or talking heads (even realistic ones), face cloning implies that the complete animation-generation system is speaker-dependent. This domain falls into the larger category of virtualized reality, in opposition to virtual reality, since the realism of the restitution is not reached from scratch by advanced computer vision techniques but is inspired and constrained by resting on real audio-visual data of the speaker. Face cloning is a relevant illustration of the recent phenomenon of convergence between different research domains: image analysis (i.e. signal processing), image synthesis (computer graphics), and telecommunications.

Modeled synthetic faces are animated following the actions derived from the interpretation of some animation parameters. Generating face animation parameters becomes a difficult task if done manually; therefore automatic or semi-automatic parameter generation systems have been developed. These systems extract face motion information from either the person's speech, images, or both. Visual Text-To-Speech (Visual TTS) synthesizers, which are those TTS that also provide face synthesis, generate their animation parameters from the input text given to the TTS. The Visual

TTS analyzes the text and supplies the corresponding phonemes. These phonemes have their matching synthetic mouth motion representations, also called visemes, which can be synthesized. Visual TTS presents several advantages: it is the simplest analysis system to generate face animation parameters, it does not need human interaction, and it can generate quite accurate mouth movements. For these reasons, some of the current face animation products available to the public use this technique. We can also utilize the phoneme-viseme duality to derive animation from speech. In this case, speech is analyzed to deduce the phonemes. Whether we extract the phonemes from text or from speech, the major drawback of these methods is that they only generate automatic movements for the mouth; therefore some other source of action generation is needed to complete face animation. They give acceptable results when animating non-realistic characters (cartoons, animals, etc.) but, since their generated parameter information is not speaker-dependent, they hardly give a natural human feeling.

We can obtain improvements in realism and naturalness by customizing the synthesis based on someone's face motion and expressions. Motion capture is often utilized to obtain specific animation parameters that are studied to customize human animation for synthetic representation. Some motion capturing techniques use hardware-based methods like electric or electromagnetic captors; some others utilize an image-based approach and generally analyze video images of the person's face to study its movements and expressions. Hardware-based methods can give very precise results but they are expensive and complicated to operate. A software-based alternative for customizing animation is the use of image-processing algorithms to accurately retrieve motion information from videos of the speaker. These algorithms are generally developed to work on images recorded under known and stable conditions. Furthermore, some of them need face markers to track the person's movements. Some solutions can generate information to synthesize complex animation but they are computationally heavy and cannot perform in real time. In addition to making generic facial animation more realistic and natural, we also need motion analysis techniques to instantly study the actions of the speaker at a given time. When using FA in communications, the application environment requires real-time, non-invasive analysis methods to generate facial animation parameters; therefore most of the approaches taken to customize facial animation systems are no longer useful when applied to communications.
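To illustrate the phoneme-viseme duality mentioned above, here is a minimal sketch (Python). The table and the mouth-opening values are hypothetical and purely illustrative; a real Visual TTS would use the viseme definitions of its own synthesis engine.

# Hypothetical phoneme-to-viseme table: each viseme is reduced here to a
# single normalized mouth-opening value, only to keep the example small.
PHONEME_TO_VISEME = {
    "a": {"mouth_open": 0.9},
    "m": {"mouth_open": 0.0},   # bilabial closure
    "o": {"mouth_open": 0.6},
    "f": {"mouth_open": 0.2},
}

def visemes_for_phonemes(phonemes):
    """Return the sequence of mouth targets for a list of phonemes,
    falling back to a neutral pose for unknown symbols."""
    neutral = {"mouth_open": 0.3}
    return [PHONEME_TO_VISEME.get(p, neutral) for p in phonemes]

# Example: the phoneme sequence of a short utterance.
targets = visemes_for_phonemes(["m", "a", "m", "a"])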

III.2 Framework Overview

Figure III-1 illustrates the system that we propose for facial motion and expression cloning. During the analysis (green print), information analyzed from the speaker, mainly visual although it could also be of different origin, is obtained and used to replicate the facial behavior (denoted by λ) on a highly realistic 3D-head model. The generated parameters could be encoded and sent to be directly interpreted; instead, it is preferable to simulate the decoding and synthesis during the image analysis. Adding this synthesis feedback, we can control the error committed and we can adjust the analyzed parameters to adapt them to a more accurate motion. The final data (µ) must be understandable by the facial animation engine of the decoder at the remote site (orange print), following the specific semantics or maybe after having been adapted to a standard. The use of a highly realistic head model of the speaker enables not only a convenient and exploitable visual feedback but also the knowledge of anthropometric data that can also be utilized during the analysis.

[Figure III-1: system overview diagram — Real Images (input) → Image Analysis → Analysis/Synthesis Cooperation (simulation of decoding, λ) → Coding (µ: pose & expression valid in 3D) → Image Synthesis → Synthetic Images (output), built around the realistic (speaker-dependent) face model (3D geometry & texture); side notes: MPEG-4 compliant?, another model, low bit rate, other input (from text or audio)?, viewer preferences.]

Figure III-1. When using clone animation for communication, there exist two main active parts: the facial animation parameter generator (green print), which is included in the encoding/transmission part and does heavy image processing; and the facial animation engine (orange print), which is located at the receiver and whose task is to regenerate the facial motion on the speaker's clone by interpreting faps. The framework herein presented uses synthetic clone image feedback to improve the image analysis and generate more accurate motion information.

Basically, the complete development of this system contains four major blocks: (i) acquisition or creation of an artificial, speaker-dependent and realistic 3D synthetic head model of the speaker; (ii) analysis of video of a speaker recorded in a natural environment to extract parameters for animation; (iii) compression and transmission of the parameters between encoder and decoder; (iv) synthesis of the 3D model and its animation from the received parameters. The main core of the research work presented in this thesis, the pose-expression coupling strategy for facial non-rigid motion analysis, has been developed to suit the requirements of block (ii).
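The per-frame cooperation between blocks (ii)-(iv) can be summarized with the following minimal, self-contained sketch (Python). Every function is a trivial placeholder standing for a stage described in this chapter; it is not the thesis implementation, only an outline of the data flow.

def predict_pose(previous_pose):
    # Stand-in for the Kalman prediction of translation/rotation parameters.
    return dict(previous_pose)

def render_clone(pose):
    # Stand-in for rendering the speaker's 3D clone at the predicted pose
    # (the synthesis feedback used during the analysis).
    return {"pose": pose}

def refine_pose(frame, synthetic_view, predicted_pose):
    # Stand-in for correcting the prediction against the real frame.
    return predicted_pose

def analyze_features(frame, pose):
    # Stand-in for the eye/eyebrow/mouth analysis in pose-corrected ROIs.
    return {"eyes": "center", "eyebrows": 0.0, "mouth_open": 0.3}

def encode_parameters(pose, features):
    # Block (iii): pack pose and feature actions into one FAP frame.
    return {"pose": pose, "features": features}

def process_frame(frame, previous_pose):
    """One iteration of the analysis/synthesis cooperation loop (block ii-iii);
    block (iv), decoding and animating the clone, happens at the receiver."""
    predicted = predict_pose(previous_pose)
    pose = refine_pose(frame, render_clone(predicted), predicted)
    features = analyze_features(frame, pose)
    return encode_parameters(pose, features), pose

fap_frame, pose = process_frame(frame={}, previous_pose={"tx": 0.0, "ry": 0.0})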

III.3 Our FA Framework from a Telecom Perspective

Until recently, visual interpersonal telecommunications have been using only video and audio coding and transmission techniques. Video quality depends on the available communication bandwidth. The performance of video techniques noticeably decreases when only very low rates are attainable, as happens, for instance, on the Internet and in mobile networks (Dubuc & Budreau, 2001). Using face animation methods becomes an alternative to video transmission because face animation parameters can replace images, thus reducing the amount of information to be sent. Even when high bit rates are available, current videoconference systems cannot provide a natural, immersive feeling. Video communication among more than two people becomes uncomfortable. As already discussed by Dugelay, Fintzel and Valente (1999), current trends in research focus on developing teleconference systems where speakers will enjoy a more natural communication environment. If these telecom systems succeed in developing face cloning, they could provide two main advantages compared to classical audio-video communication systems: they would simulate the same visual result while requiring lower transmission needs (bandwidth); and the synthetic nature of the visual reproduction would permit offering extended services through original functionalities for the viewer, such as modification of the point of view, language translation of speech with adjustment of lip motion, etc.

Rough evaluation of the use of FA on a telecom network

To illustrate the advantages of using face animation, let us evaluate what the load of the FA data would be in a telecom system when we want a visual, moving representation of the speaker. This example is suitable for applications like mobile telephony and compares two very extreme situations: no compression at all, or plain data compression, against facial animation coding. There currently exist coding solutions like the H.264 standard for video coding that give outstanding results but, unlike facial animation coding, do not yet introduce 3D information coding possibilities (see MPEG-4 for details).

Downloading the model (raw complete size: ~1 MB):

** Texture:
RAW: … bytes
(1) JPEG compressed (100% quality): … bytes
(2) JPEG compressed (50% quality): … bytes
Note: The visual difference between (1) and (2) may not be significant for some applications; nevertheless the difference in compression rate is not negligible.
Estimated time of transmission at 10 kb/s:
RAW: ~7 min and 55 s
(1): ~2 min and 44 s
(2): 21.7 s
At first glance, only the last case seems reasonable. Nevertheless, the size of the texture could be customized to the application requirements.

** 3D mesh (in the form of ASCII values), high quality, 5000 vertices:
Uncompressed: … bytes
ZIP compressed: … bytes
Estimated time of transmission at 10 kb/s:
Uncompressed: 360 s (~6 min)
ZIP compressed: 130 s (~2 min and 10 s)
Reducing the number of vertices by a factor of 6, the ZIPped mesh would take 21.6 s to be transmitted.

Mesh and texture are the largest data of the complete FA system. A light solution could be to keep the model already stored at the receiver. In any case, the advantage of

the system is that once the model and the mesh have been sent, they remain the same for future communications and they do not have to be retransmitted.

Animating the model:

We will just evaluate the animation of the model as a whole, that is, applying to it only rigid motion parameters. It is easy to extend the results to more complex animation by adding extra parameters to be sent. For global pose tracking we can use six parameters. Three of them determine the translation along the x-, y- and z-axes, and the other three represent the rotation angles of the head with respect to these same axes. Assume that each parameter is stored in 2 bytes (there is no compression). If we want to achieve 10 f/s (understanding a frame as the difference in movement determined by the parameter change), we would need: 2x6x10x8 = 960 bit/s ~ 1 kbit/s. A non-compressed b&w video of size 384x288 pixels (coded with 8 bits/pel) would need 8,847,360 bit/s ~ 9 Mbit/s. In the high-quality video, we could also appreciate other face movements besides the pose changes. Nevertheless, we face a ratio difference of almost 10,000 between the two systems, which can be understood in the following ways (a short sketch of this arithmetic is given after the list):

- To achieve the same bit rate, at high quality, the images to be sent should be about 10,000 times smaller. In practical scenarios, video frames would then be too small to appreciate anything (10,000 times smaller is just impossible). We must recall that the quality of the synthesized face does not depend on the transmission rate but on the FA system used, the model and texture quality, the coding of the components, and the rendering used. Therefore it can almost be considered size independent.

- To achieve the same bit rate while maintaining the size of the images, the video sent should be compressed by a factor of around 10,000. Assuming that we could obtain such compression, there are many chances that the received quality would be so low that all the details of the face would have been lost and, moreover, there would be an overall displeasing feeling due to the general loss. In fact, this is the main argument for the development of model-based coding techniques: video compression techniques trying to use generic 3D models to be able to compress video even more. They can be considered to be halfway between classical video and Face Animation synthesis.

- The FA system could use around 5000 more parameters to improve its animation. Current avatar systems utilize far fewer than 1000 parameters to animate faces and they give out quite pleasant face motion synthesis.
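The arithmetic behind these figures can be reproduced with a few lines (Python); the numbers are those used in the text above.

# Rigid-motion FAP stream: 6 parameters x 2 bytes x 10 frames/s x 8 bits/byte.
fap_bits_per_second = 6 * 2 * 10 * 8             # 960 bit/s, i.e. ~1 kbit/s

# Uncompressed b&w video: 384 x 288 pixels, 8 bits/pel, 10 frames/s.
video_bits_per_second = 384 * 288 * 8 * 10       # 8,847,360 bit/s, ~9 Mbit/s

ratio = video_bits_per_second / fap_bits_per_second   # ~9216, i.e. almost 10,000

def transmission_time_s(size_bytes, rate_bits_per_s=10_000):
    """Download time of a model element (mesh or texture) at 10 kbit/s."""
    return size_bytes * 8 / rate_bits_per_s

print(fap_bits_per_second, video_bits_per_second, round(ratio))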

Transmitting that many parameters, although it would load the network as much as video does, still has the advantage of offering synthetic motion reproduction, such as changing the speaker's point of view, integrating the speaker into different backgrounds, etc.

Considering this example from a mobile-telephony perspective, as handsets are getting more sophisticated and have more means of storing data in their memory, FA, in the form of avatars and maybe later of more realistic 3D head animation, is a viable application to be deployed on 3G mobile telephones.

This example alludes to the fact that face animation communication involves the coding and transmission of the modeled representation of the face (a highly realistic 3D mesh for clone communication and its material/texture characteristics) and of the set of face motion parameters that animate the model. We consider face animation parameters (faps) as the practical specification of the motion actions of a concrete FA system. When a given system is made to achieve the most realistic animation possible, the generated analyzed motion data must be completely understandable, that is, it must suit the parametric semantic requirements of the face synthesis in use. Proper coding and transmission techniques must prevent the communication from altering the generated data flow so the sent parameters remain coherent for the synthesis.

III.3.1 Coding face models and facial animation parameters: an MPEG-4 perspective

Each analysis-synthesis scheme has its own way of handling head models and animation data. To tackle coding and transmission issues, most systems elaborate their own solutions. For example, Varakliotis, Ostermann and Hardman (2001) propose a method for 3D mesh animation coding. Proprietary approaches may fulfill single-application needs but they cannot ensure intercommunication amongst different face animation encoding-decoding systems. The new multimedia standard, MPEG-4, tries to standardize the way natural and synthetic animation data are coded so they can suit global communication needs. We refer the reader to (MPEG-4, 2000) for a complete overview of MPEG-4. The standard covers a wide range of multimedia possibilities and includes face and body animation as separate items to code. When using face clone communication, what is the advantage of using specific standardized coding for face animation? It is basically to achieve high compression and to obtain complete interoperability with other, different communication systems. What are the main constraints of using MPEG-4 specific coding? This

standard specifies a common syntax to describe face behavior. This syntax has been created in such a way that MPEG-4 considers face animation as a specific case of parametric(1) 3D-face motion synthesis. Therefore, we find that it is intrinsically tied to its own semantic level of animation. To ease the standardization among different FA systems, 3D-head models are required to contain certain specific vertices, or Facial Definition Points (FDP), and motion is coded in the form of 68 Facial Animation Parameters (FAP). One FAP generally represents the magnitude of the vertex motion related to a given action, although it can also describe a higher-level action, being then an expression or viseme(2) parameter. For an MPEG-4 compliant terminal to interpret FAP values, it may use animation tables (FAT), one per FAP, to associate magnitude values with action motions for the head model in use. A more extended explanation of how Facial Animation is tackled inside the MPEG-4 standard can be found in Pandzic and Forchheimer (Eds.) (2002). We refer the reader to Section 2.2 in Chapter VI of this thesis dissertation, where we detail how the standard has been applied to make the model used in our experiments MPEG-4 compliant.

It is not clear yet which degree of realism such a coding standard permits. MPEG-4 FAPs are not the most minimal actions FA can universally understand (if such actions exist); therefore some face animation systems, based on more complex parameters or different animation techniques, may find the coding requirements too restrictive. Systems whose face animation parameters do not directly match the MPEG-4 face coding syntactic and semantic structure must reinterpret their face animation rules in terms of FAPs and FATs. Figure III-2, which displays the communication process, illustrates how, before encoding, animation data must be transformed, T{ }, to become complying FAP magnitude values. Likewise, MPEG-4 decoded FAP streams will have to undergo the inverse transformation, T-1{ }, to obtain comprehensible animation data. Even when these transformations are possible, the absence of motion-information loss can only be guaranteed if the degrees of action permitted by the analysis-synthesis system are fewer than the action restrictions imposed by the coding. If coding following the MPEG-4 Face Object requirements becomes too constraining, the standard also manages the coding of 2D and 3D meshes and textures, whose use may be more convenient in some communication scenarios.

(1) Understanding parametric face synthesis as the face animation synthesis where actions are described in terms of magnitudes that represent the physical displacement of the vertices of the 3D-mesh face model.
(2) Viseme: head model deformation associated with the reproduction of a specific phoneme.
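A minimal sketch of the T{ } mapping discussed above (Python): a FAT-like table translates a proprietary action magnitude into a FAP value and back. The table entries, scale factor and FAP index used here are hypothetical; they only illustrate the round trip, not the actual MPEG-4 tables.

# Hypothetical mapping between one proprietary action ("mouth_open", in
# normalized units) and one FAP slot (here arbitrarily index 3), with a
# linear scale taken from a FAT-like table.
FAT_LIKE_TABLE = {"mouth_open": {"fap_index": 3, "scale": 512.0}}

def to_fap(action, value):
    """T{ }: proprietary action magnitude -> (FAP index, FAP amplitude)."""
    entry = FAT_LIKE_TABLE[action]
    return entry["fap_index"], int(round(value * entry["scale"]))

def from_fap(action, fap_value):
    """T^-1{ }: FAP amplitude -> proprietary action magnitude."""
    return fap_value / FAT_LIKE_TABLE[action]["scale"]

index, amplitude = to_fap("mouth_open", 0.25)     # -> (3, 128)
recovered = from_fap("mouth_open", amplitude)     # -> 0.25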

[Figure III-2: (a) MPEG-4 compliant chain — expression & motion analysis → T{ } → FAP encoding → FAP stream → FAP decoding → T-1{ } → expression & motion synthesis; (b) proprietary solution — parameter encoding → specific-semantics stream → parameter decoding; (c) analysis-synthesis interaction — illumination, pose and feature analysis → parameter estimation → FAP → parameter interpretation → model rendition.]

Figure III-2. Face animation parameters (FAP) are the generated numerical values associated with the facial motion we want to synthesize. We are interested in building encoding and decoding techniques to efficiently compress and transport animation data over different telecom networks. Proprietary solutions (b) ensure perfect communication between motion generation and synthesis. Using standardized solutions, for instance MPEG-4 coding (a), enables interoperability amongst different systems, at the expense of readapting the animation to the encoding requirements and maybe losing animation precision. Teleconferencing systems (c) are an example of applications that would profit from the introduction of facial animation analysis and synthesis.

III.3.2 Facial animation parameter transmission

The main requirements for the transmission process in face communication applications are: accurate audio and parameter flow synchronization, minimum delay to avoid latency disruptions, and minimum loss of information, because the transmitted action data have already been much reduced. These conditions are hard to achieve in very-low bit-rate networks and effort is being put into developing efficient streaming methods.

If face animation intends to clone the behavior of a living person, the overall communication should be completely transparent to the FA system. This implies that we must use lossless coding techniques and that the transmission should not alter the sent data. This last point is most of the time out of the control of the most commonly used telecom systems (mobile networks and the Internet) and thus almost impossible to achieve. Nevertheless, it is possible to minimize its effects on the animation (as has already been done for other media).

Approaches for very-low bit-rate networks

The generalized use of packet networks extends the employment of the Internet Protocol (IP) to other domains, e.g. 3G Mobile Networks (Platon, 2000). Streaming data over IP will eventually ensure that applications are independent of the physical network. Novel teleconference systems use multicast for efficient, real-time, point-to-multipoint delivery. Of the two transport layer protocols available in the IP suite, TCP and UDP, UDP is the more convenient for multicast applications. Since the latter does not guarantee timely and reliable datagram delivery, RTP (Schulzrinne, Casner, Frederick & Jacobson, 1996) is used to provide such capabilities end-to-end at the application layer, through timestamps, sequence numbers and payload identification, among other header fields.

Humans are very sensitive to the synchronization of voice and lip movement. Accurate audio synchronization must be achieved when generating face animation parameters. Transporting FAP data over packet networks may result in delay variations and packet loss or reordering due to increased congestion levels; therefore the protocols involved in the transport must ensure the resynchronization of FAPs at the receiver. Relevant work has already been developed for lip synchronization between audio and video streams over RTP (Kouvelas, Hardman & Watson, 1996; RAT & VIC, 2002). RTP/UDP streaming seems the natural way to transport face animation data. Tackling synchronization, delay, buffering, error concealment, etc. for facial data becomes an extension of the work already being developed for video and audio streaming. RTP has also been the protocol of choice for the delivery of MPEG-4 media over IP (Avaro et al., 2001). In MPEG-4, each medium has an associated Elementary Stream. Media transport has been addressed in a joint effort with the Internet Engineering Task Force (IETF). Although Face Animation information is considered an Elementary Stream, how to encapsulate face animation data is still an open issue.

One-to-one conference communication: a practical example

In this practical example, we wanted to test the capabilities of a basic transmission system and evaluate the weaknesses and strengths of transmission approaches already used for other media, in particular audio. Our implementation is based on RTSP (Real Time Streaming Protocol), which is currently used for streaming applications, and defines a specific format for sending FAPs

(following the MPEG-4 specification) through an IP network. It enables streaming FAPs over the network to any MPEG-4 compliant face animation terminal. To generate the FAPs we use automatic video input analysis of the speaker's face (explicitly, its head pose and its eye actions). To render the animation we have a 3D-model viewer, where an avatar capable of understanding the incoming FAPs is animated. The video input analysis and face motion synthesis had already been positively tested working together on the same machine (a PC running Microsoft Windows NT). They interact instantly: the FAPs are animated by the model in the viewer with no delay, thanks to the use of Windows internal messages(3) among windows. More details about the analysis/synthesis implementation and the related algorithmic background are given in Chapters IV, V and VI. FAPs are not compressed, and it is not the purpose of this test to evaluate different coding methods. This study only considers transmission procedures derived from classical audio streaming procedures and applied to FAP streaming.

(3) We refer the reader to the Microsoft Windows Developer website: msdn.microsoft.com.

Some previous approaches

In their work, Haverlant and Dax (1999) describe a networking strategy for point-to-point communication in avatar worlds. The Face Animation Parameters are proprietary, which excludes any interaction with other avatar systems. In Chen's work (Chen & Kambhamettu, 1997), MPEG-4 compliant FAPs are transmitted using RTP on UDP, with a self-defined payload type, which excludes any interaction with other MPEG-4 terminals. Chen's main concern is to explore multicast and graphics compensation algorithms, not interoperability. In our work the whole teleconferencing system, from the extraction of the FAPs to the clone animation, including the sending of movement features, was designed to be as MPEG-4 compliant as possible.

RTP streaming for real-time FAPs

RTP is the chosen protocol for the delivery of MPEG-4 streams over IP. In order to use it, we need to define the payload associated with the specific stream we want to transmit. Payloads are still in the process of standardization. Having one payload per medium, that is, encapsulating each MPEG-4 Elementary Stream by means of an individual payload, is the most flexible way to handle streaming but encourages the development of very different payloads. We decided to use the Phoneme and Facial Animation (PFAP) RTP payload format presented by Ostermann, Rurainsky and Civanlar (2001) as a draft to the IETF. In

their proposal, the authors describe a payload for transporting phoneme and facial animation parameter (PFAP) streams in RTP packets. In their article, Ostermann, Rurainsky and Civanlar (2002) discuss its performance when streaming the output of a visual TTS(4). We have chosen this payload for several reasons: it already contains a recovery strategy for loss-tolerant transmission of these streams; it includes not only plain FAPs but also phonemes (leaving an open door for lip motion coding in the form of phonemes); and, since it was intended to be used by visual TTS applications, it would make our system directly compatible with them. We must point out that the proposal expired in April 2002 and no further action was pursued; therefore it has not been officially accepted.

(4) TTS: Text-to-Speech.

The PFAP payload consists of three types of information: phoneme descriptors, FAP descriptors, and recovery information (Figure III-3). Each payload starts with a packet descriptor field followed by optional recovery information. Phoneme descriptors and FAP descriptors may follow the packet descriptor or the recovery information if available.

Figure III-3. Packet Descriptor of the PFAP Payload. Image courtesy of Ostermann, Rurainsky and Civanlar (2002).

FAPs are associated with phonemes to determine their timing in a sentence. The start time of a FAP is the same as the start time of the first phoneme following the FAP. Therefore, a packet must end with a phoneme if it contains any information other than recovery information. In the PFAP payload, it is still possible to send FAPs without phoneme descriptors by including the timing information in the transition field. In this way, we are able to profit from this payload in our transmission system.

Setting

The original analysis-synthesis application was designed to work on a standalone desktop. The analysis module analyzes real-time video, or a saved video sequence,

corrects the illumination, predicts the user's head pose, analyzes the eye features and converts the data into MPEG-4 compliant FAPs(5). The model viewer module instantly receives FAPs and renders the animated face model. This application is meant for future use in teleconferencing. The test-oriented setting, with analysis and synthesis in a single location, had to be changed to turn it into a more teleconference-like platform. To do so, we decided to provide a network frame that would allow one-to-one videoconferencing over the Internet. For our development we supposed that the receiver already has the head model, its texture and the associated FAT.

(5) The developed algorithms for the facial analysis involved can be found in Chapters IV, V and VI.

To implement the transmission, the functionality of the existing analysis-synthesis application was extended so that streaming could be supported. This was achieved by separating the original application into a client/server structure. Naturally the analysis module, which was sending the media elements, was chosen as the Server, and an FAP streaming server was added to it. The synthesis module acts as a remote control, asking for PLAY/PAUSE of FAP streams, and is the Client; an FAP streaming client was added to it (see Figure III-4). Since RTSP provides some of the functionalities required to develop a streaming version of the analysis-synthesis application, we decided that this protocol would be used in addition to RTP.

Figure III-4. High Level Networking Architecture.

Design aspects

Our classical RTSP implementation initially streamed audio files, in wave format in our case. The aim of the implementation was to extend its capabilities to stream FAPs and to integrate them into the analysis-synthesis application. During the merging of capabilities, our main concern was not to interrupt the application layer functions (analysis of the video input for the server and synthesis of the model for the client) with the networking functions (connection, transport, synchronization, ...). Threads always provide a good way to isolate and separate tasks and execute them simultaneously. The following processes need to run simultaneously:

(Server) Adding generated FAPs into the Input Buffer
(Server) Reading the Input Buffer and sending the elements
(Client) Receiving FAPs and adding them to the Output Buffer
(Client) Reading the Output Buffer and playing the FAPs

Figure III-5. Detailed description of the complete networking capabilities of the Server (analysis) and the Client (synthesis).

Server

The Server works as an asynchronous RTSP server: it waits to receive requests and is able to stream audio/wav and audio/au packets, and it has also been extended to deal with FAPs by defining its own MIME type: MIME_FAP.

MPEG-4 Client

The Client implements the minimal standard RTSP control functions SETUP/PLAY/PAUSE/TEARDOWN. The reference RTSP client recognizes the audio/pcm and audio/pcmu MIME types. An additional MIME type was created for FAPs. The received stream of FAPs is added to the Output Buffer and played out using the time-stamping information.

Buffering Strategies

We implemented the Fixed Playout Strategy in the Output buffer for sending the FAP frames (groups of FAPs obtained from the analysis of one video frame) to the model viewer. According to this strategy, the player attempts to play out each frame exactly d ms (we fixed it at 250 ms by default) after it is generated. So if a frame which is

contained in one RTP packet is time-stamped at time t, the player plays out the FAP frame at time t + d, assuming the frame has arrived by that time. Packets that arrive after their scheduled playout time are discarded and considered lost. The differences between the Instant Playout Strategy and the Fixed Playout Delay Strategy are illustrated in Figure III-6. This figure shows the times at which packets are generated and played. As shown by the left staircase, the sender generates packets asynchronously. The first packet is received after a time that corresponds to the current network delay. With the first playout schedule, the fourth packet does not arrive at the time specified by its timestamp because of the network jitter. The Fixed Playout Strategy, on the other hand, makes it possible to play out this same packet while respecting its timestamp relative to the previous packets.

Figure III-6. Comparing buffering strategies.

What is a good choice for the playout delay? By making the initial delay large, most packets will make their deadline and there will therefore be negligible loss; however, long delays can become bothersome if not intolerable, especially for audio. Internet phone applications can support delays up to about 400 ms; beyond that, many packets may miss their scheduled playback time. Roughly speaking, if large variations in end-to-end delay are typical, it is preferable to use a large delay. Nevertheless, if the delay is small and variations in delay are also small, it is preferable to use a small fixed playout delay, perhaps less than 150 ms.

The timestamp reference of each FAP frame is obtained by extracting the time reference right after the analysis of face motion and expression has finished. For real-time applications the difference between two consecutive FAP-frame timestamps should be smaller than 1/fps, where fps is the frame rate of the video input. In practice,

the analysis constrains the speed of FAP generation. Nevertheless, the transmission must ensure that the model viewer client has received enough frames to be able to play smoothly.

Conclusions

After implementing, we were able to validate and test our design choices. Clearly the choice of RTSP is not optimal, because we do not have enough control on the server side or, better expressed, the controls are suitable for classical video and audio applications but not for virtual videoconference applications. RTSP is a standard and its use is advisable unless alternatives prove to perform better; nevertheless it lacks controls suited to virtual videoconferencing and to the 3D nature of the application: location of the speaker, 3D environment control, etc. The choice of RTP proved to be more appropriate, as it is efficient and connectionless. Choosing the PFAP payload as our application payload was not optimal because, at the moment of the implementation, only 40% of the payload information was used. We could remove all duplicate and unnecessary headers but, in turn, our implementation would no longer be able to stream FA parameters over the network to any MPEG-4 compliant face animation terminal using the PFAP payload.

We built a demo application on a local network but, unfortunately, not in real-life conditions (no delay, no losses, etc.). A sober evaluation made by simulating packet loss and delay (by dropping FAPs and slowing down the transmission) showed some of the problems we may encounter in real-life applications. As happens in video transmission, if many frames are dropped, that is, if many FAPs are lost, the quality drops sharply. FA systems are even more sensitive because the animation information is condensed in the form of parameters, and once they are lost no action takes place. Losses can make the animation synthesis abrupt. From the image quality point of view, the rendering would still be visually nice; only motion would be affected, whereas in video transmission, visual quality is usually highly perturbed under these same networking conditions.

The main interest of our study was to show the potential of the analysis-synthesis scheme from the communication, and more concretely the transmission, perspective. There exist many open issues that are, by themselves, the subject of broader and deeper research:

- the buffering effects on communication, analyzed when the network conditions change during transmission;
- the design of an efficient teleconference-oriented payload;

- the development of coding mechanisms for the FAP information adapted to the application and communication requirements (study of the advantages and disadvantages of using MPEG-4 coding algorithms);
- the development of data recovery strategies more suitable for this kind of application; and
- the search for an alternative to the regular RTSP commands so that they are more convenient for teleconferencing-like applications.

Many of these points are already being investigated, in most cases in the context of avatar animation. The novelty of the platform used here lies in the FAP generation from video input, which permits performing almost realistic animation thanks to the analysis of real human behavior. We believe it is the best context for simulating and studying networking issues related to the use of FA for teleconferencing.
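As a concrete illustration of the fixed playout strategy and the threaded buffering described above, here is a minimal, self-contained sketch (Python). The 250 ms default delay comes from the text; the queue-based structure and the frame format are only illustrative, not the thesis implementation.

import queue, threading, time

PLAYOUT_DELAY = 0.250   # fixed playout delay d (250 ms by default, as in the text)

output_buffer = queue.Queue()

def receiver(packets):
    """Client side: push received (timestamp, fap_frame) pairs into the buffer."""
    for timestamp, fap_frame in packets:
        output_buffer.put((timestamp, fap_frame))

def render(fap_frame):
    # Stand-in for handing the frame to the model viewer.
    print("playing", fap_frame)

def player(stop_after=1.0):
    """Play each FAP frame at its timestamp + d; discard frames arriving too late."""
    start = time.time()
    while time.time() - start < stop_after:
        try:
            timestamp, fap_frame = output_buffer.get(timeout=0.1)
        except queue.Empty:
            continue
        playout_time = timestamp + PLAYOUT_DELAY
        now = time.time()
        if now > playout_time:
            continue                      # late packet: considered lost
        time.sleep(playout_time - now)
        render(fap_frame)

# Example: three FAP frames generated 100 ms apart.
now = time.time()
frames = [(now + 0.1 * i, {"frame": i}) for i in range(3)]
threading.Thread(target=receiver, args=(frames,)).start()
player(stop_after=1.0)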

III.4 Facial Motion Analysis: Coupling Expression and Pose

To obtain motion data from the speaker's face over video sequences in our real-time teleconference environment, we must develop video image analysis techniques that are robust to any environment and lighting conditions and that do not impose any physical constraints. Most of the existing face expression analysis techniques are not really suitable for practical purposes. The best performing analysis systems, in which rigid and non-rigid motion are simultaneously analyzed, utilize the synthesis of head models. For example, Eisert and Girod (1998) have improved the approach proposed by DeCarlo and Metaxas (1996) in their teleconferencing system. Their analysis technique is based on optical flow constraints to analyze and interpret face motion. Their research results are very encouraging but their analysis algorithms work under many environmental restrictions due to the use of optical flow techniques. The processes developed for the analysis of facial features are conceived for very specific uses.

The main goal of the research described in this thesis is to develop robust and fast image analysis algorithms to be used in the proposed FA framework for telecommunications. In this practical setting, motion data extracted from the video input is intended for real-time reproduction on a 3D synthetic head of the speaker (its clone). The algorithmic solutions sought try to fit the following requirements:

- We use the media currently available for teleconference systems, that is, monocular images: video data extracted from one camera in front of the user (for instance, a webcam) without any calibration.

- We do not allow any interference with the natural environment: no make-up or markers on the speaker; no precise lighting conditions (simply, illumination that would allow humans to understand the motion); as much freedom of movement as possible for the user, avoiding the 'near-to-frontal' point-of-view restriction, which is common in this kind of analysis.

- We try to avoid any training previous to the analysis, or visual knowledge of the person's characteristics that cannot be obtained from his or her synthetic model.

- We want to obtain motion data that is as precise as the analysis conditions allow, by generating an analysis strategy whose precision can be improved. The obtained data must be complete enough to be used for coherent, natural facial synthesis.

- The image analysis processing will be designed to be as universal as possible.

- The algorithms and chains of processes involved in the analysis must be oriented to potentially work in real time, thus permitting their deployment in telecommunication applications.

It is difficult to create a complete analysis process able to fit all these requirements. Algorithms that are universally usable generally lack precision. Indeed, if no previous assumptions are made, then making the analysis suitable for all cases implies a lot of computation and therefore the loss of real-time possibilities. To compensate for this restriction, we may generate less precise analysis algorithms while keeping in mind the possibility of increasing the complexity of the system; as the computational requirements become less and less restrictive, a flexible analysis scheme will allow us to increase the complexity and extract more detailed motion data.

Facial expressions can be considered independent of the head's rigid motion (Bascle & Blake, 1998). Although their projected appearance on the image is not completely independent of the pose, the solution developed here tries to exploit the real expression-pose independence in 3D space to study ocular expressions in 2D. We have designed a two-step process to develop image analysis algorithms that analyze non-rigid and rigid facial motion simultaneously. First, we design image-processing techniques to study the features extracted from a frontal point of view of the face. Faces show most of their expression information under this pose, and this allows us to verify the correct performance of the image processing involved and the correctness of the hypotheses made for the analysis. Second, we extend our algorithms to analyze features taken at any given pose. This adaptation is possible because the motion models (motion templates) utilized during the analysis can be redefined in 3D space, and the accuracy of the retrieved pose parameters is such that it enables us to reinterpret the data we obtain from the image analysis in 3D (a generic sketch of this geometric step follows).
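The adaptation mentioned above relies on re-expressing the 2D analysis models in 3D and projecting them back into the image at the tracked pose. The following minimal sketch (Python with numpy) shows the generic geometric step assumed here: rotating and translating 3D feature points with the pose and projecting them with a simple pinhole model. The focal length and the point coordinates are placeholders, not values taken from our system.

import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation around the x-, y- and z-axes (radians), composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_feature_points(points_3d, pose, focal=800.0):
    """Place the clone's 3D feature points at the tracked pose and project
    them with a pinhole model to obtain the 2D region of interest."""
    R = rotation_matrix(*pose["rotation"])
    t = np.asarray(pose["translation"], dtype=float)
    cam = points_3d @ R.T + t                    # head frame -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3]      # perspective division

# Example: four hypothetical eye-corner points (model units) at a small head turn.
eye_corners = np.array([[-30, 5, 60], [-10, 5, 62], [10, 5, 62], [30, 5, 60]], float)
pose = {"rotation": (0.0, 0.2, 0.0), "translation": (0.0, 0.0, 600.0)}
roi_2d = project_feature_points(eye_corners, pose)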

[Figure III-7: diagram — pose & 3D model; intra-feature constraints; inter-feature constraints; eye, eyebrow and mouth motion-model parameters.]

Figure III-7. General diagram of the proposed analysis framework. Developing a video analysis scheme where head pose tracking and face feature analysis are treated separately permits designing specialized image analysis algorithms adjusted to specific needs, feature characteristics, etc.

For our work on virtual teleconferencing environments, a pose-tracking algorithm based on Kalman filtering (Valente & Dugelay, 2001) was first developed. The pose tracking system permits a robust prediction of the pose of the speaker, frame by frame, with the help of the synthesis of the speaker's clone. The analysis strategy proposed to study non-rigid facial motion profits from the pose tracking algorithm's robustness and from the use of a highly realistic 3D head model of the speaker.

The study of monocular images is constrained by the fact that motion information is only retrieved from one view. When analyzing faces from a single perspective, we can estimate the motion of the interesting features (eyes, eyebrows and mouth) by studying the displacement of their projections on the video frames. Regarding the image-processing techniques developed to estimate the displacements and the generated motion, we have opted for applying specific and different image-processing techniques per feature. Many analysis schemes apply the same technique independently of the feature or expression they analyze. In our preliminary approach (Valente & Dugelay, 2000), we used a solution based on image correlation. We utilized PCA to build the image database. Storing the images taken from all possible lighting conditions, global pose situations and FAP combinations became difficult for features like the mouth and the eyes, whose expressions can be quite complex. Nevertheless, the approach was fairly suitable for eyebrow movements (Valente, André del Valle & Dugelay, 2001).

The image-processing techniques involved are intended to extract information useful for the motion templates that have been designed for each feature. We aim

at obtaining data under very general circumstances. To help the algorithms behave correctly, we use intra- and inter-feature constraints derived from standard human motion. All standard actions that generalize the motion of features of the same nature are considered intra-feature constraints (for example, both eyes act the same way). Motion information about one feature derived from the analysis of another feature is considered an inter-feature constraint. The geometrical nature of the data that control the motion templates enables us to extend the algorithms previously designed to work on faces showing other poses, by simply applying an adaptation that is completely detailed in Chapter V. The motion templates and the image-processing algorithms studied are described in Chapter IV.

Classifying the proposed methodology into one of the categories described in Chapter I, we could consider it a hybrid between methods that obtain parameters related to the Facial Animation synthesis used and those that use explicit synthesis during the image analysis. In fact, our analysis framework tries to take advantage of the strengths of both techniques. On the one hand, obtaining parameters that directly control the synthesis, by using processing techniques specific to each facial feature, makes the analysis more efficient; on the other hand, utilizing synthesis feedback by using the speaker's clone during the pose tracking guarantees a high degree of robustness.

Looking at the proposed approach from a coding perspective, we would like to point out that the use of a realistic 3D model is only mandatory during the analysis/encoding of motion. Facial animation parameters extracted from a system that utilizes the techniques herein studied can be used on any other head model, as long as the animation system for that model shares the same motion syntax and semantics.

IV Facial Non-rigid Motion Analysis from a Frontal Perspective

We have designed image-processing techniques that study facial feature motion from a frontal point of view. This chapter contains the formal description of the motion models that have been designed (motion templates) and of the natural constraints utilized (intra-feature & inter-feature). It also presents the experimental evaluation of the proposed algorithms by studying the robustness of the processing involved.


IV.1 Introduction

The developed feature motion templates do not only utilize anatomical constraints (intra-feature) to derive feature actions; they also use natural, standard human motion constraints to generate natural, realistic facial animation, exploiting the correlation existing between the eyes and between eyes and eyebrows (inter-feature). The image-processing techniques proposed try to globally minimize the influence of the unknown sources of error and to improve the overall behavior understanding by imposing some standard human-motion restrictions on the data obtained from the analysis of each feature. They form an analysis strategy that aims at providing coherent motion understanding, able to generate reliable animation data to synthetically reproduce facial feature expressions from the analyzed video input.

The developed algorithms assume that the location and delimitation of the feature (eye, eyebrow or mouth) regions of interest (ROI) are known. This assumption is realistic in the present context because, as explained in Chapter V, the procedure that extends the use of these algorithms to any other pose also takes care of the tracking and definition of the feature ROIs. For all the presented experimental results the location of the facial features and the delimitation of the ROIs have been made manually. In Chapter V, we describe how this delimitation is done automatically once the analysis algorithms are coupled with the pose tracking system.

[Figure IV-1: diagram — pose & 3D model; intra-feature constraints; inter-feature constraints; eye, eyebrow and mouth motion-model parameters.]

Figure IV-1. General diagram of the proposed analysis framework. The parts related to the facial expression analysis have been highlighted.

When analyzing facial features from video input recorded in unknown environments, very few assumptions can be made because we cannot guarantee any determined image quality or specific lighting over the face. To create robust algorithms, the development of our sight expression analysis is based on the following premises: (a) the behavior of the features to be analyzed is known and it can be modeled by understanding specific image data; (b) the physical structure of eyes and eyebrows is similar for all human beings and the image processing algorithms need to profit from this fact; and (c) the features will be assumed to be completely visible; occlusions will be considered an uncontrolled source of misleading results, as happens with extreme lighting conditions.

IV.2 Eye State Analysis Algorithm

The importance of eye gaze in human communication is significant. Gaze is a richly informative behavior in face-to-face interaction. It serves at least five distinct functions: regulating conversation flow, providing feedback, communicating emotional information, communicating the nature of interpersonal relationships, and avoiding distraction by restricting visual input (Garau, Slater, Bee & Sasse, 2001). When developing new telecom systems for videoconferencing, the correct understanding and reproduction of eye motion becomes necessary. An example is Microsoft Research's project Gaze Master, a tool aiming at providing gaze-corrected videoconferencing (Gemmell, Zitnick, Kang & Toyama, 2000). Recently, the 10th Int. Conf. on Human-Computer Interaction granted a complete session to eye analysis (Eye movements in HCI, 2003).

Due to the vast number of applications where eye-motion understanding through image analysis is useful (eye-closing detection in vehicle driving, model-based coding in telecommunications, human action awareness in HCI, etc.), there exist many techniques to study eye activity on monocular images. It is not the purpose of this chapter to go over all the possible methods that can be found in different fields of research, but we will overview some approaches that relate to our work in video communications. Two major techniques have been used to analyze eye movements on images: PCA and deformable template matching (motion modeling); we refer the reader to Chapter I for theoretical details about these techniques. PCA has been widely investigated to study facial motion, above all coupled with the use of optical flow as a source of motion data (Valente, 1999). Most recent works prefer to do this analysis through ICA (independent component analysis) rather than using PCA (Fidaleo & Neumann, 2002). In both cases, the main drawback is the dependence of the performance on the environmental conditions of the analysis, basically on the lighting. The use of motion templates seems to be the chosen solution to retrieve eye actions robustly (Goto, Escher, Zanardi and Magnenat-Thalmann, 1999; Tian, Kanade and Cohn, 2001). Generally, these motion templates are composed of ellipses and circles, representing the eye shape, that are extracted from the image and tracked along the video sequence.

If lighting independence is sought, optical flow cannot be used, and other image-processing tools (analysis using mathematical morphology, non-linear filtering, etc.) are utilized. Aiming at working under flexible conditions leads researchers to look for solutions where erroneous analysis results, which may often occur, can be compensated or minimized, for instance by studying the temporal behavior of eye actions. Ravyse, Sahli, Reinders and Cornelis (2000) perform eye gesture analysis by

using a mathematical morphology scale-space approach, forming spatio-temporal curves out of scale measurement statistics. The resulting curves provide a direct measure of the eye gesture, which can then be used as an eye animation parameter. Although in their article they only consider the opening and closing of the eyes, they already show the potential of using the temporal evolution of eye motion for its action analysis. Our approach follows the same analysis philosophy as the one presented by Ravyse et al. It differs in the image processing involved: we propose motion deduction through the study of the pupil location because it provides both eye gaze and opening/closing information. Instead of a statistical analysis, we introduce a temporal state diagram based on standard human motion behavior that constrains motion using some intra-feature restrictions. In communications it is very important to generate non-disturbing facial expressions. As already discussed by Al-Qayedi and Clark (2000), the knowledge of standard human behavior can be helpful to track and animate eyes.

IV.2.1 Analysis description

Eye behavior can be described through two major actions: the open-close eyelid movement and the eyeball rotation; our analysis strategy decomposes eye tracking accordingly. Non-exaggerated ocular actions are characterized by a tight relationship between the pupil's vertical location and the eyelid opening; therefore, we can expect to obtain most of the eye motion information from the analysis of the pupil activity. The proposed analysis scheme exploits this fact and reduces the study of eye motion to the determination of the pupil position in the eye area. We then assign an action state to the eyes based on this position and finally apply a Temporal State Diagram that deduces the best eye action by comparing the states of both eyes in the current frame and the states obtained in previous analyses.

Pupil search algorithm:

First, we estimate the size of the pupil inside the complete eye feature to determine the shape of the evaluation zone. Figure IV-2 illustrates the shape of an average eye upon which we have studied the pupil size relative to the overall eye ROI. Then, we perform an exhaustive scan by computing the following energy criterion:

(X_c, Y_c) = \underset{(x,y)}{\arg\min}\; E_s(x,y),  with

E_s(x,y) = \frac{3}{(s\,W_{eye})^2} \sum_{l=-\frac{s W_{eye}}{2}}^{\frac{s W_{eye}}{2}} \sum_{m=-\frac{s W_{eye}}{2}}^{\frac{s W_{eye}}{2}} I^2(x+l,\,y+m) \;-\; \frac{1}{4(s W_{eye})^2-(s W_{eye})^2} \sum_{\substack{l=-s W_{eye} \\ x+l>0}}^{s W_{eye}} \sum_{\substack{m=-s W_{eye} \\ y+m>0}}^{s W_{eye}} I^2(x+l,\,y+m)        (IV-1)

The criterion is evaluated inside this specific zone while sweeping vertically and horizontally through the eye analysis feature. (X_c, Y_c) corresponds to the point where this evaluation is minimized. I^2(x, y) is the energy of a pixel, computed as the square of its intensity component. The evaluation formula looks for the intensity distribution that is closest to the pupil-iris shape within the analysis area. Since the position of the head with respect to the camera and the video input characteristics may differ between sequences of diverse origin, we define s as the ratio W_pupil / W_eye. Since s is an anthropometric measurement uniquely related to an individual, it remains constant for all analysis scenarios, thus completely determining the evaluation.

This algorithm relies on the intensity information of the image; therefore it is a priori dependent on the lighting conditions. This dependence is low because the pupil mainly stays the lowest-energy point of the eye, thanks to the anatomical eye characteristics. Unexpected analysis results are controlled by the next step of the process, which minimizes the influence of erroneous pupil detections during the interpretation of eye behavior. The specular nature of the eyeball surface may introduce points of very high energy (white reflections on the pupil/iris) that should not mislead the results. Pixels whose intensity value is considered too high are ignored during the energy evaluation.

Parameterization and interpretation of the analysis

To interpret and synthesize the results of the previous analysis technique, it is necessary to parameterize the resulting data. The parameterization process maps the X, Y values of the pupil location onto their corresponding state values. A state value S is obtained as a function of the location of the pupil with reference to the width (W_ROI) and the height (H_ROI) of the eye Region of Interest that is being analyzed. To define the states, we divide the region of analysis into different zones and we assign to each of them a concrete eye action (look so much up, look so much down and left, etc.). This action can be rendered if each of the states can be synthetically reproduced. It is assigned following these hypotheses: (a) eyelids partially cover the irises and vertically follow the pupil motion; (b) both eyes behave alike, therefore pupil motions are correlated; (c) the pupil remains always the darkest part of the eye, when it is open,

regardless of the lighting conditions; (d) the absence of the pupil indicates that the eye is closed; in such a case, the darkest point falls on the eyelashes.

The assignment of states is at the same time a 2D quantization process where the exact X, Y location of the pupil is mapped onto the closest pupil location for a given state. The number of possible states is limited by the size of the analysis area and by the accuracy of the pupil search. When analyzing very small eye features or under very extreme lighting conditions, it is advisable to assign few states to compensate for the instability of the image processing output. In Figure IV-2, a simplified diagram of the eye ROI subdivision is presented. The ROI is divided into nine state zones to which we have assigned an eye action: look right & down, look center & up, etc., and where pupil and eyelid motion are correlated. We also define an extra state, S_close, which is assigned when the Y value of both eyes is at its lowest point.

Synthetic motion generated from the analysis of eye behavior must recreate natural human actions. Unexpected analysis results must not lead to unnatural eye motion that would interfere with proper communication. To avoid generating unpleasant eye synthesis, we filter the individual states with a Temporal State Diagram (Figure IV-3), which assigns the most suitable common state assuming the same behavior in both eyes and using previous results to compensate for misleading analyses. We first input the pupil locations as states (one for the left eye and one for the right eye) and compare both states; if they are alike, we consider the analysis correct. Otherwise, the filter starts comparing both states together, and then both states with the state of the eyes in the previous frame, until the filter matches a correct analysis.

[Figure IV-2 sketch: eye ROI of width W_eye and height H_eye, pupil width W_pupil; horizontal zones RIGHT / CENTER / LEFT and vertical zones UP / CENTER / DOWN.]
Figure IV-2. This diagram shows the simplest subdivision possible for the eye ROI so as to extract meaningful motion information regarding the eye action
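As an illustration only, and not the implementation used in this work, the pupil search of (IV-1) could be prototyped as in the following minimal sketch: each candidate position is scored by the energy of a pupil-sized window against the energy of its surroundings, and over-bright specular pixels are discarded. The function name pupil_search, the handling of windows at the ROI borders and the specular threshold are assumptions of the sketch.

```python
import numpy as np

def pupil_search(eye_roi, c, specular_threshold=240):
    """Scan the eye ROI and return the (x, y) location whose pupil-sized
    neighbourhood has the lowest energy relative to its surroundings."""
    roi = np.asarray(eye_roi, dtype=np.float64)
    h, w = roi.shape
    w_eye = w                                        # take the ROI width as Weye
    half_outer = max(1, w_eye // 2)                  # surrounding window half-size
    half_inner = max(1, int(round(c * w_eye / 2)))   # pupil window half-size

    energy = roi ** 2
    # Specular highlights (very bright pixels) must not mislead the search.
    energy[roi >= specular_threshold] = 0.0

    best_score, best_xy = np.inf, (w // 2, h // 2)
    for y in range(h):
        for x in range(w):
            yo0, yo1 = max(0, y - half_outer), min(h, y + half_outer + 1)
            xo0, xo1 = max(0, x - half_outer), min(w, x + half_outer + 1)
            yi0, yi1 = max(0, y - half_inner), min(h, y + half_inner + 1)
            xi0, xi1 = max(0, x - half_inner), min(w, x + half_inner + 1)
            outer = energy[yo0:yo1, xo0:xo1]
            inner = energy[yi0:yi1, xi0:xi1]
            ring_area = outer.size - inner.size
            ring_mean = ((outer.sum() - inner.sum()) / ring_area
                         if ring_area > 0 else 0.0)
            # Dark pupil/iris centre surrounded by brighter sclera and skin.
            score = inner.mean() - ring_mean
            if score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy
```

The exhaustive double loop is kept for clarity; for real-time use the same scores can be obtained with integral images.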

[Figure IV-3 flowchart: the left and right states S_L^t = f(X_L^t, Y_L^t, W_L^t, H_L^t) and S_R^t = f(X_R^t, Y_R^t, W_R^t, H_R^t) are compared; if equal, S_t = S_L^t = S_R^t; otherwise the averaged values X_new^t = (X_L^t + X_R^t)/2, Y_new^t = (Y_L^t + Y_R^t)/2, W^t = (W_L^t + W_R^t)/2, H^t = (H_L^t + H_R^t)/2 give S_t = f(X_new^t, Y_new^t, W^t, H^t) when this state is next to S_{t-1}; if both Y values are very low, or X_L^t is at the opposite extreme of X_R^t, then S_t = S_close and S_{t-2} is used for the next comparison; otherwise S_t = S_{t-1}.]
Figure IV-3. The analyzed results of each eye feature are filtered through this Temporal State Diagram. The current eye states S_L^t and S_R^t are contrasted to obtain a common state for both eyes: S_t. Since the state S_close does not hold any physical information about the pupil location, it is ignored for future analyses in the temporal chain. The starting state is fixed with the X, Y of the eyes in their neutral position.

IV.2.2 Experimental evaluation and conclusions

To evaluate the complete procedure, some video sequences of an individual who was rigidly standing and facing the camera have been used. The person was asked to move his/her eyes in a natural manner (up, down, left, right & closing eyes). Different lighting was used for each recording. One set of sequences was shot under natural standard conditions: neon lighting. Another set of sequences was recorded under extreme lighting conditions: direct light coming either from the front, from the right or from the left side. The average length of each sequence was 500 frames, and the average eye ROI was 32x24 pixels.

Figure IV-4. Four frames extracted from each of the analyzed sequences: FRONTAL, NEON, RIGHT SIDE and LEFT SIDE, respectively
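Before looking at the numbers, the following sketch illustrates one possible reading of the decision logic of Figure IV-3, fusing the per-eye states into the common state S_t. The 3x3 grid encoding, the neighbourhood test, the low-Y ratio and all names are our own simplifications rather than the code used in this work.

```python
CLOSE = "close"

def pupil_to_state(x, y, w_roi, h_roi):
    """Quantise a pupil location onto the 3x3 grid of Figure IV-2."""
    col = min(2, int(3 * x / w_roi))   # horizontal zone (0, 1 or 2)
    row = min(2, int(3 * y / h_roi))   # vertical zone  (0, 1 or 2)
    return (col, row)

def neighbours(a, b):
    """True when two grid states differ by at most one zone per axis."""
    if a == CLOSE or b == CLOSE:
        return True
    return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1

def common_state(left, right, prev_state, w_roi, h_roi, low_y_ratio=0.85):
    """Fuse the left and right pupil locations (x, y) into one eye state."""
    s_left = pupil_to_state(left[0], left[1], w_roi, h_roi)
    s_right = pupil_to_state(right[0], right[1], w_roi, h_roi)
    if s_left == s_right:                        # both analyses agree
        return s_left
    # Try the state of the averaged location, kept only when it is coherent
    # (neighbouring) with the state of the previous frame.
    x_new = (left[0] + right[0]) / 2.0
    y_new = (left[1] + right[1]) / 2.0
    s_avg = pupil_to_state(x_new, y_new, w_roi, h_roi)
    if neighbours(s_avg, prev_state):
        return s_avg
    # Both pupils very low, or horizontally contradictory -> closed eyes.
    both_low = (left[1] > low_y_ratio * h_roi and
                right[1] > low_y_ratio * h_roi)
    opposite = abs(s_left[0] - s_right[0]) == 2
    if both_low or opposite:
        return CLOSE
    return prev_state                            # otherwise keep the old state
```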

Estimation of the improvement obtained by using the Temporal State Diagram

We ran the pupil-search algorithm over the video sequences (shots of these sequences can be seen in Figure IV-4). To consider that the algorithm successfully deduced the eye action in one frame (Table IV-1, Ok), we checked these criteria:

quantitative: the X and the Y components of both eyes were the same, computed with the following intervals of accuracy:
(IV-2a)  |X_L − X_R| ≤ 3%, 5% or 10% of W_ROI, and
(IV-2b)  |Y_L − Y_R| ≤ 3%, 5% or 10% of H_ROI;

qualitative: the results obtained defined the expected eye action. This was visually inspected.

The performance evaluation can be seen in Table IV-1 in the rows labeled WITHOUT. Appendix IV-G contains the data from the analysis results of the sequence NEON. To understand the level of improvement achieved when adding the Temporal State Diagram, we compared the previous results with those obtained after utilizing the diagram (Table IV-1 [A]). The improvement is considerable; above all, the state diagram could determine the S_close state, which the algorithm utilized independently could not. Lighting conditions influence the image analysis results, but the state diagram ensures a success rate of 70%-90% by providing animation data that is as smooth as possible and that generates natural eye motion. We have also studied the percentage of errors coming from detecting a false S_close state (Table IV-1 [B]), and in Table IV-1 [C] we count the percentage of correctly analyzed closing motions (that would be rendered as a closed eye) that were nevertheless not detected as S_close. The most disturbing artifacts come from the introduction of S_close where there has not been any such action. We conclude from the analysis that a trade-off between the accuracy of the motion and the robustness of the Temporal State Diagram is needed. The greater the interval of accuracy permitted, the less smooth the eye motion will be and the smaller the margin of action that the Temporal State Diagram will have to correct errors (see Figure IV-5 and Figure IV-6, where the graphs plot the smoothness of the results). Appendix IV-H includes the data obtained after applying the Temporal State Diagram on the analyzed results from sequence NEON.

Table IV-1. TEST RESULTS ON EYE-STATE ANALYSIS
[Columns: sequences NEON, FRONTAL, LEFT SIDE and RIGHT SIDE. Rows: percentages WITHOUT the state diagram at 3%, 5% and 10% accuracy, and WITH THE STATE DIAGRAM for measurements A, B and C at 3%, 5% and 10% accuracy. The numerical values are not reproduced here.]
A - Correctly interpreted results. B - Errors derived from falsely assigned S_close states. C - A correct close-eye motion interpretation is obtained without the need of deducing S_close in the Temporal State Diagram.
Analysis for accuracies of 3%, 5% and 10% of the ROI width and height. Rows WITHOUT, A and C contain percentages over the complete video sequences. Rows B reflect percentages over the total amount of erroneous results.

Study of the diagram filtering effect and the standard eye-motion constraints

After examining the evolution of the results obtained from the pupil search analysis (examples from the NEON sequence are in Figure IV-5a and Figure IV-6a), we realized that the variability of the data is such that deriving animation parameters directly from them would lead to incoherent eye motion. This would disturb interpersonal communication. Quantizing the eye motion (X and Y components) with the same values ensures a common behavior that allows us to recreate natural eye motion based on the observed movements. Looking at the graphs of Figure IV-5b and Figure IV-6b, we notice how the chosen similarity accuracy (3%, 5% or 10%) plays an important role in determining the smoothness of the final data. For instance, looking at the frames in the range marked in Figure IV-5b, we see that the quantization at 10% introduces some annoying peaks inherited from the analyzed data of the left eye. These peaks do not appear at 3% and 5% after the Temporal State Diagram has been applied, because the Temporal State Diagram is able to choose the best values for these accuracies, where no coherent

results were possible from the simple analysis because the X components of both eyes were too different. Another effect that is visible after quantizing and filtering with the Temporal State Diagram is the suppression of some motion information when we do not obtain enough correctly analyzed data. This can be observed in the frame range marked in Figure IV-6b. The values assigned to the Y component for the 3% accuracy do not match the ones observed for both eyes in Figure IV-6a. This is due to the lack of data taken as correct with 3% accuracy; the algorithm does its best by applying the previous analysis results. Although the Temporal State Diagram is applied over states defined from both the x and y components of the pupil location, to understand the effect of the filtering on the eye analysis we have plotted the X-component behavior separately from the Y-component behavior (Figure IV-5 and Figure IV-6 respectively).

Evaluation of the generated animation

The evaluation of the degree of naturalness achieved when animating head models with the parameters obtained thanks to our eye analysis technique is presented in Chapter VI. The vector characteristics of the data extracted from the analysis enable the algorithmic extension of our method to analyze faces presenting different poses in front of the camera. The eye expression-pose coupling is also tested in Chapter VI.

[Figure IV-5 plots: (a) pupil X location vs. frame # for the left and right eye (min, max and mean of X_L and X_R); (b) resulting X location vs. frame # after quantization at 3%, 5% and 10%.]
Figure IV-5. The upper graph depicts the evolution of the extracted data regarding the pupil X location for both eyes. The lower graph represents the resulting X location after applying the Temporal State Diagram. It shows the results for an f(X, Y, W, H) that quantizes with an accuracy of 3%, 5% and 10% of W_ROI (example sequence: NEON)

[Figure IV-6 plots: (a) pupil Y location vs. frame # for the left and right eye (min, max and mean of Y_L and Y_R); (b) resulting Y location vs. frame # after quantization at 3%, 5% and 10%.]
Figure IV-6. The upper graph depicts the evolution of the extracted data regarding the pupil Y location for both eyes. The lower graph represents the resulting Y location after applying the Temporal State Diagram. It shows the results for an f(X, Y, W, H) that quantizes with an accuracy of 3%, 5% and 10% of H_ROI (example sequence: NEON)

IV.3 Introducing Color Information for Eye Motion Analysis

Several reasons motivated us to introduce color information as a possible source of reliable data for the analysis of eye motion. First, we wanted to study a new way to analyze eye motion that could allow us to complement the energy search algorithm previously described and to increase the accuracy of the motion analysis, still profiting from the use of a Temporal State Diagram. Furthermore, the study would also point out the relevance of using color information to analyze motion, which we have rarely seen in the literature.

Our study has led us to the design of a specific eye-opening algorithm that becomes the source to generate the parameters for the Y motion of the eyelid. It also makes it possible to simplify the eye state algorithm by assigning the Y motion of the pupil based on the analyzed eyelid motion. We would then apply the dual interpretation of the correlated pupil-eyelid characteristics exploited in the previous approach. If this simplification is not made, then the combination eye-opening detection/gaze estimation is capable of detecting extreme expressions the same way the inter-feature complementary information from the eyebrows does (Section IV.4 contains these details).

IV.3.1 Eye opening detection

Color distribution analysis on the eye area shows that the eye can be clearly classified as different from the skin in terms of its hue and saturation components. We define the degree of eye opening as proportional to the inverse of the amount of skin contained within the analyzed ROI. To measure the quantity of skin on the eye feature that we have extracted, we count the number of pixels we classify as skin pixels. The classification is made based on the probability of the pixel belonging to the skin. Every frame is analyzed, obtaining the opening as:

(IV-3)  EyeOpening = 1 − probskin

where

(IV-4)  probskin = Σ_{h,s} NUMpel_{h,s} · PDF_SKIN(H = h, S = s).

Since features extracted from different video sequences may have different sizes, NUMpel_{h,s} is the total amount of pixels of a determined hue and saturation normalized by the total number of analyzed pixels. PDF_SKIN is the Probability Density Function of the skin HS characteristics. The PDF is obtained by analyzing the pixel HS distribution of different skin images. Instead of using a general database for non-specific skin detection (Sahbi & Boujemaa, 2000), we use speaker-dependent data. In our approach, we approximate probskin ≈ probClosedEye and we obtain the PDF from a sequence of frames of the closed eye of the individual to be analyzed; that is, we perform some training with sequences of the closed eyes of the individual to be analyzed.

The chrominance data of video images, hue and saturation, are highly dependent on the characteristics of the acquisition system utilized (camera, image card, etc.) and on the person's nature (eye and skin color). They are less dependent on the recording environmental conditions, such as the lighting on the face. Our solution needs the described training step to ensure that the study we have performed reflects the usefulness of color information in the present context.

IV.3.2 Gaze detection simplification

Using the pupil-eyelid correlation hypothesis, we can restrict the analysis to the study of the horizontal movement of the eye. Consequently, we do not perform an exhaustive scan over a square zone but a horizontal sweep with a vertical rectangle of area c·W_eye × H_feature. Equation (IV-1) is then transformed to only look for the coordinate X, which indicates if the eye looks right or left:

(IV-5)  X = argmin_x Σ_{l=1}^{c·W_eye} Σ_{m=1}^{H_feature} I²(x + l − c·W_eye/2, m),   s.t. x + l − c·W_eye/2 > 0

IV.3.3 Analysis interpretation for a parametric description

To be able to synthesize the eye movements, we have parameterized the analysis data obtained so it can be interpreted. We set a tight cooperation between the two previously described analysis techniques in the Temporal State Diagram (Figure IV-8), which allows us to double-check possible erroneous results from the algorithms. The next sub-sections develop the complete process for the state diagram specification.
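Returning to the eye-opening measure of (IV-3)/(IV-4), the sketch below shows how probskin could be accumulated over quantised HS bins. It is an illustration under stated assumptions only: the bin count, the normalisation of the skin table to a peak of 1 (so it acts as a skin-likeness weight), the assumption that hue and saturation lie in [0, 1], and the function names are ours, not the implementation of this work.

```python
import numpy as np

def skin_pdf_from_samples(hue, sat, bins=64):
    """Tabulate a speaker-dependent skin model over quantised HS bins from
    training pixels (e.g. frames of the closed eye).  Normalised to a peak
    of 1 so it acts as a per-bin skin-likeness weight (our assumption)."""
    hist, _, _ = np.histogram2d(np.ravel(hue), np.ravel(sat), bins=bins,
                                range=[[0.0, 1.0], [0.0, 1.0]])
    return hist / max(hist.max(), 1.0)

def eye_opening(hue, sat, skin_pdf, bins=64):
    """EyeOpening = 1 - probskin, with probskin accumulated over HS bins as
    in (IV-3)/(IV-4); hue and saturation are assumed normalised to [0, 1]."""
    numpel, _, _ = np.histogram2d(np.ravel(hue), np.ravel(sat), bins=bins,
                                  range=[[0.0, 1.0], [0.0, 1.0]])
    numpel = numpel / max(numpel.sum(), 1.0)   # fraction of analysed pixels
    probskin = float((numpel * skin_pdf).sum())
    return 1.0 - probskin
```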

Parameterization of the eye movements

We define parameters to describe the eye movements to be synthesized. These parameters are simple action units that mark how actions should be synthesized. We have defined two parameters according to the two analysis techniques we use: eye-opening (EO) and horizontal pupil orientation (HPO). Each parameter takes different values depending on the action to perform during the synthesis. To test our procedure and to be able to evaluate its viability in real time, EO and HPO are quantified using 9 states that allow us to describe eye actions with a minimum richness (see Figure IV-2 for the location distribution assigned to the 9 states). Table IV-2 depicts the actions and the corresponding values. As the table shows, no saccadic motion is perceptible with such an action description.

Table IV-2 ACTION UNIT DESCRIPTION
        open   closed   left   center   right
EO       1       0       -       -        -
HPO      -       -      -1       0        1

Quantifying the results to parameterize them

The analysis algorithms described in the previous sections generate results that have to be paired with the proper parameter values. Computing the EyeOpening along a sequence generates a function defining two levels. The function adopts the highest value when the eye is open (EO=1) and the lowest one when the eye is closed (EO=0) (Figure IV-9). From sequence to sequence this difference in levels is fairly stable, but the levels may be situated at different values. The value of the levels depends on the video camera and the lighting conditions. Since we analyze the sequences on a frame-by-frame basis and we do not count on a priori results, we define EO in relative terms. To do so, we compare the EyeOpening value of the current frame i with the average EyeOpening value of the previous k frames (avg). If the difference, Δ_{i-avg}, is greater than a certain threshold (Th_EO), the eye has opened; if it is smaller than −Th_EO, the eye has closed; otherwise, it remains as in the previous frame.
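The relative EO decision just described can be sketched as follows; this is a minimal illustration in which the factory-function structure, the default history length k and all names are assumptions of the sketch (the thesis only specifies the comparison against Th_EO).

```python
from collections import deque

def make_eo_tracker(th_eo, k=3, initial_eo=1):
    """Return an update function implementing the relative EO decision:
    compare the current EyeOpening with the average of the previous k
    frames against the trained threshold Th_EO."""
    history = deque(maxlen=k)
    state = {"eo": initial_eo}

    def update(eye_opening_value):
        if history:
            avg = sum(history) / len(history)
            delta = eye_opening_value - avg
            if delta > th_eo:
                state["eo"] = 1      # the eye has opened
            elif delta < -th_eo:
                state["eo"] = 0      # the eye has closed
            # otherwise EO keeps its previous value
        history.append(eye_opening_value)
        return state["eo"]

    return update
```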

Table IV-3 THE 36 COMBINATORY RESULTS FROM THE EYE ANALYSIS
[Table listing, for every combination of the left and right analysis results (EO, HPO), the evaluation (*). The individual entries are not reproduced here.]
S_L, S_R = left & right analysis results; (*) = evaluation of the results; (-) = both results are erroneous; A = at least one of them is correct & S_L ≠ S_R; S_i = defined states (S_1 = S_closed, S_2 = look left, S_3 = look center, S_4 = look right)

The parameter X that we obtain from the eyesight detection algorithm defines the horizontal location of the pupil in the feature. Finding its location relative to the eye on the feature image determines whether the eye looks left, center or right (HPO = -1, 0, 1). Figure IV-7 shows how we quantify the X value. The more precise the synthesis is required to be, the more intermediate values both action units, EO and HPO, should take. In such a case, the quantization of the space of analysis results would change by adding more levels. The number of quantization levels must be chosen based on the capacity of the clone animator to synthesize those details and on the image quality and size of the feature. We must also evaluate whether increasing the complexity of the quantization is appreciated when watching the real-time synthesis. This can be evaluated through an on-line analysis (see Chapter VI for details).

This first parameterization provides the preliminary analysis data, which might be erroneous. We estimate whether our results are correct and decide which is the most suitable action by combining the information obtained from both eyes and both analysis algorithms in the Temporal State Diagram.

Applying the Temporal State Diagram

Table IV-3 shows all the possible combinations of analysis results, S_L S_R. They can be completely erroneous for both eyes (-), different for each eye, in which case maybe one is

wrong (A), or exactly the same for the left and the right eye (S_i). Applying the constraint of having the same behavior in the left and the right eye, we have developed our diagram of states, see Figure IV-8. The diagram cross-checks the behavior of both eyes and estimates the best current eye action (S_i^t) depending on the analysis results (A, (-), S_i) and the previous eye state (S_i^{t-1}).

[Figure IV-7 sketch: the X axis of the eye feature (width W_eye) divided at W_eye/3 and 2W_eye/3 into the zones HPO=-1, HPO=0 and HPO=1.]
Figure IV-7. Quantization of HPO for the left eye

[Figure IV-8 flowchart: starting from the analysis results of Table IV-3, equal states give S_t directly; otherwise the diagram checks whether the left or the right state is possible and coherent with S_i^{t-1} before falling back to S_{t-1}.]
Figure IV-8. Temporal State Diagram for the eye action tracking with simplified gaze analysis. S_i^t (R/L) represents a determined state i at time t for either the right or the left eye, and S_t the final result. Check Table IV-3 for the state combinations
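The HPO quantization of Figure IV-7 amounts to splitting the eye width into thirds; a minimal sketch is given below. The left/right assignment follows the order shown in the figure and may need mirroring depending on the camera set-up; the function name is ours.

```python
def horizontal_pupil_orientation(x, w_eye):
    """Quantise the pupil abscissa into HPO using the thirds of Figure IV-7."""
    if x < w_eye / 3.0:
        return -1    # HPO = -1 (look left)
    if x < 2.0 * w_eye / 3.0:
        return 0     # HPO = 0 (look centre)
    return 1         # HPO = 1 (look right)
```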

IV.3.4 Experimental evaluation and conclusions

We have used two sets of images for our experiments; the experimental setting has been the same as in the previous section. To obtain the PDF of the HS values, one set contains the recorded closed eyes of the person. The other set contains the eyes of the recorded face that is analyzed. Both sets were obtained under uncontrolled lighting conditions and, to reduce the noise introduced by the camera, we first filtered the analysis region with low-pass filters (a combination of 3x3 median and average filters). The PDF used for the final analysis is an average of the PDFs obtained from different sequences. To reduce the influence of noise on the results, we average the analysis areas from N frames of each sequence and then obtain their PDF.

After testing our algorithms, the results were satisfactory (Figure IV-9). In around 85% of the studied cases the EyeOpening algorithm could clearly provide the two expected levels for the open-close movement. The number of previous results accounted for in the state determination depends on the frame rate of the sequence; for 15 f/s we have used the previous three results. The GazeDetection algorithm performs better, leading to positive results in around 98% of the tests. Applying the state diagram is convenient above all in those transition areas where the EyeOpening algorithm changes from open to close, or vice versa.

Regarding the speed of performance, the heaviest computational part lies in the filtering and the component conversion from RGB to HSI of the video input. The importance of the filtering strongly depends on the graphics card output quality. Regarding the conversion, it could be worthwhile to adapt these algorithms to the YUV components (S:U, H:V, I:Y) since many graphics cards provide YUV output directly.

Since the speaker-dependent parameter Th_EO and the skin PDF need to be trained, we conclude that color considerations are useful if the skin characteristics of the person are available and geometrical constraints do not interfere. The probabilistic nature of the analysis does not allow us to extend the EyeOpening algorithm to analyze any pose different from a near-to-frontal head orientation. The EyeOpening results represent a scalar value with no geometrical meaning and cannot be interpreted in 3D; thus they are not susceptible to being extended to other poses. The GazeDetection values are of vector nature, thus interpretable in 3D and adaptable to understand motion in a different pose.

No test has been made using a generic PDF. We believe that results should not change for all those individuals whose chromatic characteristics do not differ much from those represented in the generic PDF. We expect a decrease in performance on people

whose chromatic skin characteristics are extreme, above all if no specific quality can be guaranteed.

[Figure IV-9 plots, video sequence EYES: (a) EyeOpening value vs. frame number for the left and right eye, with the open and closed levels marked; (b) pupil location (centered at W_eye/2) vs. frame number, with the right, center and left levels marked.]
Figure IV-9. Analysis graphs for a tested sequence. (a) EyeOpening (two quantization levels). (b) GazeDetection (three quantization levels). The upper row shows some frames from the analyzed video

IV.4 Eyebrow Motion Analysis Algorithm

Historically, eyebrow motion analysis has been less investigated than other feature analysis techniques (eyes and mouth). In the literature we find that the first works trying to analyze eyebrow behavior (Huang, C.-L. & Huang, Y.-M., 1997) are concerned with the search for expression information. More recently, Goto, Kshirsagar and Magnenat-Thalmann (2001) have also presented a method to analyze eyebrow motion to extract facial animation parameters. The analysis methodology followed is rather heuristic, and when presenting the proposed approaches the influence of the environmental conditions is very often not discussed. Kampmann (2002) proposes a technique that is able to detect the eyebrows even if they are partially covered by hair. In general, we have not found any motion analysis technique that formally relates the image analysis processing results to the generation of the motion parameters.

In this section we describe an eyebrow motion analysis technique where the image processing involved tries to adapt its analysis to the characteristics of the user and the environmental conditions. We relate the results of this image analysis directly to a motion template that models eyebrow motion. To study eyebrow behavior from video sequences we utilize a new image analysis technique based on an anatomical-mathematical motion model. This technique conceives the eyebrow as a single curved object (arch) that is subject to deformation due to muscular interaction. The action model defines the simplified 2D (vertical and horizontal) displacements of the arch. Our video analysis algorithm recovers the needed data from the arch representation to deduce the parameters that deformed the proposed model.

Table IV-4 NOTATION CONVENTIONS USED IN THIS SECTION'S FORMULAE
x_n, y_n: real coordinate values of the eyebrow in its neutral position.
x_n[i], y_n[i]: coordinate values of the pixels obtained from the video image analysis of the eyebrow in its neutral position, at position i.
x, y: real coordinate values of the eyebrow in its current (analyzed frame) position.
x[i], y[i]: coordinate values of the pixels obtained from the video image analysis of the current frame eyebrow, at position i.
Δx, Δy: real coordinate differences between the current eyebrow arch and the neutral arch: x_frame − x_neutral and y_frame − y_neutral, respectively.
Δx[i], Δy[i]: coordinate differences between the pixels from the current eyebrow arch being analyzed and those from the neutral arch, at position i.
[0], [N] and [max] indicate the computed values for the first point (x = 0), the last point (x = last) and the point with the maximum vertical value (x : y = max) on the arch.

IV.4.1 Anatomical-mathematical eyebrow movement modeling

To model the eyebrow movement, we define some mathematical expressions that superficially follow the muscular behavior and interactions that exist when eyebrow motion occurs. Basically, four muscles control the eyebrow movements:
(i) Frontalis (F): elevates them.
(ii) Corrugator (CS): pulls them downwards and produces vertical glabellar wrinkles.
(iii) Procerus: lowers the eyebrows downwards.
(iv) Orbicularis Oculi (OO): closes the eyes and lowers the eyebrows.

Figure IV-10. Several muscles generate the eyebrow movements. Upward motion is mainly due to the Frontalis muscle and downward motion is due to the Corrugator, the Procerus and the Orbicularis Oculi muscles (1)

Although the shape of an eyebrow depends on the physical appearance of the person, its motion is related to more general muscular actions. This enables us to represent eyebrows as arches whose shape is specific to the person but whose motion can be mathematically modeled. We parameterize the arch movement as the displacement along the x, y and z-axes of each of its points compared to the initial neutral position, when no force is acting on it: Δx = x_frame − x_neutral, Δy = y_frame − y_neutral and Δz = z_frame − z_neutral.

Two different behaviors exist in eyebrow motion, one when the expression goes upward and another when the expression goes downward. Different muscular actions are involved in each of them and therefore different models control them. These

(1) Images and information based on data from

expressions have been derived from the observation of the muscular motion of the eyebrows, from the empirical study of the optical flow behavior of the eyebrow area observed in real video sequences, and by adapting the parameters involved to the anatomical shape of the eyebrow.

Eyebrow Motion Expressions:

Upward:
(IV-6)  Δx = Ff_x · e^(−x_n/c)
(IV-7)  Δy = Ff_y + Ff'_y · e^(−x_n/c')

Downward:
(IV-8)  Δy = Fc_y + Foo_y · (x_n − c)²
If x_n < c:
(IV-9a)  Δx = Fc_x
If x_n > c:
(IV-9b)  Δx = Foo_x · (x_n − c) + Fc_x

Ff, Ff', Fc and Foo are the magnitudes associated with the forces of the Frontalis muscle, the Corrugator muscle and the Orbicularis Oculi, respectively. The action of the Procerus muscle, being close and highly correlated to the one of the Corrugator, is included in the Fc term. The subscripts x and y indicate the different components, c = w/2 and c' = 3w/2. All coordinates relate to the eyebrow local coordinate system. Figure IV-11 depicts the coordinate axes for the left eyebrow; the right eyebrow is symmetrical over an imaginary vertical line located between the eyebrows.

[Figure IV-11 sketch: the eyebrow arch of width w in its local coordinate system, with the origin (0,0) at the inner extreme and the X axis running along the eyebrow.]
Figure IV-11. Eyebrow model arch for the left eye and its coordinate reference. The origin for the analysis algorithm is always situated at the inner extreme of the eyebrow (close to the nose) and defined for the eyebrow in its neutral state
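As a quick numerical check of the model's behaviour, in the spirit of the simulation of Figure IV-12, the expressions above can be applied to a sampled neutral arch as in the sketch below. It assumes the expressions (IV-6)-(IV-9) exactly as reconstructed above; the constants c = w/2 and c' = 3w/2, the sign conventions (a frowning motion may require negative magnitudes depending on the orientation of the local y-axis) and all names are assumptions of the sketch.

```python
import numpy as np

def deform_eyebrow(x_n, y_n, params, upward=True):
    """Apply the upward or downward expressions to a neutral arch (x_n, y_n)
    sampled in the local frame of Figure IV-11, and return the deformed arch."""
    w = float(x_n.max() - x_n.min())
    c, c_prime = w / 2.0, 3.0 * w / 2.0
    if upward:
        dx = params["Ff_x"] * np.exp(-x_n / c)                    # (IV-6)
        dy = params["Ff_y"] + params["Ffp_y"] * np.exp(-x_n / c_prime)  # (IV-7)
    else:
        dy = params["Fc_y"] + params["Foo_y"] * (x_n - c) ** 2    # (IV-8)
        dx = np.where(x_n < c,                                    # (IV-9a/b)
                      params["Fc_x"],
                      params["Foo_x"] * (x_n - c) + params["Fc_x"])
    return x_n + dx, y_n + dy

# Example: a rounded neutral arch and a mild raising action.
x_n = np.linspace(0.0, 10.0, 50)
y_n = 1.0 - ((x_n - 5.0) / 5.0) ** 2
raised_x, raised_y = deform_eyebrow(
    x_n, y_n, {"Ff_x": 0.3, "Ff_y": 0.8, "Ffp_y": 0.4})
```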

z = f(x, y) is difficult to model from the frontal view of an eyebrow and does not provide critical information regarding the expression; Δz cannot be well estimated from a frontal orientation. If we want to realistically synthesize 3D eyebrow motion with information obtained from image analysis under these conditions, we may estimate the z movement by assuming that eyebrow motion follows the forehead surface, thus simulating its natural behavior. The displacement-of-texture-coordinates synthetic animation technique described in (Valente & Dugelay, 2000) illustrates this concept. This procedure simulates the sliding motion of the eyebrow skin over the skull. Changing the texture coordinates vertically and horizontally generates the animation; the 3D coordinates of the model remain unchanged, thus leaving the skull shape untouched.

Applying the formulae over two rounded arches with different parameter values, we obtain deformations that correspond to the expected eyebrow deformations due to those forces. Figure IV-12 shows the simulation of extreme modeled movements (downward and upward) on both eyebrows.

[Figure IV-12 plots: the neutral arch together with the simulated UPWARDS and DOWNWARDS deformations.]
Figure IV-12. The action of the eyebrow behavior model applied over the neutral arch results in a smooth deformation. The graph on the left depicts the eyebrow rising motion (upward) for positive values of Ff_x, Ff_y and Ff'_y. The graph on the right represents eyebrow frowning (downward) for positive values of Fc_x, Foo_x, Fc_y and Foo_y

IV.4.2 Image analysis algorithm: deducing the model parameters

We achieve two goals by modeling the eyebrow movement. On the one hand, we simplify the understanding of eyebrow motion to a point where we can derive its movement on images by comparing image data with the model parameters. On the other hand, this model is complete enough to generate the required information to create synthetic eyebrow expressions. The developed image analysis algorithm tries to reduce the image of the eyebrow down to the proposed model in order to study the distribution of the points on the eyebrow arch. Then, it deduces the strength of the parameters involved. The analysis compares data extracted from the current video frame against the data obtained from the frame where the eyebrow is in a neutral position, or neutral frame.

Algorithm procedure

Binarization: Normally, eyebrows and skin are easy to separate in terms of hue and, with less accuracy, intensity. Under regular although uncontrolled lighting conditions we can differentiate the eyebrow from the skin and therefore binarize the feature image. We consider uncontrolled lighting to be any illumination over the face that permits the visual differentiation of the eyebrow on the video frame. Due to the anatomical nature of the head, eyebrows do not present the same aspect all over their arch. The area situated on the inner part of the Superciliary arch is generally better defined and easier to differentiate from the skin than the part of the eyebrow arch that goes towards the joint with the Superior Temporal Line, because this last one is usually more sparse. Our analysis algorithm needs to detect the complete eyebrow because we are interested in studying the overall shape behavior. We have developed a binarization algorithm that analyzes the eyebrow in two different zones. One zone includes from the center of the face up to the point which is halfway between the Foramen and the Zygomatic process (the point where the eyebrow usually changes shape direction and texture) and the other zone goes from there to the external end of the eyebrow. We refer to Figure IV-13 to locate the different parts. To perform the binarization we apply two different thresholds, one per zone. Each threshold is obtained by analyzing the histogram distribution of the corresponding area. The eyebrow is taken as the darkest part of the video image being analyzed.

(IV-10)  Th_i = min_i + (max_i − min_i) / 3

If pixel_value < Th_i, the pixel is considered as part of the eyebrow. The threshold has been chosen to be at a third of the intensity distribution because the analysis area covers three major intensity zones, which are well differentiated in most lighting conditions: the eye zone under the eyebrow, the eyebrow itself, and the forehead zone over the eyebrow. Usually, the darkest one belongs to the eyebrow. Figure IV-14a shows the histogram of one of the ROIs; we have marked the three different zones on the histogram. The reader must notice that the current binarization procedure is suitable for normal to dark haired people. The analysis of blond persons would require the study of more specific criteria to binarize the image.
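The two-zone thresholding of (IV-10) and the subsequent vertical thinning could be prototyped as in the following minimal sketch; the split column, the choice of the topmost eyebrow pixel as the forehead-to-eyebrow transition and the function names are assumptions of the sketch, not the implementation of this work.

```python
import numpy as np

def binarize_eyebrow(roi, split_col):
    """Binarise an intensity ROI with the per-zone threshold of (IV-10),
    Th_i = min_i + (max_i - min_i) / 3, computed separately for the inner
    (InZygomatic) and outer (ExtZygomatic) zones split at split_col."""
    roi = np.asarray(roi, dtype=np.float64)
    mask = np.zeros(roi.shape, dtype=bool)
    zones = [(slice(None), slice(0, split_col)),
             (slice(None), slice(split_col, roi.shape[1]))]
    for zone_slice in zones:
        zone = roi[zone_slice]
        if zone.size == 0:
            continue
        th = zone.min() + (zone.max() - zone.min()) / 3.0
        mask[zone_slice] = zone < th        # darker than Th_i -> eyebrow pixel
    return mask

def thin_to_arch(mask):
    """Vertical thinning: for each column keep the topmost eyebrow pixel,
    i.e. the forehead-to-eyebrow transition."""
    cols = np.where(mask.any(axis=0))[0]
    return [(int(col), int(mask[:, col].argmax())) for col in cols]
```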

Figure IV-13. The eyebrow changes its hair density as it goes away from the inner extreme. The bone structure of the skull determines the shading differences along the eyebrow. We set two different binarization thresholds: Th_1 for the InZygomatic zone and Th_2 for the ExtZygomatic zone

Results of the binarization process: This algorithm always detects the eyebrow even if, in some cases, it also introduces some artifacts. Eyes and hair are often labeled as being part of the eyebrow (see Figure IV-14b, where the eye is marked as eyebrow). Both eyes and hair must not be taken into account when analyzing the ROI to extract the eyebrow arch. Due to the predefined and fixed situation of these artifacts on the ROI, we have easily adapted the thinning algorithm to avoid taking them as part of the eyebrow.

Figure IV-14 (a)-(c). The two-part binarization of the eyebrow leads to a good determination of the eyebrow area, but it may also introduce artifacts by labeling the eye or hair as part of the eyebrow. In the current eyebrow binary image we see how the eye has also been detected

Thinning: We perform a vertical thinning over the binarized image to obtain the rounded arch that will define the eyebrow. The anatomical structure of the ocular cavity creates very dark shadows under extreme lighting conditions; therefore, sometimes the

binarization process can only make a rough estimation of the overall eyebrow shape. Under unknown conditions the eyebrow arch is robustly obtained by detecting the gradient change between forehead and eyebrow on the binarized feature image. This gradient change remains stable under most lighting conditions (see Figure IV-14).

Parameter deduction: The parameters that model the eyebrow behavior are deduced by comparing the thinned arch of the current frame against the arch obtained from the analysis of the eyebrow in its neutral position (i.e. showing no action). The process starts by deducing the general eyebrow behavior because our model formulates upward and downward motion with different expressions. The median vertical value of the arch (the median of the y-components of the points shaping the arch) is compared against the median vertical value of the neutral arch. If the current median is greater than the neutral one, we conclude that we are analyzing an upward expression; otherwise, the downward model representation is used. After selecting the model, the most significant data from the arch are extracted and used to obtain the model parameters.

Parameter expressions:

UPWARDS:
(IV-11a)  Ff_x = Δx[0] · e^(x_n[0]/c)
(IV-11b)  Ff'_y = (Δy[N] − Δy[0]) / (e^(−x_n[N]/c') − 1)
(IV-11c)  Ff_y = Δy[0] − Ff'_y

DOWNWARDS:
(IV-12a)  Fc_x = Δx[0]
(IV-12b)  Foo_x = (Δx[N] − Fc_x) / (x_n[N] − c)
(IV-12c)  Fc_y = (Δy[0] + Δy[N]) / 2
(IV-12d)  Foo_y = (Fc_y − Δy[max]) / 2
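A minimal sketch of this deduction step is given below, assuming the parameter expressions (IV-11)/(IV-12) as reconstructed above and arches sampled at the same positions in the local frame of Figure IV-11 (y grows upwards). The function name and the dictionary output are assumptions of the sketch.

```python
import numpy as np

def eyebrow_parameters(neutral_arch, current_arch, c, c_prime):
    """Deduce the motion-model parameters from a neutral and a current arch,
    both given as equally sampled (x, y) points ordered from the inner extreme."""
    xn, yn = np.asarray(neutral_arch, dtype=float).T
    xc, yc = np.asarray(current_arch, dtype=float).T
    dx, dy = xc - xn, yc - yn
    if np.median(yc) > np.median(yn):                  # general upward expression
        ffp_y = (dy[-1] - dy[0]) / (np.exp(-xn[-1] / c_prime) - 1.0)  # (IV-11b)
        return {"upward": True,
                "Ff_x": dx[0] * np.exp(xn[0] / c),     # (IV-11a)
                "Ffp_y": ffp_y,
                "Ff_y": dy[0] - ffp_y}                 # (IV-11c)
    fc_x = dx[0]                                       # (IV-12a)
    fc_y = (dy[0] + dy[-1]) / 2.0                      # (IV-12c)
    i_max = int(np.argmax(yc))                         # point of maximum height
    return {"upward": False,
            "Fc_x": fc_x,
            "Foo_x": (dx[-1] - fc_x) / (xn[-1] - c),   # (IV-12b)
            "Fc_y": fc_y,
            "Foo_y": (fc_y - dy[i_max]) / 2.0}         # (IV-12d)
```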

IV.4.3 Experimental evaluation and conclusions

To our knowledge, there is no database of face images that completely suits the needs of our tests. Nevertheless, we have tried to test our procedure on more than one speaker; specifically, we show the results from the analysis of three individuals with different eyebrow characteristics recorded under uncontrolled lighting conditions. To test the correct behavior of the model and its application to eyebrow motion analysis, we applied the binarization-thinning technique over the left eye on the frames of several video sequences. Then, we deduced the model parameters by contrasting the frame arch against the neutral-position arch. To verify that the obtained parameters actually correspond to the eyebrow behavior, we have plotted the thinning results of each frame together with the arch obtained from applying the model over the neutral-position arch; this resulting arch is the modeled arch. Figure IV-17 shows this arch comparison for the frames presented in the sequence of Figure IV-15; the plots also include the neutral-position arch.

The best way to evaluate the performance of our technique is to visually compare the arch results; unfortunately, this procedure is not suitable to be applied over a large amount of data. To interpret the correctness of our approach, we have defined two different measurements:

(i) a pseudo-area: ã = Σ_i | y_k[i] − y_m[i] |, which can be understood as the area contained in between arches k and m, and denotes the shape similarity between them; the closer ã is to 0, the more alike they are. We compare the area difference between the neutral-position arch and the current frame arch against the shape difference between the modeled arch and the current frame arch. This measurement checks whether the eyebrow shape modeled by the extracted motion parameters follows the eyebrow actually obtained from the action; and

(ii) a mean difference comparison, where we compare the mean (average vertical value of the arch) difference between the current frame arch and the neutral-position arch against the mean difference between the current frame arch and the modeled arch. This information helps us to evaluate why the analysis procedure was not able to detect the right general eyebrow action (up/down), and it also provides information on the shape behavior of the modeled arch, by studying the sign of the measurement that gives the estimation of its vertical position.

We consider that the algorithm has worked correctly if the pseudo-area comparison shows that the modeled arch is closer to the current frame arch than the neutral-position arch.
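For illustration, the pseudo-area and the success criterion can be sketched as follows (a minimal sketch assuming arches sampled at the same x positions; the function names are ours).

```python
import numpy as np

def pseudo_area(y_arch_k, y_arch_m):
    """Pseudo-area between two arches sampled at the same x positions:
    the sum of the absolute vertical differences."""
    diff = np.asarray(y_arch_k, dtype=float) - np.asarray(y_arch_m, dtype=float)
    return float(np.abs(diff).sum())

def model_explains_frame(y_neutral, y_modeled, y_frame):
    """Success criterion: the modeled arch must be closer to the current
    frame arch than the neutral arch is."""
    return pseudo_area(y_modeled, y_frame) < pseudo_area(y_neutral, y_frame)
```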

Test conclusions

The results show that this analysis technique positively deduces the eyebrow behavior. We are able to analyze video images and extract the few parameters needed to understand and later synthesize eyebrow motion. From the visual inspection of our results we conclude that errors come more from the image processing performance of the analysis than from the motion model used. Correct binarization and subsequent thinning are critical to obtain accurate motion parameters.

Figure IV-18 plots the measurement results of three different tests. The percentage of estimation success (better measurement for the modeled arch) is around 85% for those sequences where image quality and environment lighting conditions are standard. For low quality video input, performance drops to around 50%. We must point out that the worst estimations usually happen for low-amplitude expression movements, where the inaccuracy in the situation of the analysis area (the speaker may slightly move) is large enough to mislead the average results. In this case, as the average difference measurement shows, we may interpret an up movement as being down or vice versa. Looking at Figure IV-16b we realize how important the correct and precise definition of the eyebrow analysis area is. The graph plots the results of one analyzed sequence along with the neutral analyzed frame of another sequence where head location and size were not exactly the same. Motion not due to the eyebrow expression but to the overall head pose leads to mistaken results.

Our tests have been performed accepting that the head pose in the video sequences is known and frontal. The vector nature of the analysis results makes it possible to adapt the presented method to extend its use to any head pose. It then fits into the analysis approach described in Section III.4 that is afterwards theoretically developed in Chapter V. Using the proposed extension technique also permits tracking the ROI accurately on the video images, thus minimizing the influence of the head pose on the analysis.

Figure IV-15. Our tests were performed over video sequences where the lighting over the face was not uniform. No environmental conditions were known besides the exact location of the ROI including the eyebrow feature, which remained unchanged throughout the sequence. Here we present the frames analyzed to obtain the results presented in Figure IV-17 (frames 13, 14, 19, 20, 28, 40 and the neutral expression)

[Figure IV-16 plots: (a) the thinned eyebrow arches (ROI height vs. ROI width) for the analyzed frames; (b) the arches of frames Fr 2 - Fr 41 of one sequence plotted together with the neutral arch of another sequence (Neut. Seq. 2).]
Figure IV-16. Correct binarization and thinning clearly give the data from which to extract the model parameters. Graph (b) plots the mixed results from the analysis of two different video sequences. Neut. Seq. 2 is the analysis of a frame where the eyebrow was relaxed, taken from a sequence different from the Fr frames. This comparison simulates what would happen if the pose of the speaker changed during the analysis. The pose motion would cause a movement of the eyebrow, but the algorithm would interpret it as a local eyebrow expression (being upward when in reality it is neutral). We must control the pose of the user to completely exploit the algorithm in practical applications

[Figure IV-17 plots: for frames 13, 14, 19, 20, 28 and 40, the Neutral arch, the ModelFrame arch and the ResultFrame arch.]
Figure IV-17. We have compared the arch extracted from the analysis (ResultFrame) with the arch resulting from applying the motion parameters onto the Neutral arch (ModelFrame). If the motion estimation is correct, both should fall together. The anatomical-mathematical motion model nicely represents the eyebrow deformation. We see in frame 28 how the strange thinning result obtained at the beginning of the arch, probably due to the eyebrow-eye blending during binarization, worsens the algorithm accuracy. Although the obtained parameters still correctly interpret the general downward movement, showing fair robustness, they are no longer able to express the exact motion intensity

[Figure IV-18 plots: for the sequences Ana2, Caroline and Jean-Luc, the mean comparison (mean_diff_neutral vs. mean_diff_modeled) and the distance comparison (area_neutral vs. area_modeled).]
Figure IV-18. These plotted results from three different sequences: Ana2, Caroline and Jean-Luc, illustrate the behavior of the analysis algorithm under different conditions. The algorithm proves to detect the right movement (the mean difference decreases) and to estimate the motion parameters correctly (the area decreases). We observe the best behavior for extreme eyebrow expressions. Ana2 sequence success rate: 90.71%, Caroline sequence success rate: 78.38% and Jean-Luc sequence success rate: 82.26%

IV.5 Eye-Eyebrow Spatial Correlation: Studying Extreme Expressions

Generally, the more complex motion models are, the less robust their analysis becomes to unexpected environmental conditions. Our analysis algorithms prove to perform robustly thanks to their simplicity. This simplicity limits the individual motion understanding to natural, coherent eye and eyebrow movements; these constraints are suitable in human-to-human communication, but they may undesirably filter out those details that add strength to the expression, above all in the presence of extreme emotions (joy, anger, etc.).

To partially compensate for this limitation, we also propose to exploit the existing eye-eyebrow motion correlation to enrich the overall ocular expression understanding obtained from the individual analysis of each feature. When the eyes are closed the eyelids may behave in two different ways: they may be closed without any tension if the eyebrows are neutral or pulled up, or they may be tensely closed if the eyebrows are pushed down. When the eyes are open, the level of the eyebrow height indicates the degree of opening of the eyelids. Figure IV-20 illustrates this clear eyelid-eyebrow correlation. Extreme eyebrow actions determine and refine eye motion by:

(i) extending the information inside the eye Temporal State Diagram to include the inter-feature constraints derived from the eyebrow analysis. For instance, having a strong downward eyebrow action will undoubtedly result in a close-eye action, even if the eye data is not reliable (Figure IV-19);

(ii) deriving the final synthetic eyelid behavior by adding to the position obtained from the pupil location an extra term accounting for the strength of the eyebrow movement:

(IV-13)  y_eyelid^new = y_eyelid^former + µ · fap + η

with µ = (y_eyelid^MAX − y_eyelid^former) / (fap_MAX − fap_0) and η = −µ · fap_0.

y_eyelid^former is the analyzed eyelid y-motion resulting from applying the standard eye-state analysis algorithm and y_eyelid^new is the eyelid y-motion obtained after adding the inter-feature constraint. The eyebrow vertical action parameter, denoted as fap, ranges from fap_0 to fap_MAX.

[Figure IV-19 flowchart: the Temporal State Diagram of Figure IV-3 extended with an additional test; when the eye data suggests neither a common state nor a clear closure, a downward eyebrow state also forces S_t = S_close, otherwise S_t = S_{t-1}.]
Figure IV-19. The basic Temporal State Diagram applied to the eye analysis and built only on inter-eye constraints (Figure IV-3) can be complemented to take into account the data obtained from the eyebrow analysis.

Figure IV-20 (a)-(f). When the eye is closed (lower row), the eyelid changes due to the eyebrow action can be taken as some specific animation. When the eye is open (upper row), it must be taken into account to alter the standard y-motion of the eyelid

IV.5.1 Experimental evaluation and conclusions

During the eye-eyebrow cooperation tests, we have compared the evolution of the eyelid motion derived from the eye-state analysis against the results obtained after adding the influence of the eyebrows. In our practical scenario (see Figure IV-21), fap_0 = 0, therefore (IV-13) is simplified to:

(IV-14)  y_eyelid^new = y_eyelid^former + µ' · fap

with µ' = (y_eyelid^MAX − y_eyelid^former) / fap_MAX.

The new eyelid vertical motion, y_eyelid^new, is the sum of its original standard value, y_eyelid^former, plus a term proportional to the eyebrow fap magnitude (ranging between 0 and fap_MAX). The proportionality coefficient depends on fap_MAX, on the maximum value for the eyelid motion, and on the analyzed eyelid location.

Looking at Figure IV-21 we can clearly notice the existing correlation between eye and eyebrow motion. The shadowed parts highlight the frames where the person was closing his eyes. The fap evolution, which depicts the action of the eyebrows, shows that when the eyebrows were moving up the eyes were open (Figure IV-20c is taken from frame 221), and when the eyebrows were moving down, closed eyes were detected (Figure IV-20f is taken from frame 301). Both Figure IV-21 and Figure IV-22 show that the term added to the final eyelid motion correctly estimates the increase in motion strength and does not interfere with the eye analysis when no eyebrow motion is being detected.
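The correction of (IV-14) is a one-line computation; a minimal sketch is given below (the function name is ours).

```python
def corrected_eyelid(y_former, y_max, fap, fap_max):
    """Eyebrow-corrected eyelid motion of (IV-14):
    y_new = y_former + mu' * fap, with mu' = (y_max - y_former) / fap_max."""
    mu_prime = (y_max - y_former) / float(fap_max)
    return y_former + mu_prime * fap
```

When fap = 0 the correction vanishes and the standard eye-state result is kept; when fap = fap_max the eyelid is pushed to its maximum motion value.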

[Figure IV-21 plot: eyebrow fap magnitude vs. frame #, with the up and down actions marked.]
Figure IV-21. Eyebrow fap magnitude evolution (0 - fap_MAX) taken from sequence NEON

[Figure IV-22 plot: eyelid motion magnitude vs. frame #, comparing the values analyzed with 5% accuracy against the eyebrow-corrected values.]
Figure IV-22. Eyelid standard analysis results compared to the results obtained after correcting the former with the eyebrow data. Analysis made on sequence NEON

IV.6 Analysis of mouth and lip motion

IV.6.1 Introduction

Mouth motion analysis has long been investigated in different domains. It has become a wide field of research because many of the investigated techniques aim at providing helpful tools for constrained daily life, like, for example, automatic lip-reading for the deaf. Related to the scope of the research presented in this thesis, we focus on those algorithms that help to obtain more efficient ways of transmitting information in video-communication systems by substituting traditional face-to-face video with the animation of 3D clones of the speakers. Indeed, mouth analysis plays a major role in this scenario because the accuracy of the mouth movements and the synchronization of the mouth actions with the audio generated during the conversation are crucial to obtain pleasant and natural communication.

We can consider that the overall movement of the mouth is the result of two factors: M_TOTAL = M_speech + M_expression, where M_speech represents the natural motion related to the articulation of sounds and phonemes while speaking and M_expression is the part of the motion that shows the emotional expression and personal behavior of the individual. It is easy to discriminate the component of motion coming from expression when no speech is present. It is more difficult to deduce how actions of both natures interact when they appear together.

Looking at this issue from the inverse perspective, separating mouth motion components based on their nature (speech or expression) during the analysis is also a hot topic of research in the Facial Animation community. During the creation of automatic motion to be synthesized on 3D head models (usually avatars), we combine phonetic motion information with expression motion data. This combination must be done in such a way that the resulting facial behavior acts in a natural, human way. In most cases phonetic and expression interactions do not lead to pleasant and natural results. The knowledge of muscular interaction and natural facial behavior must be used to deduce the right motion and to customize the animation generated after having analyzed the visual aspect of the mouth actions.

To develop a complete analysis framework, we have studied the advantages and drawbacks of most of the methods found in the literature. We have derived an approach that suits our scenario by developing a simple motion model to ensure that its action parameters will be robustly detected during the analysis, regardless of the environmental conditions.

Research background in this field

The aim of most of the analysis methods that were first developed to study global mouth motion was to generate animation patterns for the creation of automatic animations to use on avatars. Research was done to extract the general motion behavior of the mouth. On the one hand, these techniques aimed at matching speech and phonetic information obtained from real or synthetic speech with coherent and natural mouth movements (the phoneme-viseme duality (2)); on the other hand, they focused on associating face expressions (joy, happiness, sadness, etc.) with some specific related mouth motion.

Besides the hardware capture devices commonly used, analysis methods using image processing were introduced. The first image-based techniques relied on markers and make-up that could easily be extracted from the images and then analyzed. The Institut de la Communication Parlée (ICP-Grenoble) has long been using this kind of method to deploy realistic mouth motion on 3D head models (Odisio, Elisei, Bailly and Badin, 2001). Some other laboratories, like the Morishima Laboratory (Japan), also started developing mouth motion patterns from motion-captured information before starting with other less invasive methods. It is, indeed, the invasive nature of the magnetic captors and markers, along with the impossibility of easily deploying a usable system in non-testing environments, that has pushed researchers towards the analysis of more or less naturally recorded video sequences, either monocular or from multiple points of view.

Many of the image analysis techniques focus on the study of the evolution of lip motion along time. Some techniques use deformable contours, snakes or deformable models to define the lip shape and then derive the lip action on the image (Lai, Ngo & Chan, 1996; Liévin, Delmas, Coulon, Luthon & Fristot, 1999). The models can be more or less complex and may be based on some mesh structure that eases deriving motion information from the analyzed data. Chou, Chang and Chen (2001), for instance, use a mesh model to extract face animation parameters (FAP). These FAP are MPEG-4 compliant and thus usable for communication applications. This is one fine example of how developing flexible analysis techniques enables the practical usage of the obtained research results.

Flexibility in the analysis implies a complete understanding of the analyzed image without controlling the environmental conditions under which it has been recorded. This kind of analysis situation has brought up the development of several specific image processing techniques that study the mouth area in detail (Pahor and Carrato, 1999).

The best performing results, those analysis techniques that offer the most realistic mouth animation, are based on anatomical information of the face, mouth and jaw interaction. These methods relate the analyzed image data to some muscular motion parameters that exactly reproduce the behavior of the mouth. Morishima, Ishikawa and Terzopoulos (1998) started their research on monocular images following this idea. Unfortunately, the methodology they exposed had one major weakness for being adapted to any circumstance: they use optical flow in their image processing, which makes their technique unstable to lighting changes.

Analyzing images to obtain information from the mouth is restricted by the fact that it is very difficult, and sometimes even impossible, to observe the motion of the inner parts: teeth and tongue. The animation of teeth and tongue is generally derived from the previous observation of natural human mouth actions. King and Parent (2001) have developed a complete parametric tongue model for speech animation. This kind of system, along with the phonetic analysis of the speech coming from the speaker, can help with reproducing the exact motion of the mouth while speaking. Unfortunately, its use is only possible when speech is present. Mouth analysis on images is fundamental to complement natural mouth synthesis in the absence of speech; it also helps to customize the standard mouth behavior provided by the phoneme-viseme mapping. Mouth motion image analysis techniques that completely neglect tongue and teeth interaction cannot aim at understanding real mouth behavior. Any robust image processing technique willing to obtain data to generate natural mouth movements must also consider analyzing teeth and tongue, in addition to lips.

(2) Viseme: natural motion synthesis (pre-established motion) of lips, teeth and tongue when pronouncing a specific phoneme. See Pandzic, I. S., & Forchheimer, R. (Eds.). (2002).

Our approach to mouth analysis

The image processing techniques developed for mouth analysis try to make the most of some of the techniques reviewed in the literature, by analyzing their strengths and weaknesses and combining them in an efficient way. Based on the results already obtained in other similar studies of mouth motion, we have developed our image analysis algorithm structurally:

(a) We start with a lip pixel color and intensity distribution study - analyzing the H&I distribution - of the mouth area. The goal of this analysis is to define those specific areas belonging to the mouth (teeth, lips and tongue) so they can be segmented and well separated. The complexity of the H&I based segmentation increases with very active mouth actions and if the lighting conditions are not known; therefore, the algorithms involved are adapted along time on a frame-by-frame basis.

(b) Pixel distribution, mouth shape, teeth or tongue location estimations can only be helpful for synthesis if there is a motion model associated with the analyzed image data. We have developed a mathematical mouth motion model to describe the movement from the image data obtained. This model is based on muscular interaction and tries to use the fewest number of control points susceptible of being extracted using flexible image processing. This model can only expect to estimate the projection of the motion on the analyzed plane; this implies that muscular information about mouth motion must be known on the synthesis side to create realistic reproductions. One of the differences of our model compared to others is that we also try to estimate jaw motion from the image of the mouth area.

(c) Finally, it is direct to relate the image data obtained from the segmentation process to the motion model needed to describe the action.

This section contains a new proposal for a mouth motion model that could be used in the context of our global facial feature analysis. We develop the model based on muscular intra-feature constraints. We also provide the technical evaluation of the image-processing algorithms proposed to extract the data related to this motion model. The vector nature of the data provided by the mouth motion template makes this technique susceptible of being extended to analyze any other view of the speaker in addition to the evaluated frontal head position.

IV.6.2 Modeling lip motion with complete mouth actions

Mouth movements derived from muscular interaction can be described in Cartesian 3D space as the total vector displacement of each of the points belonging to the components of the mouth along the x, y and z-axes. Muscular interaction during mouth motion is complex and the observed final shape of the mouth on the image plane only provides information about the x and y components of the complete displacement. Depth information cannot be retrieved from a single perspective and therefore the complete motion interaction must be deduced from the projected shape of the elements involved and from the study of anatomical mouth behavior.

Knowing the dual nature of mouth behavior, monocular image analysis can be well complemented by the phoneme recognition of the person's speech. As seen in some practical scenarios (Goto, Kshirsagar & Magnenat-Thalmann, 2001; Chen, 2001), the phoneme information extracted from speech can be mapped to its viseme correspondence. Phonetic information can clearly help in understanding mouth behavior, but it is not enough. The phoneme-viseme mapping technique can only generate standard mouth behavior, and no trace of emotions, expressions or personalized actions can be synthetically generated from it. Nevertheless, speech analysis is the only way to obtain motion information of those parts that remain invisible to the camera.

For our approach, we have studied and observed the interaction among the muscles and bones of the head (Figure IV-23) that intervene when the mouth acts. We have mathematically modeled mouth muscular interaction to synthetically replicate mouth motion through the understanding of its projected appearance. The interest of building a mathematical model for the image analysis of mouth motion comes from the fact that, if there is no reliable speech to be analyzed, the derivation of the model actions from visually analyzed data can give us the best approximation to the studied motion. It also customizes mouth actions by generating more natural movements always directly related to the speaker's individual way of speaking. This customization is very difficult to achieve by using only phonetic-based animation techniques.

Figure IV-23. These illustrations present the bones (a) and the muscles (b) involved in the generation of mouth actions (from Images of muscles and bones of the head, 2002; Simunek, 2003).

Two parts compose the motion model presented here: the mathematical lip motion model and the structural jaw action model. To simplify the analysis requirements, we have developed a lip model that contains the minimum number of control points needed to replicate major lip motion due to muscle action. Most current analysis solutions focus on the detailed tracking of lip movements over time; fewer approaches take into account the motion of the teeth; and even fewer study the jaw action. The image analysis processing and its supporting motion model should not ignore the existence and interaction of teeth and jaw. Although difficult to analyze, their motion information is fundamental for the right interpretation and synthesis of mouth motion. The jaw model we propose tries to exploit the visual information extracted from the mouth analysis, basically teeth location, to deduce jaw motion.

Lip model

The proposed model for lip motion tries to linearly describe the action of the muscles that play a major role in mouth motion (see Figure IV-23a to locate the muscles):

(a) Levator Labii Superioris [4]
(b) Zygomaticus Major [3]
(c) Joint action from: Zygomaticus Minor & Levator Anguli Oris [5]
(d) Joint action from: Buccinator & Risorius [7]
(e) Depressor Anguli Oris [8]
(f) Depressor Labii Inferioris [9]
(g) Orbicularis Oris [6]

Figure IV-24. The chosen control points coincide with the ending extremes of the major muscles that intervene in mouth motion.

After studying the anatomical structure of the mouth and its muscles, we have detected eight critical points next to the lips (ten when accounting for the replication of the interior part of the lips, which moves correlated with the exterior part) where the muscles exert their final influence (see Figure IV-24). We have developed the motion template presented in Figure IV-25.

Figure IV-25. Schematic representation of the mouth lips and the control points (T, V, V', W, W', Y and Z) acting on their left side; L and H denote the mouth width and height, and the y-axis is the symmetry axis of the mouth.

Since the movement of each of the selected control points is due to the action of a specific group of the major muscles whose ending points fall at that location of the lip area, we can derive the projected motion of this action by studying the behavior of these control points over time. Applying the following mathematical model, which gives a linear approximation of the projected displacements (Δx and Δy) of the points belonging to the lip model due to the muscular action exerted on the control points, we can reproduce mouth shapes that estimate the muscular actions created³:

From the Buccinator & Risorius control point T, applied on both lips:

Δx_i = Δx[T] · x_i / (L/2)

³ This is the partial model to be applied to the left side of the mouth; control points Y, Z and T must be duplicated on the right side, and the motion model of the x-component of that side is symmetrical with respect to the y-axis.

From the Zygomaticus Major or Depressor Anguli Oris control point T, applied on both lips:

Δy_i = Δy[T] · x_i / (L/2)

From the Levator Labii Superioris, Zygomaticus Minor & Levator Anguli Oris control points V and V', applied only on the upper lip:

Δy_i = Δy[V'] · x_i / (L/2)

From the Orbicularis Oris control points Z and Y, applied to each lip separately:

Lower lip: Δy_i = Δy[Z] · (1 − |1 − x_i/(L/4)|)
Upper lip: Δy_i = Δy[Y] · (1 − |1 − x_i/(L/4)|)

From the Depressor Labii Inferioris control points W and W', applied only on the lower lip:

Δy_i = Δy[W'] · x_i / (L/2)

Control points Z and W do not only reflect the interaction amongst muscles but also the physical displacement of the jaw. Control point T also behaves differently when jaw rotation exists. It is very difficult to deduce jaw motion just by looking at the mouth evolution: a frontal view of the face does not provide the best perspective to appreciate jaw rotation, but the teeth-lip location can help to deduce it.

Other motion template options for the mouth exist in the literature. Some of them, like the one proposed by Chen (2001), use a motion model that only controls the width and the height of the mouth; this model gives limited information regarding the action and cannot deduce complex mouth movements. Others, like the one presented by Chou, Chang and Chen (2001), propose using the points that already belong to the mouth mesh for the synthesis. Their solution has the advantage of being scalable: they increase or decrease the number of control points depending on the synthetic model complexity. In their approach, however, they do not justify the number of points they use from a muscular motion perspective, and they do not determine the most suitable number of control points for the analysis. Our template tries to give a trade-off between simplicity, using a minimum number of control points, and performance, extracting the maximum motion information.
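As an illustration only, the following Python sketch shows how the linear relations above could be evaluated for a set of lip sample points on the left half of the mouth. It is not the implementation used in this work: the function name, the control-point dictionary and, in particular, the exact weighting chosen for the Orbicularis Oris term are assumptions made for the example.

```python
import numpy as np

def displace_left_half(points, is_upper, ctrl, L):
    """Apply the linear lip-motion template to the left half of the mouth.

    points   : (N, 2) neutral (x, y) positions of lip samples, x in [0, L/2].
    is_upper : (N,) boolean mask, True for upper-lip samples.
    ctrl     : control-point displacements, e.g.
               {'T': (dx, dy), 'Vp': dyV, 'Wp': dyW, 'Y': dyY, 'Z': dyZ}
    L        : neutral mouth width.
    Right-half points would use the same model mirrored about the y-axis.
    """
    out = points.astype(float).copy()
    x = points[:, 0]
    w = x / (L / 2.0)                       # linear weight: 0 at centre, 1 at corner
    # Corner point T (Buccinator & Risorius): horizontal stretch of both lips.
    out[:, 0] += ctrl['T'][0] * w
    # Corner point T (Zygomaticus Major / Depressor Anguli Oris): vertical shift.
    out[:, 1] += ctrl['T'][1] * w
    # V' (Levator Labii Superioris et al.): acts on the upper lip only.
    out[is_upper, 1] += ctrl['Vp'] * w[is_upper]
    # W' (Depressor Labii Inferioris): acts on the lower lip only.
    out[~is_upper, 1] += ctrl['Wp'] * w[~is_upper]
    # Orbicularis Oris (Y on the upper lip, Z on the lower lip): assumed here to
    # peak at the lip mid-point and vanish towards the centre and the corner.
    bump = np.clip(1.0 - np.abs(1.0 - x / (L / 4.0)), 0.0, None)
    out[is_upper, 1] += ctrl['Y'] * bump[is_upper]
    out[~is_upper, 1] += ctrl['Z'] * bump[~is_upper]
    return out
```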

Figure IV-26. These images show the result (in red) of applying the displacements listed below onto the control points of a mouth in its neutral state (in grey). The global deformation of the mouth is obtained using the proposed linear approximation. Mouth proportions in the neutral state: L = 8 and H = 4.

(a) Δx(T) = -1.5; Δy(T) = 0; Δy(U') = 0; Δy(U'') = 0; Δy(Z) = -0.5; Δy(Y) = 0.5
(b) Δx(T) = 2; Δy(T) = 0.25; Δy(U') = 0; Δy(U'') = 0; Δy(Z) = -0.5; Δy(Y) = 0
(c) Δx(T) = -1; Δy(T) = -0.5; Δy(U') = 0; Δy(U'') = -1; Δy(Z) = -1; Δy(Y) = 0
(d) Δx(T) = -1.5; Δy(T) = -0.25; Δy(U') = 0; Δy(U'') = -1; Δy(Z) = -1; Δy(Y) = 0.5
(e) Δx(T) = 1; Δy(T) = 2; Δy(U') = 0.75; Δy(U'') = 0; Δy(Z) = 0; Δy(Y) = 0.5
(f) Δx(T) = 1; Δy(T) = 2; Δy(U') = 0.75; Δy(U'') = -1; Δy(Z) = -1; Δy(Y) = 0.5

In Figure IV-26 we present the visual results obtained from applying forces (represented by specific magnitude values) on the control points of our motion model. With the designed motion template we can generate a rich variety of mouth expressions, sufficient to analyze standard mouth behavior.

Jaw motion: the importance of the reference coordinate system for global mouth motion understanding

The proposed mathematical model defines lip behavior independently of the origin of the forces causing the deformation. The image analysis algorithm has to evaluate whether the movement is due to muscular action, jaw rotation or both. Due to natural constraints, the action coming from the jaw is related to the degree of openness of the mouth and to the proportion of teeth that are visible behind the lips. The upper part of the mouth and the upper teeth always remain rigid and stable; their motion can only come from the rigid motion of the head, and therefore they can be set as a proper reference for the non-rigid analysis. This information helps to deduce the complete action of the jaw from the lip location with respect to the upper and lower teeth.

Figure IV-27 presents several jaw-motion combinations that illustrate the importance of tracking the jaw during mouth analysis. In the first row, frontally similar lip-teeth projections are due to mouth-jaw motions of different nature. We show that visually similar lip behavior may come from very different mouth actions, and that for complete mouth motion understanding more than just lip tracking is needed.

Figure IV-27. Schematic representation of how jaw motion influences the teeth-lip location, plotted for some key mouth movements. Starting from the neutral pose, almost the same frontal lip position can be obtained when all lip displacement is due to jaw motion; when there is no jaw motion at all (possible but rare, although common while smiling); or in the intermediate situation where both lip stretching and jaw motion appear, which is the most common case.

Final discussion

A lip/teeth/jaw model is capable of representing the most common mouth actions; strange mouth shapes are difficult to analyze and consequently very difficult to model by justifying their muscular nature. We consider, for example, unexpected mouth actions that involve the external intervention of the tongue. Nevertheless, the complete feature analysis algorithm must be aware that the motion template is restricted to the analysis of lips, teeth and jaw, and that other actions also exist. Above all, it must detect when the mouth feature is not completely visible or does not show a shape directly derivable from the motion template. In such cases, the analysis must give the best approximation of the mouth shape, following natural mouth behavior constraints. It must not generate odd interpretations in the presence of elements that make the image analysis impossible.

IV.6.3 Image analysis of the mouth area: color and intensity-based segmentation

To extract meaningful information from the mouth area of the face being analyzed, we have decided to build an image processing algorithm that robustly segments the region of the mouth into three major parts: lips, teeth and the unknown dark area inside the mouth. Depending on the part we intend to segment, the algorithm uses information from the color or from the intensity of the pixels. To build the segmentation algorithm we first study the pixel histogram distributions in HSI color space. Each part of the mouth shares common characteristics and is well localized in each histogram distribution. The color/intensity histograms of the region of interest around the area are computed for each frame. From each of the histograms we deduce which zones of the image belong to the lips, the teeth and the inner part of the mouth. Histograms computed frame by frame reflect the color and intensity distribution variations due to environmental changes, for example different lighting conditions during the analysis. We have developed and tested two different algorithms to segment the mouth ROI:

(i) deducing the segmentation thresholds on H and I from the evolution of the histograms;
(ii) deducing the segmentation thresholds on H and I from the statistical analysis of the histograms.

Histogram-based algorithms for segmentation

To study the hue and intensity pixel distributions of the mouth area, we have plotted the histograms of the image of the mouth area for three different mouth configurations: closed, open with no visible teeth, and open with visible teeth. We have preferred the simplified logarithmic hue transform proposed by Liévin and Luthon (2000), which derives hue directly from the ratio between the green and red channels (H = 256·G/R when G < R, saturated at 255 when G ≥ R), rather than the traditional transform

H = arccos( ((R − G) + (R − B)) / (2·sqrt((R − G)² + (R − B)(G − B))) ),    with H := 2π − H if B > G,

because it is more robust to lighting conditions. In their work, Liévin and Luthon show that, thanks to this transform, they are able to detect lips in situations where the traditional H transform could not. It also generates a histogram in which the lip area is clearly differentiated from the other components of the mouth.

From the study of the different histograms we have concluded:

(a) The lower values of the I histogram clearly determine the dark area inside the mouth.
(b) We are no longer inside the dark area when the histogram values start to increase considerably.
(c) The H histogram shows two major hue concentrations. The biggest one belongs to the skin area of the mouth ROI and it decreases when the mouth is open; the smallest one belongs to the lip area and it remains stable regardless of the state of the motion of the mouth.
(d) Teeth are clearly detected by observing the evolution of the hue histogram; the presence of teeth strongly increases the number of pixels around 255.
(e) The shape of the I histogram varies depending on the lighting conditions, although the darkest area (lowest I values) always belongs to the inner part of the mouth.

(f) The shape of the H histogram is quite stable against changes in the lighting conditions; it only changes depending on the natural skin characteristics of the individual being analyzed. If the color of the skin is close to the color of the lips, the distance between the maxima belonging to the two different H groups shrinks and they may even blend, making segmentation rather difficult. The overall location of the hue distribution (represented by its mean value) also shifts to the left or the right depending on the general skin characteristics of the person.

Figure IV-29 plots different I-histograms belonging to the same person, obtained after analyzing frames of a closed mouth, an open mouth with teeth and an open mouth without teeth. Figure IV-30 shows the H-histograms for the same individual under the same analysis conditions. This histogram analysis is made on a tight area (ROI) surrounding the mouth in order to avoid the interference of undesirable external parts of the head, like the hair, or of objects from the background. From the histograms we extract the threshold values that determine the areas belonging to the lips, the dark inner part and the teeth. After the thresholds have been set, we must label the pixels of the mouth zone. The labeling is performed on a wider area that covers a larger extension of the face (above all toward the chin). The area of analysis for this process is extended because mouth movements can be extreme and go outside the safe zone used for the analysis of the mouth hue and intensity properties (see Figure IV-28).

Figure IV-28. Areas delimited for the histogram study (internal analysis area: histogram analysis of hue and intensity) and for the mouth motion analysis (external analysis area: segmentation of lips, teeth and darkness).
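The following minimal sketch illustrates the per-frame histogram computation on the internal mouth ROI. It deliberately uses the classical arccos-based hue rather than the Liévin-Luthon logarithmic variant, and the ROI convention and function names are assumptions made for the example.

```python
import numpy as np

def hue_channel(rgb):
    """Per-pixel hue of an RGB image (float array), classical arccos transform;
    returns values in [0, 2*pi)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-8
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    return np.where(b > g, 2 * np.pi - theta, theta)

def intensity_channel(rgb):
    """Simple intensity as the mean of the three channels."""
    return rgb.mean(axis=-1)

def roi_histograms(rgb, roi, bins=256):
    """Hue and intensity histograms restricted to the internal ROI
    (x0, y0, x1, y1) around the mouth, recomputed frame by frame."""
    x0, y0, x1, y1 = roi
    patch = rgb[y0:y1, x0:x1].astype(float)
    hist_h, _ = np.histogram(hue_channel(patch), bins=bins, range=(0, 2 * np.pi))
    hist_i, _ = np.histogram(intensity_channel(patch), bins=bins, range=(0, 255))
    return hist_h, hist_i
```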

Figure IV-29. Intensity histograms of the mouth ROI (number of pixels vs. pixel value) for (a) a closed mouth, (b) an open mouth with teeth and (c) an open mouth with no teeth.

Figure IV-30. Hue histograms of the mouth ROI (number of pixels vs. pixel value) for (a) a closed mouth, (b) an open mouth with teeth and (c) an open mouth with no teeth.

To determine the thresholds for the labeling process we propose two possibilities:

Tangent evolution of the histogram to determine the thresholds for segmentation: This approach analyzes the tangent of the histogram at each point, looking for the sudden slope increase that marks the change from the darkness area to the rest of the mouth area in the I histogram. The same approach is used to detect the change of slope that separates the lip area from the rest of the mouth in the H histogram (see the groups of graphs in Figure IV-29 and Figure IV-30). Since histograms are very noisy, the tangent computation is made after smoothing the histogram by computing the local average of every value (black curves on the graphs). The two empirically chosen thresholds are:

th_I = I such that the slope angle of Hist(I) reaches 50 deg
th_H = H such that the slope angle of Hist(H) reaches 40 deg

These values have been chosen based on the analysis of mouths homogeneously illuminated and where a clear segmentation could be done. Segmenting with these thresholds has afterwards been tested on other images with unknown lighting conditions to check its robustness.

Statistical analysis of the histogram to derive the thresholds for segmentation: We propose another approach that deduces the thresholds for the segmentation process from the statistical values of the histogram (min, max, mean, mode and standard deviation). Instead of analyzing the local behavior of the histogram to deduce the optimal threshold values, we study its global characteristics. The two empirically chosen thresholds are:

th_I = min(I) + stdev(I)
th_H = mode(H) − stdev(H)

Once again, these thresholds have been chosen after the analysis of mouths homogeneously illuminated, and they have been tested on the same block of faces recorded under unknown conditions to compare their robustness with the previous approach. Results of the tests performed to study the performance of both approaches are given in the subsection "Evaluating the performance".
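A minimal sketch of both threshold-selection strategies is given below. The helper names are illustrative, and the '+' sign in the statistical intensity threshold reflects the reading adopted above rather than a verified formula.

```python
import numpy as np

def smooth(hist, k=5):
    """Local moving average applied before the tangent-based analysis."""
    kernel = np.ones(k) / k
    return np.convolve(hist.astype(float), kernel, mode='same')

def threshold_from_slope(hist, bin_width, slope_deg):
    """Tangent-based approach: index of the first bin where the smoothed
    histogram slope exceeds the chosen angle (50 deg for I, 40 deg for H)."""
    slope = np.diff(smooth(hist)) / bin_width
    return int(np.argmax(np.degrees(np.arctan(slope)) > slope_deg))

def thresholds_statistical(i_values, h_values, bins=256):
    """Statistical approach: th_I = min(I) + stdev(I), th_H = mode(H) - stdev(H)."""
    th_i = i_values.min() + i_values.std()
    h_hist, edges = np.histogram(h_values, bins=bins)
    mode_h = edges[np.argmax(h_hist)]           # bin edge of the histogram mode
    th_h = mode_h - h_values.std()
    return th_i, th_h
```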

Labeling mouth parts

We segment the different parts that belong to the mouth following these criteria:

pixel_i ∈ lips if pixel_i,H < th_H
pixel_i ∈ darkness if pixel_i,I < th_I

Teeth determination (combination of hue and intensity knowledge):

1st approach: pixel_i ∈ teeth if pixel_i ∉ lips and pixel_i,H > th_teeth, where th_teeth depends on the distribution of the mouth area and has been chosen to be max(H) − (max(H) − min(H))/20.

2nd approach: pixel_i ∈ teeth if pixel_i ∉ lips and pixel_i,H > mode(H) + stdev(H).

Evaluating the performance of the proposed approaches: To compare both approaches and decide which one is more convenient, we have used a video database provided by the Mathematics and Computer Science Department of the UIB, University of the Balearic Islands (2002). This database contains 60 videos, of several seconds each, of the faces of 60 people, the majority with Caucasian skin characteristics. Each individual was recorded under homogeneous lighting, but illumination conditions differ from case to case. The following table presents the algorithmic performance of the first and second approaches; it is shown as the percentage of video sequences where the expected lip, teeth and darkness detection and segmentation was observed. The results were obtained after studying the segmentation process qualitatively.

Table IV-5. Qualitative analysis expressed as the percentage of performance success on UIB's DB

            1st approach              2nd approach
Lips        59.32% (1)  30.50% (2)    62.71% (1)  23.72% (2)
Teeth        2.32% (1)  23.25% (2)    72.27% (1)  18.18% (2)
Darkness    66.10% (1)  22.03% (2)    83.05% (1)  13.55% (2)

Comparing the first approach with the second approach, measuring the success as: (1) right detection and right segmentation in most frames; (2) right detection but incorrect segmentation in most frames.

We refer the reader to Appendix IV-I for details of the tests performed, where results and comments on the characteristics of the people analyzed have been recorded.
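The labeling rules can be sketched as follows; the exclusion of lip pixels from the teeth test and the exact form of the second-approach teeth threshold are plausible readings of the criteria above, not a verified implementation.

```python
import numpy as np

def label_mouth(h, i, th_h, th_i, mode_h, std_h, approach=2):
    """Label each pixel of the external analysis area as lips, darkness or teeth.

    h, i   : hue and intensity images of the analysis area.
    th_h, th_i : thresholds derived from the internal-ROI histograms.
    mode_h, std_h : hue mode and standard deviation (second approach)."""
    lips = h < th_h
    darkness = i < th_i
    if approach == 1:
        # teeth threshold taken near the top of the hue range of the mouth area
        th_teeth = h.max() - (h.max() - h.min()) / 20.0
        teeth = (~lips) & (h > th_teeth)
    else:
        teeth = (~lips) & (h > mode_h + std_h)
    return lips, darkness, teeth
```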

Figure IV-31 contains some shots from a few sequences of the database used, where the areas labeled as lips have been outlined and the dark part has been detected and marked in black.

Conclusions

Comparing approaches: After studying the two proposed approaches, the second gives better results. The stability of the histogram is too weak locally, and the applied smoothing depends on the analysis conditions. Changing the degree of smoothness also implies changing the threshold values. Therefore, thresholds obtained from the analysis of the tangent evolution of the histogram are not stable enough. Statistical data remain more robust in all cases and are less dependent on the noisy characteristics of the histogram.

Figure IV-31. Screen shots of some of the 60 videos analyzed for the tests. In the images, the lip areas are outlined in red, and the lip separation and the darker inner parts of the mouth are detected in black. The second approach was used for the analysis of the presented shots. (a), (c) and (d) show the right segmentation of the lips. (a), (b) and (d) illustrate the correct segmentation of the darkest parts of the mouth. In all four cases we also observe the segmentation of other unwanted areas, thus creating artifacts that ought not to be taken into account during motion analysis.

Regarding the study of the hue characteristics of the images: Lip segmentation based on hue analysis is an efficient technique as long as there is a noticeable tone difference between skin and lips. There are cases where this tone difference does not exist: tanned people, lips so thin they do not appear on the image, etc. Hue characteristics can also change with the color of the light source illuminating the face (neon light versus yellow bulb light, for instance). The change of color on the surface can interfere and alter the overall hue distribution histogram.

Detection of teeth: The first approach, which is based on a plain study of the histogram characteristics, is able to detect teeth if they are present, but it does not segment the complete teeth zone well. The second approach, based on the study of the histogram statistics, detects and segments the complete teeth area but produces over-segmentation, introducing noise in the segmented image by assigning the teeth label to some sparse pixels on the skin, although never on the lips. To extract useful and complete information the second approach should be used, but the analysis process must take into account the existence of noisy data situated around the mouth.

Reliability of the obtained segmentation: If the influence of the lighting on the hue characteristics is known, all the segmented data (teeth, darkness and lips) have the same level of reliability. Since the natural light source is generally not known, the intensity values determining darkness and teeth are in most cases the most reliable information, and motion analysis should start by extracting useful information from these segmented parts. Statistical analysis methods are more reliable than distribution-based analysis techniques.


V Extending the Use of Frontal Motion Templates to any other Pose

Assuming that we control the user's pose is an important restriction when doing analysis for videoconferencing purposes. In real scenarios this assumption becomes a major drawback. Yet, many virtual telecommunication schemes try to avoid the pose-expression coupling issue by minimizing its effects: for the analysis algorithms to remain robust, these schemes only allow the user slight changes in pose. In this chapter we describe a technique that extends the previously detailed near-to-frontal feature analysis algorithms to any given pose of the head, to allow the user more freedom of movement in front of the camera.


V.1 Introduction

In the literature we have found two major approaches to adapting frontal facial motion and expression analysis algorithms to any pose:

1. Designing one feature template per pose: after developing and testing motion templates on frontal faces, they are redefined for different predetermined face poses. This is, for instance, the solution given by Tian, Kanade and Cohn (2001). They overcome the pose limitation in their analysis by defining a "multiple state face model", where different facial component models are used for different head states (left, left-front, right, down, etc.). This analysis strategy is limited: the complexity of the solution increases with the number of states, which becomes large if much freedom of movement is wanted.

2. Rectifying the input image: the image to be analyzed is transformed to obtain an approximation of the face viewed from a frontal perspective. Then, the image processing algorithms defined for frontal faces analyze this new image to obtain the corresponding feature template (Chang, et al., 2000). This solution works nicely for slight rigid movements. Significant rotations and translations cannot be compensated with simple image transformations because: the appearance of each face feature does not only depend on the projection due to the pose but also on its 3D shape, and therefore a 2D rectification done without acknowledging the 3D nature of the feature cannot be accurate; the rectified image may be missing some areas occluded in the original image; and 2D rectification may alter the lighting perception and the anatomical shape of the features, which is very important in feature-based image analysis.

We propose a different approach to frontal motion analysis adaptation. Our solution uses the knowledge of the head pose and the user's physiognomy to interpret the expressions in 3D space instead of on the image plane.

V.2 Feature Template Adaptation

The algorithmic adaptation process follows these steps:

(a) We first redefine the motion models, regions of interest (ROI) and image processing parameters associated with each feature template in 3D, assuming that the head is facing the camera, in its neutral pose.

(b) Next, we use information about the rigid motion of the head in the analyzed frame to project the 3D-defined ROI and the other analysis constraints of each feature onto the video image. Then, we apply the image processing required to extract the data for the model.

(c) Finally, we invert the projection and the pose transformation of those data to obtain their 3D equivalent, ready to be contrasted against the motion models already defined in 3D.

Figure V-1 presents a graphical interpretation of the adaptation process applied to the analysis of the eye feature. For the adapted analysis we must define:

(i) an observation model. To develop the adaptation, we consider our analysis scenario: one 3D object (the head) in front of one camera that acquires the video images to be analyzed. We establish the neutral pose of the head when the face is completely centered on the image and statically looking toward the camera center. The observation model mathematically describes the relationship between the coordinates of the head object in its neutral pose and the final view of the face on the video or image plane. This mathematical model enables us to interpret data associated with the head from the modeled 3D space to the 2D image space and vice versa.

(ii) a 3D model of the head. The template motion analysis techniques defined for a frontal view assume knowledge of the location of the face features on the image plane. Similarly, during the adaptation we need to know the physiognomy of the person facing the camera so as to be able to accurately locate the features in 3D space. We use the vertex data of a highly realistic 3D representation of the person and its model texture to determine the position of the ROI of each feature.

(iii) a convenient surface approximation per feature. The analysis templates are originally defined to analyze the information on the image plane. We can easily adapt these motion models by directly mapping each of them onto a

surface parallel to the image plane and situated at the determined location of the feature on the 3D head in its neutral pose. To obtain the most suitable parallel plane, we develop the linear approximation of the surface that covers the region of motion of each feature.

Figure V-1. This diagram illustrates the general adaptation process applied to the eye analysis algorithm. First, the vertices that define the 3D ROI on the linear surface model are projected onto the image plane. Then the image-processing algorithms retrieve the desired information by analyzing inside the delimited area. To understand the motion of the feature, the data are interpreted in 3D space, over the motion model that has been defined on the linear surface approximation of the eye feature viewed from a frontal perspective. Once the motion is interpreted, it can be reproduced on a synthetic head model. The projection and the interpretation of the image information are possible because the system controls the 3D pose of the head with respect to the camera (Kalman-predicted pose parameters).

V.3 Observation Model

The observation model used to relate the head in its neutral pose (facing the camera) and its projected representation takes into account the rigid motion (translation and rotation) of the head observed from the reference origin and the projection due to the camera. Although the acquisition camera is not calibrated, because we do not control the nature of the input sequences, we can still consider that it performs a perspective projection, and not an orthogonal one. The reference origin is situated along the optical axis of the camera and on the image plane. The image plane represents the video image where the face is focused. The focal distance F represents the distance from that plane to the optical center of the camera. To describe the rigid motion of the head we have defined three translations, along the X, Y and Z-axes, and three rotations, around these same axes. Figure V-2 presents the graphical interpretation of the model and the orientation of the reference axes.

Figure V-2. Schema of the reference system and camera model (of focal length F) used for the adaptation process. It establishes the relationship between a point in Euclidean space, x_n = (x_n, y_n, z_n)^T, and its projected counterpart on the camera image plane, x_p = (x_p, y_p)^T = (F·x_n/(F − z_n), F·y_n/(F − z_n))^T. The axis orientation is such that the camera only sees the negative part of the Z-axis.

V.3.1 Mathematical description of the model

We describe points using their homogeneous coordinates in order to express the perspective transform linearly and to easily derive the relationship between 3D neutral coordinates and 2D projections. Any vector (x, y, z, w)^T is a homogeneous point if at least one of its elements is not 0. If a is a non-zero real number, (x, y, z, w)^T and (ax, ay, az, aw)^T represent the same homogeneous point. The relationship between a point in 3D or 2D Euclidean space and its homogeneous representation is:

(x, y, z)_3D ≡ (x, y, z, 1)_H    and    (x, y)_2D ≡ (x, y, 0, 1)_H

We can obtain the Euclidean representation of a homogeneous point only if w ≠ 0:

(x, y, z, w)_H → (x/w, y/w, z/w)_3D    and    (x, y, z, w)_H → (x/w, y/w)_2D

Transformation matrices that describe the rigid motion

Translation by the vector (t_X, t_Y, t_Z)^T:

T(t_X, t_Y, t_Z) =
[ 1  0  0  t_X ]
[ 0  1  0  t_Y ]
[ 0  0  1  t_Z ]
[ 0  0  0   1  ]

Rotation by an angle of α rad around the X-axis:

R_{X,α} =
[ 1    0      0     0 ]
[ 0  cos α  −sin α  0 ]
[ 0  sin α   cos α  0 ]
[ 0    0      0     1 ]

Rotation by an angle of β rad around the Y-axis:

R_{Y,β} =
[  cos β  0  sin β  0 ]
[    0    1    0    0 ]
[ −sin β  0  cos β  0 ]
[    0    0    0    1 ]

Rotation by an angle of γ rad around the Z-axis:

R_{Z,γ} =
[ cos γ  −sin γ  0  0 ]
[ sin γ   cos γ  0  0 ]
[   0       0    1  0 ]
[   0       0    0  1 ]

Observation equations

The final location of the head with respect to the reference origin is obtained by applying the translation and rotation matrices to the coordinates of the head in its neutral pose:

x_trans = G · x_n,    where    G = T(t_X, t_Y, t_Z) · R_{X,α} · R_{Y,β} · R_{Z,γ}

The position in which the head is facing the camera is therefore defined by (t_X, t_Y, t_Z) = (0, 0, 0), α = 0, β = 0 and γ = 0.

The observed projection on the image plane is:

(V-1)    x_p = P_{F,T(0,0,F)} · x_trans

where P_{F,T(0,0,F)} represents the complete projection resulting from the combination of the perspective projection matrix P_F, whose origin is located at the optical center of the camera, and the translation of F along the Z-axis, T(0, 0, F), which relocates the origin of the reference axes on the image plane (just as in our observation model in Figure V-2). Combining (V-1) with the rigid motion, the homogeneous coordinates of a point of the head in its neutral pose and its observed equivalent on the image plane are related by:

(x_p, y_p, z_p, w_p)^T = P_{F,T(0,0,F)} · G · (x_n, y_n, z_n, w_n)^T

After transforming the homogeneous coordinates into Euclidean space coordinates (w_n = 1 and z_p is not taken into account), the observation (x_p, y_p)_2D on the image plane of a given point (x_n, y_n, z_n)_3D belonging to the face in its neutral pose is:

(V-2)
x_p = F · ( r_1 · (x_n, y_n, z_n)^T + t_X ) / N(x_n, y_n, z_n)
y_p = F · ( r_2 · (x_n, y_n, z_n)^T + t_Y ) / N(x_n, y_n, z_n)
N(x_n, y_n, z_n) = F − ( r_3 · (x_n, y_n, z_n)^T + t_Z )

where r_1, r_2 and r_3 are the rows of the combined rotation matrix R_{X,α} R_{Y,β} R_{Z,γ}.
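Assuming the matrices and the projection formula as reconstructed above (composition order G = T·R_X·R_Y·R_Z and a right-handed sign convention), the observation equation can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def pose_matrix(alpha, beta, gamma, t):
    """4x4 rigid-motion matrix G = T(t) Rx(alpha) Ry(beta) Rz(gamma)
    acting on homogeneous neutral coordinates."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0, 0], [0, ca, -sa, 0], [0, sa, ca, 0], [0, 0, 0, 1]])
    Ry = np.array([[cb, 0, sb, 0], [0, 1, 0, 0], [-sb, 0, cb, 0], [0, 0, 0, 1]])
    Rz = np.array([[cg, -sg, 0, 0], [sg, cg, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    T = np.eye(4)
    T[:3, 3] = t
    return T @ Rx @ Ry @ Rz

def project(x_neutral, alpha, beta, gamma, t, F):
    """Observation equation (V-2): neutral 3D point -> 2D image-plane point."""
    xh = np.append(np.asarray(x_neutral, float), 1.0)
    x, y, z, _ = pose_matrix(alpha, beta, gamma, t) @ xh
    s = F / (F - z)                      # perspective division of Figure V-2
    return np.array([s * x, s * y])
```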

V.4 Model Inversion

To find the original neutral coordinates of a given point taken from the video image of a facial feature, we need to invert the previous projection and pose transformations.

Inverse of the rigid motion transformation:

x_n = G⁻¹ · x_trans,    G⁻¹ = R_{Z,−γ} · R_{Y,−β} · R_{X,−α} · T(−t_X, −t_Y, −t_Z)

This inverse transform defines a bijective operation in Euclidean 3D space: one given neutral point x_n relates to one and only one transformed point x_trans.

Inverse of the projection:

x_trans = P⁻¹ · x_p

This inverse transform does not define a bijective operation in Euclidean 3D space. Inverting the projection generates a straight line that goes through the optical center of the camera and that defines the ray of possible solutions in 3D space for a given projected point. By isolating the neutral coordinates in Equation (V-2), we obtain

(V-3)    (x_n, y_n, z_n, 1)^T = G⁻¹ · ( x_p·(F − z)/F, y_p·(F − z)/F, z, 1 )^T,    z ∈ ℝ,

which makes the non-bijective nature of the observation model explicit: every value of the free depth parameter z yields a different neutral point, and all of them project onto the same observed point. The accurate 3D coordinates of one specific point are the solution to the intersection of this ray of solutions with the 3D surface to which the point belongs.

V.4.1 Feature surface approximation

To be able to interpret the data obtained from the image plane, we need to know the neutral 3D surface to which these data belong. This surface is the constraint that enables the system to find a solution to the inversion of the observation model. The motion templates are directly adapted on this surface; therefore the desired characteristics of each feature surface are: (1) it must completely cover the region of interest of the facial feature; (2) it has to be as close as possible to the real feature surface; and (3) it must be defined, on the head in its frontal position, as a surface parallel to the image plane. Enforcing these requirements ensures that there is a bijective relationship between the described neutral 3D space of the template motion model and the 2D image of each feature on the video.

To obtain the desired surface we study the anatomical structure of the feature; then we model it with a surface that covers the analysis area and that is tangent to the motion that is going to be analyzed; and finally we give a linear approximation of such a surface that is parallel to the image plane. A realistic 3D head model of the person being analyzed is used to study the specific physiognomy. This 3D head model is also the reference 3D object that determines the observation model used and its focal length F. The ROI of each feature and its template parameters are defined on the obtained plane (Sections V.5 and V.6 cover both issues in detail).

Inversion solution for a general plane

The general expression of a plane is Ax + By + Cz + D = 0. Described in homogeneous coordinates, it can be seen as the solution to the equation p^T · x_n = 0, where p = [A B C D]^T. This surface constraint is added to the equation system (V-3), obtaining

(V-4)    the 3×3 linear system formed by the two back-projection constraints derived from (V-2) for the observed point (x_p, y_p) together with A·x_n + B·y_n + C·z_n = −D,

whose solution gives a unique correspondence (x_n)_3D for a given (x_p)_2D observed on the image plane.

Inversion solution for a plane parallel to the image plane

Surfaces parallel to the image plane are those with A = B = 0, C ≠ 0 and D ≠ 0. Imposing z_n = −D/C = M, the equation system (V-4) simplifies to

(V-5)
a_1·x_n + a_2·y_n = a_4 − a_3·M
b_1·x_n + b_2·y_n = b_4 − b_3·M
z_n = M

where the coefficients a_i (derived from the x_p equation) and b_i (derived from the y_p equation) are functions of the focal distance F, the rotation angles α, β and γ, the translations t_X, t_Y and t_Z, and the observed coordinates (x_p, y_p).

To simplify the visual presentation, cϕ stands for cos(ϕ), sϕ for sin(ϕ) and tϕ for tan(ϕ); the subscript p denotes projected coordinates and the subscript n neutral 3D coordinates. Capital letters represent matrices and vectors, and lower-case letters coordinates and vector components. F stands for the focal distance of the projection system.
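A compact way to perform this inversion numerically, equivalent to solving (V-5), is to map the ray of solutions back to neutral space and intersect it with the plane z_n = M. The following sketch assumes the observation model above; function names are illustrative.

```python
import numpy as np

def invert_to_plane(x_p, y_p, pose_G, F, M):
    """Back-project an image-plane observation onto the neutral plane z_n = M.

    The ray of solutions goes through the optical centre (0, 0, F) and the
    observed point (x_p, y_p, 0); it is mapped back to neutral space with the
    inverse rigid motion and intersected with the template plane."""
    G_inv = np.linalg.inv(pose_G)
    origin = G_inv @ np.array([0.0, 0.0, F, 1.0])       # ray origin, neutral space
    direction = G_inv @ np.array([x_p, y_p, -F, 0.0])   # ray direction, neutral space
    if abs(direction[2]) < 1e-9:
        raise ValueError("ray parallel to the template plane: unstable pose")
    s = (M - origin[2]) / direction[2]                   # intersection parameter
    return (origin + s * direction)[:3]
```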

V.5 3D Definition of Feature ROIs

Defining the ROIs over the 3D head model allows the analysis system to control the evolution and change of the analysis areas on the video sequence caused by changes in the head pose. We obtain the areas to analyze by projecting these 3D regions onto the image plane. This procedure automatically reshapes the areas on the video image following the feature appearance. The projected area of one feature (Figure V-3) is:

Area = base × height = |A| · |B| · sin( arccos( ((x_3p − x_1p)(x_2p − x_4p) + (y_3p − y_1p)(y_2p − y_4p)) / (|A| · |B|) ) )

where A = p_3p − p_1p, B = p_2p − p_4p and p_ip = (x_ip, y_ip).

Figure V-3. Example of the deformation and framing of one feature ROI: points 1p to 4p, axes A (base) and B (height), and the enclosing rectangle from (x_top, y_top) to (x_bottom, y_bottom).

Controlling the size and shape of the feature projection also permits foreseeing whether the information obtained from the analysis of the targeted zone will be relevant. Observing the graphs plotted in Figure V-4, we see that the area of analysis projected on the image plane essentially reaches its maximum value when the head is around its neutral pose, although the exact maximum depends on the 3D location of the ROI; the area then decreases as the head moves away. Motion along the optical axis of the camera does not follow this global behavior: the increasing trend in the ROI area is the consequence of approaching the camera. If the ROI on the image plane shrinks, the amount of detail that the face feature presents on the image diminishes as well. Our analysis algorithms may not be able to extract data from certain features if they are too small. Knowing the pose and how the feature size behaves helps to avoid performing analyses that will not succeed. For instance, we can define a threshold area under which the algorithm does not run because we consider that there will not be enough visible surface. This

threshold depends on the practical implementation of the feature analysis system: the image-processing techniques used, the size of the input video, etc.

Figure V-4. These graphs depict the evolution of the projected ROI area of a feature depending on the pose of the head. We observe the influence of each pose parameter (t_X, t_Y, t_Z and the angles α, β, γ) independently, and of the angles jointly. The studied observation model simulates the ROI of an eye. It has F = 15 units; the major axis (A) and the minor axis (B) are defined from 1_n = (20, 0, 4.8), 2_n = (25, 7.5, 4.8), 3_n = (30, 0, 4.8) and 4_n = (25, −7.5, 4.8); the area of the ROI surface computed in 3D is 150 square units.

For practical purposes, to make the deformed areas more suitable for image analysis, we enclose them in video analysis rectangles (x_top, y_top)–(x_bottom, y_bottom):

(x_top, y_top) = (min(x)_ROI, max(y)_ROI);    (x_bottom, y_bottom) = (max(x)_ROI, min(y)_ROI)
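The projected area and the enclosing analysis rectangle can be computed directly from the four projected ROI corners, for instance as in this sketch (the point ordering 1-2-3-4 follows Figure V-3; names are illustrative):

```python
import numpy as np

def roi_area_and_box(p1, p2, p3, p4):
    """Area of the projected ROI (base x height of its two axes) and the
    enclosing analysis rectangle on the image plane."""
    p = np.array([p1, p2, p3, p4], float)
    A = p[2] - p[0]                          # major axis, points 1 -> 3
    B = p[1] - p[3]                          # minor axis, points 4 -> 2
    area = abs(A[0] * B[1] - A[1] * B[0])    # |A||B| sin(angle between A and B)
    top = (p[:, 0].min(), p[:, 1].max())
    bottom = (p[:, 0].max(), p[:, 1].min())
    return area, top, bottom
```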

V.6 3D Template Modeling for Eyes and Eyebrows

For each of the facial features that we want to analyze (eyes, eyebrows and mouth), the adaptation of the motion model is done after having derived the plane, parallel to the image plane, of the observation model. The chosen plane is the basis of the newly adapted template parameters. Originally, the parameters were determined with respect to a local reference system related to the ROI of the feature itself (and to the image frame in general). The first step is to relocate the ROI coordinates on the surface approximation of the specific feature and then determine their 3D coordinates on the plane, (x_n^ROI, y_n^ROI, M). x_n^ROI and y_n^ROI are obtained from the projection of the 3D-ROI points selected from the head model surface. Any template analysis parameter related to the image processing involved (this also includes the ROI), or any other anatomical motion restriction, is attached to the physiognomy of the person and described by the 3D coordinates that relate it to the reference system of the observation model. During the analysis, not only the coordinates of the ROI are projected, but also the coordinates of all those parameters that determine the image analysis and that are related to the anatomical structure of the feature. The points that define those parameters are obtained from the head surface model and also orthogonally projected onto the template plane.

This section presents the surface approximations designed for the implementation of our algorithmic extension to 3D. Section V.8 discusses how other surface designs are possible and what the implications of using non-linear surfaces are.

V.6.1 Eyes

We model the eye as the sphere (x_n − x_0)² + (y_n − y_0)² + (z_n − z_0)² = Rad² that best fits the eye on the 3D head neutral representation (see Figure V-5). We have chosen the pupil in its neutral (frontal) position as the point around which to develop the linear approximation. This choice implies developing the plane tangent to the sphere at the pupil point (see Figure V-5 for a graphical representation). A plane tangent to the sphere is such that its normal is the vector r = (A, B, C) = (x_n − x_0, y_n − y_0, z_n − z_0), where (x_n, y_n, z_n) are the coordinates appearing in the general sphere equation. The group of planes tangent to the sphere can be described through this normal as Ax + By + Cz + D = 0. From this family we take those that are parallel to the image plane, that is, of the form z_n = M, which generates a family of circles:

f(x_n, y_n) = (x_n − x_0)² + (y_n − y_0)² = Rad² − (M − z_0)²

From this family we are interested in those whose radius is equal to 0, because they define the points of the planes that are tangent to the sphere:

f(x_n, y_n) = 0 :    M = z_0 + Rad    and    M = z_0 − Rad

The final surface is the tangent plane that passes through the frontal pupil point, i.e. the one of the two that lies nearest to the camera optical center.

Regarding the adaptation of the image-processing algorithms involved, all the measurements needed for the analysis of the eye area in the search for the minimum point of energy were already dependent on the ROI dimensions. Since the ROI is reshaped after its projection onto the image plane, these parameters are automatically adapted as well.

Figure V-5. The eye ROI on the image must follow and reshape according to the view that we have of the feature for a given head pose. Figure (a) schematically shows how the originally designed eye-state analysis algorithm (eye width W_eye, height H_eye, and the up/center/down search bands) cannot be applied directly on the eye as soon as there is a rigid-motion component in the final movement. Figure (b) presents the eye model, the sphere of radius Rad centered at (x_0, y_0, z_0), and its linear surface approximation, with maximum approximation error ε_max_app.

V.6.2 Eyebrows

To generate the surface f(x_n, y_n, z_n) that covers the area of the eyebrow, we are interested in developing the surface tangent to the vertical movement of the eyebrow arch. One characteristic of this surface is that ∂f(x_n, y_n, z_n)/∂y = 0; the other is that it must follow the shape of the individual's forehead (see Figure V-6). The procedure to find the most suitable plane approximation for the development of the template starts by determining the 3D ROI coordinates on that surface, x_1, x_2, x_3 and x_4; then we take

z_n = M = (z_1 + z_2 + z_3 + z_4) / 4.

The eyebrow image-processing algorithm also needs to determine the point of eyebrow density change. We obtain points a and b by orthogonally projecting the actual coordinates of the changing points onto the plane. The determination of these two points automatically divides the ROI into the two analysis areas required for the binarization.

Figure V-6. The eyebrow can be approximated by the surface that tangently follows the eyebrow movement along the forehead. Its plane approximation takes the average z value of the points that delimit the eyebrow ROI.
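Both plane approximations reduce to very small computations once the eyeball sphere and the eyebrow ROI corners are known; a sketch is given below. The sign used for the eye tangent plane depends on the axis convention and is an assumption here (the camera is taken to lie towards positive Z).

```python
import numpy as np

def eye_template_plane(sphere_center, radius):
    """Depth M of the plane tangent to the fitted eyeball sphere at the
    frontal pupil point (the tangent plane facing the camera)."""
    return sphere_center[2] + radius

def eyebrow_template_plane(roi_corners):
    """Depth M of the eyebrow plane: average z of the four 3D ROI corners,
    M = (z1 + z2 + z3 + z4) / 4."""
    return float(np.mean([c[2] for c in roi_corners]))
```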

V.7 Accuracy of the Adaptation: A Theoretical Study

We study the equation system (V-5) to understand the theoretical performance of the adaptation process. First, we are interested in evaluating the position that we obtain on the modeled plane (x_n) for a given point retrieved from the video image (x_p), knowing that the point comes from the analysis of the image of a real human head projection and not from its model. Second, we want to estimate the degree of dependence of the complete analysis on the pose parameters.

Let us develop the solution of system (V-5), written compactly as A·x_n = B. By Cramer's rule,

x_n = det(A_x) / det(A),    y_n = det(A_y) / det(A),    z_n = M,

where A_x and A_y are obtained from A by replacing the corresponding column with B. The numerators are linear combinations of M, the translations t_X and t_Y, the term H = F − t_Z and the observed coordinates (x_p, y_p), weighted by trigonometric functions of the rotation angles; with the rotation matrices defined in Section V.3.1, the common denominator is

det(A) = F · ( F·cα·cβ − x_p·sβ + y_p·sα·cβ ).

These expressions lead to the formulas analyzed in our accuracy study.

V.7.1 Influence of the surface modeling

Stability of the inversion

By determining when det(A) = 0, we can evaluate under which circumstances the method is unstable because the inversion has no solution:

(V-6)    F·cα·cβ − x_p·sβ + y_p·sα·cβ = 0.

Analyzing the geometrical nature of system (V-5) (see Figure V-2), we restrict the study to the case where F is a positive number and the vector r = (x_p, y_p, 0)^T − (0, 0, F)^T = (x_p, y_p, −F)^T (Figure V-7) is the ray of solutions of the projection

inversion. Expression (V-6) represents the family of planes that are parallel to this ray. The combinations of rigid motion parameters that generate such a plane lead to the singular case where the system is not able to deduce the pose.

Figure V-7. A solution to the inversion exists as long as the plane that approximates the feature surface does not take, after the rigid motion of the head has been applied, an orientation parallel to the vector r.

Geometrical interpretation of the analysis:

1. The stability of the system is independent of the angle γ, which indicates the rotation around the Z-axis. This is because in our observation model the Z-axis and the camera optical axis coincide.

2. The system is unstable for those combinations of the angles α and β that transform the plane parallel to the image plane into a plane that contains the vector r.

3. Some illustrative examples of unstable results:
- If x_p = 0 and y_p = 0 (retrieved data along the optical axis) and α = π/2 and β = π/2, then det(A) = 0.
- If x_p = 0 and tan(α) = −F/y_p, then det(A) = 0.
- If y_p = 0 and tan(β) = cα·F/x_p, then det(A) = 0.

4. As the pose of the plane approaches the conditions under which the system becomes unstable, the projected image of the feature being analyzed starts to concentrate toward a single point. Although mathematically the plane is an infinite surface, we are only interested in the part of the plane containing the feature template. Therefore, the instability of the system can also be controlled from the analysis of the ROI projection: as its area decreases, the data on the template plane keep concentrating until the unstable point is reached. Figure V-4 also illustrates the dependence of the system stability on the estimated pose parameters.
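In practice the singular configuration can be detected before running the feature analysis, either by monitoring the projected ROI area or by testing the geometric condition directly, as in this illustrative sketch (the tolerance value is arbitrary):

```python
import numpy as np

def inversion_is_stable(x_p, y_p, pose_G, F, tol=1e-3):
    """Geometric stability test for the template-plane inversion: the system
    degenerates when the rotated plane normal becomes orthogonal to the ray
    of solutions r = (x_p, y_p, -F) through the optical centre."""
    normal_rotated = pose_G[:3, :3] @ np.array([0.0, 0.0, 1.0])  # plane z_n = M
    ray = np.array([x_p, y_p, -F])
    cos_angle = abs(normal_rotated @ ray) / (np.linalg.norm(ray) + 1e-12)
    return cos_angle > tol
```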

Figure V-8. With α = π/2 or β = π/2, the plane, located as in the presented example, generates an undetermined solution that does not permit the inversion of the system around the observed point.

Linear approximation accuracy

There exists a precision error due to the linearization of the model. This inaccuracy is known and bounded:

ε_max < max_i |M − z_i|,    i : point_i ∈ {feature surface}.

For instance, for the proposed eye feature modeling: ε_max_eye < Rad.

V.7.2 Error propagation

The inaccuracy of the obtained x_n comes from two sources: (a) the image processing is not precise and does not retrieve the correct projected location; or (b) the pose parameters that describe the rigid motion are not accurate enough. To study the propagation of these two sources of error, we evaluate the real value x̃_n obtained as the approximation of the theoretical value x_n under the influence of an error that we separate into a multiplicative part and an additive part:

x̃_n = x_n · ε_mult + ε_add.

We perform our study by developing the error expressions for each source acting independently. To interpret the error expressions we also provide a numerical study for a specific case.

Influence of the analyzed data precision

Error expressions: We express the retrieved data as a function of the error committed during the image processing: x̃_p = x_p + ε_x and ỹ_p = y_p + ε_y. Propagated through the inversion of system (V-5), the interpreted data take the form

x̃_n = x_n · ε_x_mult + ε_x_add    and    ỹ_n = y_n · ε_y_mult + ε_y_add,

where the multiplicative factors ε_x_mult and ε_y_mult are of the form 1/(1 − u), with u a combination of ε_x/F and ε_y/F weighted by trigonometric functions of the rotation angles, and the additive terms ε_x_add and ε_y_add additionally involve the plane depth M, the translation parameters and the observed coordinates.

Influence of the pose estimation

Error expressions: We now express the obtained interpreted data as a function of the inaccuracy in the estimation of the rigid motion parameters, taking one erroneous parameter at a time: α̃ = α + ε_α, β̃ = β + ε_β, γ̃ = γ + ε_γ, t̃_X = t_X + ε_tX, t̃_Y = t_Y + ε_tY and t̃_Z = t_Z + ε_tZ. In every case the interpreted coordinates keep the same structure,

x̃_n = x_n · ε_x_mult + ε_x_add    and    ỹ_n = y_n · ε_y_mult + ε_y_add.

For the rotation angles, the multiplicative factors are of the form 1/(1 − u), where u combines trigonometric functions of the erroneous angle with the ratios x_p/F and y_p/F, and the additive terms gather the contributions of M, F, the translations and the observed coordinates. For the translation parameters the propagation is simpler: the multiplicative factors reduce to one and only additive errors remain, proportional to the corresponding ε_tX, ε_tY or ε_tZ and scaled by terms in x_p/F and y_p/F.

Numerical interpretation

The error expressions depend on the specific projection system parameter (F), on the pose of the head model (α, β, γ, t_X, t_Y, t_Z) and on the values of the projected coordinates obtained from the analysis (x_p, y_p). The following tables illustrate the error evolution when the analyzed head is almost in its neutral position (α = β = γ = t_X = t_Y = t_Z = 0), assuming only one source of error at a time. Both the additive and the multiplicative error terms evolve with the magnitude of the error committed on the analyzed results. Table V-1 and Table V-2 show the error evolution in terms of the percentage of error committed when analyzing the projected data (x_p, y_p). Table V-6, Table V-7 and Table V-8 present the percentage of error coming from the translation parameter estimation; and Table V-3, Table V-4 and Table V-5 show the error, in radian units, due to estimation inaccuracy in the pose angles. f(), f'(), g() and g'() are functions that depend on the specific projection system parameters. h(), h'(), h''(), h'''(), k(), k'(), k''(), k'''(), m(), m'(), n(), n'(), q(), q'(), p() and p'() are functions that depend on the system parameters and on the analyzed values x_p and y_p.

Table V-1. Evolution of the error due to the inaccuracy on the X component of the projected data analysis.

error % x_p | error % x_mult | error % x_add | error % y_mult | error % y_add
10    | 0 | 10·f(system)   | 0 | 10·f'(system)
5     | 0 | 5·f(system)    | 0 | 5·f'(system)
1     | 0 | f(system)      | 0 | f'(system)
0.5   | 0 | 0.5·f(system)  | 0 | 0.5·f'(system)
0.1   | 0 | 0.1·f(system)  | 0 | 0.1·f'(system)
0.05  | 0 | 0.05·f(system) | 0 | 0.05·f'(system)
0.01  | 0 | 0.01·f(system) | 0 | 0.01·f'(system)

Table V-2. Evolution of the error due to the inaccuracy on the Y component of the projected data analysis. Same structure as Table V-1, with the functions g(system) and g'(system) in the additive columns.

Table V-3. Evolution of the error due to the inaccuracy in the α pose-parameter precision. For all the tested error values (1, 0.5, 0.1, 0.05 and 0.01 rad): error x_mult = 1/(1 − h(sys, coor)); error x_add is a combination of h'(sys, coor) and h''(sys, coor) divided by (1 − h(sys, coor)); error y_mult = 1/(1 − h(sys, coor)); error y_add is a combination of h'''(sys, coor) and h''(sys, coor) divided by (1 − h(sys, coor)).

Table V-4. Evolution of the error due to the inaccuracy in the β pose-parameter precision. For all the tested error values (1, 0.5, 0.1, 0.05 and 0.01 rad): error x_mult = 1/(1 − k(sys, coor)); error x_add is a combination of k'(sys, coor) and k''(sys, coor) divided by (1 − k(sys, coor)); error y_mult = 1/(1 − k(sys, coor)); error y_add is a combination of k'''(sys, coor) and k''(sys, coor) divided by (1 − k(sys, coor)).

Table V-5. Evolution of the error due to the inaccuracy in the γ pose-parameter precision. For all the tested error values (1, 0.5, 0.1, 0.05 and 0.01 rad), the additive errors are error x_add = m(sys, coor) and error y_add = m'(sys, coor).

Table V-6. Evolution of the error due to the inaccuracy in the t_X pose-parameter precision.

error % t_x | error % x_mult | error % x_add | error % y_mult | error % y_add
10    | none | 10·n(system, coor)   | none | 10·n'(system, coor)
5     | none | 5·n(system, coor)    | none | 5·n'(system, coor)
1     | none | n(system, coor)      | none | n'(system, coor)
0.5   | none | 0.5·n(system, coor)  | none | 0.5·n'(system, coor)
0.1   | none | 0.1·n(system, coor)  | none | 0.1·n'(system, coor)
0.05  | none | 0.05·n(system, coor) | none | 0.05·n'(system, coor)
0.01  | none | 0.01·n(system, coor) | none | 0.01·n'(system, coor)

Table V-7. Evolution of the error due to the inaccuracy in the t_Y pose-parameter precision. Same structure as Table V-6, with the functions p(system, coor) and p'(system, coor) in the additive columns.

Table V-8. Evolution of the error due to the inaccuracy in the t_Z pose-parameter precision. Same structure as Table V-6, with the functions q(system, coor) and q'(system, coor) in the additive columns.

Discussion and conclusions from the complete evaluation

Bearing in mind that the previous tables represent the results for a specific situation, we can still study the general trends in the error behavior and which parameters seem to be more critical for a correct coupling result. Exact error values depend on the instantaneous system characteristics, but the specific functions that represent these characteristics have the same order of magnitude (O(10⁰)), independently of the system itself and the obtained results.

In the evolution study, we see that the instantaneous pose can greatly influence the error behavior. This fact is noticeable, for instance, when we observe that the multiplicative error on the projected data disappears when the head is in its neutral position (Tables V-1 and V-2). In general, for small inaccuracies the overall error behavior shows a clean linear evolution. Pose parameters related to the X and Y-axes (α, β, t_X and t_Y) behave similarly. Rotations around the X and Y-axes have a stronger effect on the error than the rotation related to the Z-axis. Under fixed pose conditions, the γ rotation, t_X and t_Y do not change the image appearance of the studied ROI, and therefore errors due to inaccuracy in their prediction have less impact on obtaining the neutral coordinates.

The pose-parameter estimation proves to be critical for the template adaptation. In addition to directly influencing the interpretation of the data analyzed on the image, it is also directly responsible for the applicability and accuracy of the image-processing analysis. The motion template analysis techniques implemented depend on the correct delimitation of the ROI of the feature that is going to be analyzed. Since image ROIs are obtained by projection, pose parameters also intervene in determining their final location. This implies that the accuracy (ε_x, ε_y) of the analyzed values (x̃_p, ỹ_p), although apparently not directly dependent on the pose, is indirectly dependent on it through the determination of the template ROI.

From our theoretical evaluation, we conclude that the template adaptation procedure is only feasible if the rigid motion study of the head on the image generates the required pose parameters with a minimum degree of accuracy. The level of precision is set so that ROIs are properly tracked and do not become the major source of error for the image-processing analysis techniques involved.

V.8 Using other surfaces for the algorithmic 3D-extension

The extension of the motion models to 3D space has been designed to use a linear approximation of the feature surface because we seek to take advantage of the computational benefits linear surfaces provide. In this section, we discuss how the system could also be implemented using other surface approximations. We can always aspire to build the surface that minimizes the error derived from the linearization imprecision.

Let us recall the equation system (V-5) in its extended form:

(V-7)
a_1·x_n + a_2·y_n + a_3·z_n = a_4
b_1·x_n + b_2·y_n + b_3·z_n = b_4

This system represents the inversion of the projection and the pose transformation needed to recover the motion data obtained from the image plane. A parametric solution to the system is:

z_n(y_n) = A/C + (B/C)·y_n
x_n(y_n) = (a_4 − a_2·y_n − a_3·z_n) / a_1

with A = a_4·b_1 − b_4·a_1, B = a_1·b_2 − b_1·a_2 and C = a_3·b_1 − b_3·a_1.

Let us express the surface that represents the feature ROI in one of its parametric forms:

x_n(u, v) = Σ_{i=0}^{n} Σ_{j=0}^{m} dx_{i,j} · N_i^k(u) · N_j^l(v)
y_n(u, v) = Σ_{i=0}^{n} Σ_{j=0}^{m} dy_{i,j} · N_i^k(u) · N_j^l(v)
z_n(u, v) = Σ_{i=0}^{n} Σ_{j=0}^{m} dz_{i,j} · N_i^k(u) · N_j^l(v)

where N_i^k(u) and N_j^l(v) are the B-spline basis values (NURBS) for the chosen surface control points, and dx_{i,j}, dy_{i,j} and dz_{i,j} are the weights given to those values.

We would like to add the surface constraint to system (V-7) to solve for x_n, y_n and z_n. We do so by describing z_n as a function of y(u, v):

  z_n(y(u, v)) = A/C + (B/C) Σ_{i=0..n} Σ_{j=0..m} dy_{i,j} N_i^k(u) N_j^l(v).

Solving the system is then equivalent to finding a solution for u and v in the following expression:

(V-8)  z_n(u, v) = A/C + (B/C) y_n(u, v).

One way to determine u and v is by using Newton's numerical method. To do so, the function to optimize,

(V-9)  f(u, v) = z_n(u, v) − A/C − (B/C) y_n(u, v) = 0,

must fit these criteria:
1. It must be C^1, both in u and in v.
2. Its Jacobian must be invertible at the point (u_0, v_0) of the expected solution, that is, J_f(u_0, v_0)^{-1} must exist.

These criteria are automatically met if the studied surface is C^2. Besides this parametric surface, other types could be used: cubic, quadratic, etc. B-splines could be more advantageous because they are already widely used in Computer Graphics to define surfaces and could present interesting side aspects to consider. In all cases, numerical methods, like the one we have suggested, will have to be used to find a proper solution to the system. This will always imply a loss in computation speed.

Brief review of Newton's method

Newton's method is a root-finding algorithm which uses the first few terms of the Taylor series of a function f(x) in the vicinity of a suspected root in order to find an approximation to that root. It is also called the Newton-Raphson method. For f(x) a polynomial, Newton's method is essentially the same as Horner's method. The Taylor series of f(x) about the point x = x_0 + ε is given by

  f(x_0 + ε) = f(x_0) + f'(x_0) ε + (1/2) f''(x_0) ε² + …

Keeping terms only to first order,

(V-10)  f(x_0 + ε) ≈ f(x_0) + f'(x_0) ε.

This expression can be used to estimate the amount of offset ε needed to land closer to the root starting from an initial guess x_0. Setting f(x_0 + ε_0) = 0 and solving (V-10) for ε_0 gives

  ε_0 = −f(x_0) / f'(x_0),

which is the first-order adjustment to the root position. By letting x_1 = x_0 + ε_0, calculating a new ε_1, and so on, the process can be repeated until it converges to a root using

  ε_n = −f(x_n) / f'(x_n).

The method is easily extended to solve for two unknowns:

  (u, v)_{n+1} = (u, v)_n − J_f^{-1}(u_n, v_n) · f(u_n, v_n),

where, in the case of our surface,

  J_f(u, v) = (∂f(u, v)/∂u, ∂f(u, v)/∂v)
            = (∂z_n(u, v)/∂u − (B/C) ∂y_n(u, v)/∂u, ∂z_n(u, v)/∂v − (B/C) ∂y_n(u, v)/∂v),

evaluated at (u_n, v_n). A small numerical sketch of this iteration follows.
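The following sketch illustrates how the search for (u, v) could be organized. It applies the two-variable Newton iteration to a toy parametric surface standing in for the NURBS patch, and solves both rows of system (V-7) directly, which is equivalent to the formulation (V-8)-(V-9). The surface coefficients, the plane coefficients a_i, b_i and the tolerances are made-up values for illustration only.

```python
import numpy as np

# Toy parametric surface standing in for the NURBS patch of the feature ROI
# (made-up coefficients, for illustration only).
def surface(u, v):
    x = 2.0 * u + 0.3 * u * v
    y = 1.5 * v + 0.1 * u * u
    z = 0.2 * u * u + 0.4 * v * v - 1.0
    return np.array([x, y, z])

def surface_jacobian(u, v):
    # Partial derivatives of (x, y, z) with respect to (u, v).
    dxu, dxv = 2.0 + 0.3 * v, 0.3 * u
    dyu, dyv = 0.2 * u,       1.5
    dzu, dzv = 0.4 * u,       0.8 * v
    return np.array([[dxu, dxv], [dyu, dyv], [dzu, dzv]])

# Coefficients of the two linear equations of system (V-7): made-up values.
a = np.array([1.0, 0.2, -0.5, 0.7])   # a1, a2, a3, a4
b = np.array([0.1, 1.0, -0.3, 0.4])   # b1, b2, b3, b4

def residual(u, v):
    # Both rows of (V-7) evaluated on the surface point; the root of this
    # vector function gives the (u, v) we are looking for.
    p = surface(u, v)
    return np.array([a[:3] @ p - a[3], b[:3] @ p - b[3]])

def newton(u0, v0, tol=1e-9, max_iter=50):
    uv = np.array([u0, v0], dtype=float)
    for _ in range(max_iter):
        f = residual(*uv)
        if np.linalg.norm(f) < tol:
            break
        # Chain rule: Jacobian of the residual with respect to (u, v).
        J = np.vstack([a[:3], b[:3]]) @ surface_jacobian(*uv)
        uv -= np.linalg.solve(J, f)      # requires an invertible Jacobian
    return uv

u, v = newton(0.1, 0.1)
print("solution (u, v):", u, v, "residual:", residual(u, v))
```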

VI Technical Evaluation of Coupling Facial Expression Analysis and Head Pose Tracking

This chapter compiles the practical developments and the experimental evaluation of the facial motion analysis system detailed in the previous chapters. We have tried to study the performance, accuracy and robustness of the proposed head-pose & expression analysis coupling. The test-bed has been set up to correspond as much as possible to the deployment of a real acquisition system for teleconferencing.


VI.1 Introduction

The work compiled in this thesis report is the natural continuation of the scientific research on facial analysis for synthetic animation that the Image Group of the Multimedia Communications Department at the Institut Eurécom has been carrying out during the past 6 years.

Regarding rigid facial motion analysis, the research group had developed an algorithm that utilizes a feedback loop inside a Kalman filter to obtain precise information about the person's location in space. Kalman filtering has been applied to head tracking with good results (Cordea, Petriu, E. M., Georganas, D., Petriu, D. C., & Whalen, T. E., 2001; Ström, 2002), and it enables the prediction of the translation and rotation parameters of the head from the 2D tracking of specific points of the face on the image plane. For non-rigid facial motion analysis, some interesting techniques for expression analysis had already been tested (Valente, 1999), but the PCA-based approach originally taken had too many restrictions when extending its use to any other pose. This led us to develop the pose-expression coupling techniques investigated here.

The main practical drawback of our head-tracking system is the need for 3D information about the shape of the head that we are tracking. This implies that we must use a model that provides accurate 3D coordinates of the points whose projections are tracked on the image and fed to the filter to obtain the prediction of the pose parameters. Very often, a generic head model is used. This apparent drawback can become a strong advantage if a realistic 3D synthetic representation of the user is available. In (Valente & Dugelay, 2001), we showed that an improvement in the amount of freedom of movement in front of the camera is possible when using the speaker's clone during the tracking. Models have to be a precise 3D representation of the speaker, in shape and texture, because our approach compares head models and video frames at the image level.

We have inserted the feature motion analysis algorithms presented in this report inside the original head-tracking framework. Since the speaker's realistic head model is needed for the tracking, and is therefore also available during the analysis, the 3D data required for extending the use of the motion analysis templates is obtained from it. For the adaptation process to be possible, the head-pose tracking algorithm and the image processing must share the same observation model. Some other performing tracking algorithms work using local 2D image reference systems on which they can only estimate the user's position on the screen (e.g. Bradski's CamShift algorithm (1998)). For the sequence of processes to follow, i.e., detection of the features to be analyzed,

analysis and interpretation of the analyzed results, this information is not accurate enough.

In our approach, the algorithm operating on the image plane extracts the 2D features to be tracked on the video sequence from the synthesized image of the model, onto which the predicted pose parameters have already been applied. At this point, it provides an adjusted view of the user in its future pose. The system structure also enables us to project points belonging to the modeled areas of the face (those used for feature expression analysis) and to track the evolution of each feature ROI. Figure VI-1 shows two screen shots where the online adaptation of the 3D model view can be observed. Details about the Laboratory's previous and current research work can be found on the group website (Video Cloning, 1999).

Figure VI-1. Two screen shots of the test setting. In the leftmost window we present the video input, the projection of the analysis results, and the evolution of the ROIs, suitable for visual inspection of the feature analysis performance; in the rightmost window the synthetic reproduction (projected using OpenGL) of the user's clone is represented, allowing us to control the evolution of the head tracking algorithm.

Since we use a highly realistic model to perform the tracking, we utilize its 3D data to do the algorithmic adaptation: we redefine the motion animation models and their ROIs on it.

VI.2 Description of the System

The test-bed for the technical evaluation of the proposed system comprises two main environments, the video input window and the synthetic rendering window (Figure VI-1). We consider the following standard acquisition conditions: a small camera situated in front of the speaker, on top of the monitor, around 75 cm away from him (Figure VI-2).

Figure VI-2. Setting for the technical evaluation: just one computer and one camera are used.

The video-input window shows either live input or a video sequence recorded by the camera. It also presents the graphical feedback from the different analysis methods, thus allowing us to visually verify the correctness of the expression analysis. The rendering window displays the synthetic reproduction of the speaker's head model. The model is rendered after the obtained pose parameters have been applied to it. This allows us to check whether the tracking algorithm still keeps track of the head correctly.

Figure VI-3. Synthetic image and video input are blended to be able to initialize the analysis system.

Both windows are the same size because the synthetic world is meant to reproduce the real world exactly. Indeed, this fact is utilized to initialize the complete system. For the first acquired frame the head pose is completely unknown and the filter has not yet been initialized. Therefore, an initialization step is required. During this step, the face features to be tracked in 2D must be detected on the frame and the pose parameters must be set to the first head pose. Since the development of this procedure is not the purpose of the research presented in this thesis report, we decided to manually initialize the system by blending the real video input with the synthetic view of the model. This way, we can ask the speaker to move and set himself in the neutral pose (see Figure VI-3).

VI.2.1 Characteristics of the video input

The quality of the video input is very important. Different cameras have different optical characteristics. Our system does not take those characteristics into account, in order to be as environmentally independent as possible. Each system may also use different acquisition equipment, and during acquisition the images can be significantly altered, sometimes resulting in the inclusion of undesired artifacts. As we observe in Figure VI-4, the images resulting from capturing the same object with different equipment may differ greatly. The first image (a) has been acquired with a standard acquisition card and a low-distortion camera. Color and shape are well maintained. Image (b) has been recorded with a typical webcam. We clearly see that the compression method needed to lower the network payload during video streaming on the Internet, generally JPEG, damages the image quality.

Figure VI-4. Image (a) was recorded with a standard camera and acquisition card. Image (b) was obtained from a typical web camera.

The algorithms developed during our research have been tested on video input whose quality falls into category (a). The acquisition system did not distort the final image. Nevertheless, we did not assume any particular image quality with respect to color, contrast or any other characteristic.

Characteristics of the complete acquisition system:
Camera: Sony EVI-D31
Acquisition card: Osprey 100 Video Capture Device
Input stream: PAL-BGHDI
Output stream: BGR32: [B: 8 bits][G: 8 bits][R: 8 bits][transparency: 8 bits]
Max frame rate: 33 f/s
Video input size: 384x288 (~) [w x h]

We used DirectX (2003) technology to deploy video acquisition and rendering. The DirectShow filter strategy was utilized. One of the system characteristics was that images were recorded by sweeping from bottom to top, thus inverting the data information in memory. The reference system used to access the data is illustrated in Figure VI-5. One pixel has the coordinates (x_video, y_video). All these details have been taken into account during the deployment of the algorithms. Comparing the input image to the observation-model coordinate system that is used (System V-2) and that has been adapted for the synthetic world (VI-6), we know that:

  x_p^{OGL} = x_video − (w − 1)/2
  y_p^{OGL} = y_video − (h − 1)/2

Figure VI-5. Reference system of the video image in memory.
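A minimal sketch of this bookkeeping follows, assuming the centred observation-model coordinates given above and the bottom-up, 4-bytes-per-pixel memory layout described for the acquisition system; the frame size is the one used in our tests, while the sampled pixel positions are arbitrary example values.

```python
# Conversion between the video-memory pixel coordinates (corner origin,
# rows stored bottom-up, BGR32 samples) and the centred observation-model
# coordinates used by the tracker.
W, H = 384, 288

def video_to_model(x_video, y_video):
    # Re-centre the pixel coordinates on the middle of the image.
    x_p = x_video - (W - 1) / 2.0
    y_p = y_video - (H - 1) / 2.0
    return x_p, y_p

def bgr32_pixel(frame_bytes, x_video, y_video):
    # The acquisition sweeps the image from bottom to top, so row 0 in memory
    # is the bottom row of the picture; each pixel is 4 bytes: B, G, R, alpha.
    row_in_memory = H - 1 - y_video
    offset = 4 * (row_in_memory * W + x_video)
    b, g, r = frame_bytes[offset:offset + 3]
    return r, g, b

# Example on a dummy grey frame.
frame = bytes([128, 128, 128, 0]) * (W * H)
print(video_to_model(0, 0), video_to_model(W - 1, H - 1))
print(bgr32_pixel(frame, 10, 20))
```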

VI.2.2 Head model synthesis

Our image analysis techniques need geometrical information about the speaker's head to perform well. This information is obtained from the speaker's 3D model, which can be the same as the model utilized to represent the speaker during animation, but not necessarily. To create the head models we used the data generated by a Cyberware™ 3D scanner (Cyberware, 2003). It provided us with a cloud of points that represented the surface of the person's head. It also retrieved a cylindrical image representation of the surface texture, which was automatically mapped onto the 3D points. Then, this dense cloud of points was triangulated to obtain a rich wireframe. The wireframe was used to extract all the precise 3D data needed to redefine, in 3D, the motion models used during the analysis (Figure VI-6a; Figure VI-6c). We used a reduced version of the original wireframe (Figure VI-6b) to track the head on the video because we wanted to be able to render both video and synthetic feedback in a reasonable time.

Figure VI-6. Different models were used for the practical implementation of the algorithms. Very dense wireframe models were utilized to extend the use of our expression analysis algorithms (a-c); a lighter version of these models was rendered during the coupling of head tracking and expression analysis. An animated avatar substituted the realistic head model of the speaker to evaluate the naturalness of the animation synthesis created from the parameters obtained from the analysis.

These complete models were used only for the analysis because no animation, besides rigid motion, was provided with them. Their visual feedback helped to control the head-pose tracking. To test the naturalness created from the online analysis, we used a simpler model: Olivier (Figure VI-6d; Figure VI-6e). This model, which we consider an avatar, was provided by France Télécom R&D Rennes. Due to its wireframe simplicity it was easy to implement some customized animations. For example, by using Olivier we could observe the results from synthesizing the eye actions analyzed from the video input.

MPEG-4 based head models

Our head models have been generated from different 3D data acquisition methods but they have all been coded in .mp4 format. This is a binary format that includes the geometry, the texture and the animation rules of 3D heads packed following the MPEG-4 standard. We decided to adopt the semantics and syntax that MPEG-4 provides for Facial Animation (and not FACS or proprietary semantics and syntax) because our analysis/synthesis techniques are integrated inside a system that aims at providing telecommunication services, as explained in depth in Chapter III.

The representation of synthetic visual objects in MPEG-4 is based on the prior VRML (2003) standard, using nodes such as Transform, which defines rotation, scale or translation of an object, and IndexedFaceSet, which describes the 3D shape of an object by an indexed face set. However, MPEG-4 is the first international standard that specifies a compressed binary representation of animated synthetic audio-visual objects. Appendix VI-J contains some details related to the different nodes involved; this information has been taken from ISO/IEC MPEG-4 (1999).

Specification and animation of faces

For a complete face object, MPEG-4 specifies a face model in its neutral state, a number of feature points on this neutral face as reference points, and a set of FAPs, each corresponding to a particular facial action deforming a face model in its neutral state. The FAP value for a particular FAP indicates the magnitude of the corresponding action, e.g., a big versus a small smile or deformation of a mouth corner. For an MPEG-4 terminal to interpret the FAP values using its face model, it has to have predefined, model-specific animation rules to produce the facial action corresponding to each FAP. The terminal can either use its own animation rules or download a face model and its associated face animation tables (FAT) to have a customized animation behavior. Since FAPs are required to animate faces of different sizes and proportions, the FAP values are defined in face animation parameter units (FAPU). The FAPU are computed from spatial distances between major facial features on the model in its neutral state.

Figure VI-7. Tree structure of the MPEG-4 coded head model: a Scene Replace command holds a chain of Transform nodes (one per explicit movement, e.g. FAP 48), a Face node with its fdp, fap, fit and renderedFace fields, the faceDefTables and faceSceneGraph nodes, and the Shape nodes (Appearance/Geometry with IndexedFaceSet or primitives such as Sphere and Cylinder) that describe each part of the head.

In order to use facial animation in the context of MPEG-4 systems, a BIFS scene graph (ISO/IEC MPEG-4 STD, 1998) has to be transmitted to the decoder. The minimum scene graph contains a Face node and a FAP node. The FAP decoder writes the amplitudes of the FAPs into the fields of the FAP node. The FAP node may have the children Viseme and Expression, which are FAPs requiring a special syntax. This scene graph enables an encoder to animate the proprietary face model of the decoder. If a face model is to be controlled from a TTS system, an AudioSource node is attached to the Face node. In order to download a face model to the decoder, the Face node requires an FDP node as one of its children. This FDP node contains the positions of the feature points in the downloaded model, the scene graph of the model, and the FaceDefTables, FaceDefMesh and FaceDefTransform nodes required to define the actions caused by the FAPs. The typical structure of the data being coded for each model can be found in Figure VI-7.

Neutral face and Facial Animation Parameter Units

Our models are considered to be in their neutral state (see Figure VI-8) when: the coordinate system is right-handed; the head axes are parallel to the world axes; gaze is in the direction of the Z-axis; all face muscles are relaxed; the eyelids are tangent to the iris; the pupil is one third of the diameter of the iris; the lips are in contact; the line of the lips is horizontal and at the same height as the lip corners; the mouth is closed and the upper teeth touch the lower ones; the tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth.

A FAPU and the feature points used to derive the FAPU are defined with respect to the face in its neutral state. The FAPU allow interpretation of the FAPs on any facial model in a consistent way, producing reasonable results in terms of expression and speech pronunciation. The measurement units are shown in Table VI-1.

Table VI-1. Facial animation parameter units and their definitions.

  Description                                                      | FAPU value
  Iris diameter (by definition equal to the distance between       | IRISD0 = 3.1.y − 3.3.y = 3.2.y − 3.4.y
  upper and lower eyelid) in the neutral face                      | IRISD = IRISD0 / 1024
  Eye separation                                                   | ES0 = 3.5.x − 3.6.x ;  ES = ES0 / 1024
  Eye - nose separation                                            | ENS0 = 3.5.y − 9.15.y ;  ENS = ENS0 / 1024
  Mouth - nose separation                                          | MNS0 = 9.15.y − 2.2.y ;  MNS = MNS0 / 1024
  Mouth width                                                      | MW0 = 8.3.x − 8.4.x ;  MW = MW0 / 1024
  Angle Unit                                                       | AU = 10^-5 rad
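The FAPU can thus be derived directly from the neutral model using the feature-point distances of Table VI-1. The sketch below shows this computation; the feature-point coordinates used in the example are hypothetical values, not taken from any of our models.

```python
# Neutral-face feature points needed for the FAPU (hypothetical 2D
# coordinates, indexed by the MPEG-4 feature-point names of Table VI-1).
fp = {
    "3.1": (30.0, 52.0), "3.3": (30.0, 41.0),    # upper/lower left eyelid
    "3.5": (33.0, 46.0), "3.6": (-33.0, 46.0),   # eye centres
    "9.15": (0.0, 10.0),                         # nose
    "2.2": (0.0, -25.0),                         # mouth middle
    "8.3": (28.0, -25.0), "8.4": (-28.0, -25.0)  # mouth corners
}

def fapu(points):
    iris_d0 = points["3.1"][1] - points["3.3"][1]   # IRISD0 = 3.1.y - 3.3.y
    es0     = points["3.5"][0] - points["3.6"][0]   # ES0    = 3.5.x - 3.6.x
    ens0    = points["3.5"][1] - points["9.15"][1]  # ENS0   = 3.5.y - 9.15.y
    mns0    = points["9.15"][1] - points["2.2"][1]  # MNS0   = 9.15.y - 2.2.y
    mw0     = points["8.3"][0] - points["8.4"][0]   # MW0    = 8.3.x - 8.4.x
    return {
        "IRISD": iris_d0 / 1024.0,
        "ES":    es0 / 1024.0,
        "ENS":   ens0 / 1024.0,
        "MNS":   mns0 / 1024.0,
        "MW":    mw0 / 1024.0,
        "AU":    1e-5,            # angle unit, in radians
    }

print(fapu(fp))
```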

Figure VI-8. A face model in its neutral state and the feature points used to define the FAP units (FAPU). Fractions of distances between the marked key features are used to define the FAPU.

MPEG-4 Face Definition Points

MPEG-4 specifies 84 feature points, or Face Definition Points (FDP), on the neutral face. Feature points are arranged in groups such as cheeks, eyes and mouth. The location of these feature points has to be known for any MPEG-4 compliant face model. The feature points of the model should be located according to Figure VI-9. The FDPs are normally transmitted once per session, followed by a stream of compressed FAPs. If the decoder does not receive the FDPs, the use of the FAPU ensures that it can still interpret the FAP stream. This ensures minimal operation in broadcast or teleconferencing applications.

The FDP set is specified in BIFS syntax. The FDP node defines the face model to be used at the receiver. Two options are supported: calibration information is downloaded so that the proprietary face of the receiver can be configured using facial feature points and, optionally, a 3D mesh or texture; or a face model is downloaded together with the animation definition of the Facial Animation Parameters. This face model then replaces the proprietary face model of the receiver.

Figure VI-9. At least the Face Definition Points must be specified on the head model wireframe in order to define it and to allow all animation systems to customize their own models. Our models include these points in their meshes, ensuring the correct understanding of MPEG-4 animation parameters.

MPEG-4 Face Animation Parameters

The FAPs are based on the study of minimal perceptible actions and are closely related to muscle actions. The 68 parameters are categorized into 10 groups related to parts of the face. FAPs can also be used to define facial action units. Exaggerated

amplitudes permit the definition of actions that are normally not possible for humans, but are desirable for cartoon-like characters.

The FAP set contains two high-level parameters: visemes and expressions. A viseme is a visual correlate of a phoneme. The viseme parameter allows viseme rendering (without having to express them in terms of other parameters) and enhances the results of other parameters, ensuring the correct rendering of visemes. Only static visemes which are clearly distinguishable are included in the standard set. Additional visemes may be added in future extensions of the standard. Similarly, the expression parameter allows the definition of high-level facial expressions. The facial expression parameter values are defined by textual descriptions. To facilitate facial animation, FAPs that can be used together to represent natural expressions are grouped together in FAP groups, and can be indirectly addressed by using an expression parameter. The expression parameter allows for a very efficient means of animating faces.

FAP 1 (viseme) and FAP 2 (expression) are high-level animation parameters. A face model designer creates them for each face model. Using FAP 1 and FAP 2 together with the low-level FAPs 3-68 that affect the same areas as FAPs 1 and 2 may result in unexpected visual representations of the face. Generally, the lower-level FAPs have priority over the deformations caused by FAP 1 or 2. When specifying an expression with FAP 2, the encoder may send an init_face bit that deforms the neutral face of the model with the expression prior to superimposing FAPs 3-68. This deformation is applied with the neutral-face constraints of mouth closure, eye opening, gaze direction and head orientation. Since the encoder does not know how FAPs 1 and 2 are implemented, it is recommended to use only those low-level FAPs that will not interfere with FAPs 1 and 2.

Our analysis techniques can generate facial animation parameters related to the FAPs included in Table VI-2. Note that all FAPs involve facial expression synthesis except numbers 48, 49, 50, 101, 102 and 103; these last ones describe rigid head motion. FAPs 101, 102 and 103 are not specified in the standard. We have added them to include the pose translations t_X, t_Y and t_Z (respectively) as if they were FAPs, instead of implementing a Transform node on top of the complete face object. FAPs 48, 49 and 50 represent the rotations α, β and γ respectively.

MPEG-4 FAPs are commutative action units whose center of coordinates is intimately related to the head model. The world origin of the Kalman pose-tracking projection transform is related to the situation of the camera. This implies that all actions are expressed as translations and rotations from the camera perspective. MPEG-4 defines movements from the head perspective. It does not provide FAPs for the translation of the head because it considers it an external movement exerted over the head element.

Figure VI-10. (a) Kalman transform actions and (b) MPEG-4 transform actions. In (a), a single world/camera transformation (T_k, R_k) acts on the head from the general camera viewpoint; in (b), global transformations (T_1, R_1) act on all scene elements, while a local transformation (T_2, R_2) acts on the head in its own coordinate system.

Schematically, we can understand the Kalman and MPEG-4 transform actions over the head as shown in Figure VI-10. Due to the different nature of the transformation systems, the predicted translation parameters t_X, t_Y and t_Z, which are represented by the T_k matrix, and the rotation parameters α, β and γ, which form the R_k matrix, must be correctly interpreted to express the same movement in the MPEG-4 system. In the MPEG-4 scene world, T_1 and R_1 represent the global translation and rotation actions applied to all the elements of the scene in order to obtain the desired view and placement of all the objects. T_2 and R_2 represent translation and rotation actions over the head model with reference to the local head coordinate system. The Kalman filter prediction obtains the pose parameters per frame relative to the camera-model coordinates. We associate the Kalman translation with an external action performed over the whole head (T_2 = T_k). We associate the Kalman rotations with the actions of FAPs 48, 49 and 50, because these FAPs express natural head rotation actions; all FAPs are expressed over the local head coordinate system.

More generic information about FAPs and FDPs can be found in Appendix VI-H, which includes some descriptive tables plus the complete list of FAPs. The sketch below illustrates how the predicted pose parameters can be repackaged into these FAPs.
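The following sketch shows one way the six predicted Kalman pose parameters could be repackaged as the FAPs listed in Table VI-2 (48, 49, 50 for the rotations and our additional 101, 102, 103 for the translations). The FAP container, the sign conventions and the MNS scale factor are simplified placeholders, not the MPEG-4 encoder itself; converting translations into proper FAPU is model dependent.

```python
# Repackaging of the six predicted pose parameters (t_X, t_Y, t_Z, alpha,
# beta, gamma) into the FAP ids used in Table VI-2.  The scale factors that
# turn lengths/radians into FAPU (MNS units for translations, AU = 1e-5 rad
# for rotations) depend on the model and are only sketched here.
AU = 1e-5  # MPEG-4 angle unit, in radians

def pose_to_faps(tx, ty, tz, alpha, beta, gamma, mns):
    """Return a {fap_id: amplitude} dictionary for one frame."""
    return {
        48: round(alpha / AU),   # head_pitch
        49: round(beta / AU),    # head_yaw
        50: round(gamma / AU),   # head_roll
        101: round(tx / mns),    # horizontal displacement (our extension)
        102: round(ty / mns),    # vertical displacement   (our extension)
        103: round(tz / mns),    # depth displacement      (our extension)
    }

# Example with a small head rotation and a lateral shift, assuming
# MNS = 0.05 (hypothetical value for one particular model).
print(pose_to_faps(0.02, 0.0, 0.0, 0.05, -0.02, 0.0, mns=0.05))
```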

Table VI-2. FAP definitions, group assignments and step sizes (the FAPs used by our system; quantization step sizes and min/max quantized values follow the standard).

  #   | FAP name              | FAP description                                                              | Unit  | Uni/Bi | Positive motion
  3   | open_jaw              | Vertical jaw displacement (does not affect mouth opening)                   | MNS   | U      | down
  4   | lower_t_midlip        | Vertical top middle inner-lip displacement                                  | MNS   | B      | down
  5   | raise_b_midlip        | Vertical bottom middle inner-lip displacement                               | MNS   | B      | up
  6   | stretch_l_cornerlip   | Horizontal displacement of left inner-lip corner                            | MW    | B      | left
  7   | stretch_r_cornerlip   | Horizontal displacement of right inner-lip corner                           | MW    | B      | right
  8   | lower_t_lip_lm        | Vertical displacement of midpoint between left corner and middle of top inner lip    | MNS | B | down
  9   | lower_t_lip_rm        | Vertical displacement of midpoint between right corner and middle of top inner lip   | MNS | B | down
  10  | raise_b_lip_lm        | Vertical displacement of midpoint between left corner and middle of bottom inner lip | MNS | B | up
  11  | raise_b_lip_rm        | Vertical displacement of midpoint between right corner and middle of bottom inner lip| MNS | B | up
  12  | raise_l_cornerlip     | Vertical displacement of left inner-lip corner                              | MNS   | B      | up
  13  | raise_r_cornerlip     | Vertical displacement of right inner-lip corner                             | MNS   | B      | up
  14  | thrust_jaw            | Depth displacement of jaw                                                   | MNS   | U      | forward
  15  | shift_jaw             | Side-to-side displacement of jaw                                            | MW    | B      | right
  16  | push_b_lip            | Depth displacement of bottom middle lip                                     | MNS   | B      | forward
  17  | push_t_lip            | Depth displacement of top middle lip                                        | MNS   | B      | forward
  18  | depress_chin          | Upward and compressing movement of the chin (as in sadness)                 | MNS   | B      | up
  19  | close_t_l_eyelid      | Vertical displacement of top left eyelid                                    | IRISD | B      | down
  20  | close_t_r_eyelid      | Vertical displacement of top right eyelid                                   | IRISD | B      | down
  21  | close_b_l_eyelid      | Vertical displacement of bottom left eyelid                                 | IRISD | B      | up
  22  | close_b_r_eyelid      | Vertical displacement of bottom right eyelid                                | IRISD | B      | up
  23  | yaw_l_eyeball         | Horizontal orientation of left eyeball                                      | AU    | B      | left
  24  | yaw_r_eyeball         | Horizontal orientation of right eyeball                                     | AU    | B      | left
  25  | pitch_l_eyeball       | Vertical orientation of left eyeball                                        | AU    | B      | down
  26  | pitch_r_eyeball       | Vertical orientation of right eyeball                                       | AU    | B      | down
  31  | raise_l_i_eyebrow     | Vertical displacement of left inner eyebrow                                 | ENS   | B      | up
  32  | raise_r_i_eyebrow     | Vertical displacement of right inner eyebrow                                | ENS   | B      | up
  33  | raise_l_m_eyebrow     | Vertical displacement of left middle eyebrow                                | ENS   | B      | up
  34  | raise_r_m_eyebrow     | Vertical displacement of right middle eyebrow                               | ENS   | B      | up
  35  | raise_l_o_eyebrow     | Vertical displacement of left outer eyebrow                                 | ENS   | B      | up
  36  | raise_r_o_eyebrow     | Vertical displacement of right outer eyebrow                                | ENS   | B      | up
  37  | squeeze_l_eyebrow     | Horizontal displacement of left eyebrow                                     | ES    | B      | right
  38  | squeeze_r_eyebrow     | Horizontal displacement of right eyebrow                                    | ES    | B      | left
  48  | head_pitch            | Head pitch angle from top of spine                                          | AU    | B      | down
  49  | head_yaw              | Head yaw angle from top of spine                                            | AU    | B      | left
  50  | head_roll             | Head roll angle from top of spine                                           | AU    | B      | right
  51  | lower_t_midlip_o      | Vertical top middle outer-lip displacement                                  | MNS   | B      | down
  52  | raise_b_midlip_o      | Vertical bottom middle outer-lip displacement                               | MNS   | B      | up
  53  | stretch_l_cornerlip_o | Horizontal displacement of left outer-lip corner                            | MW    | B      | left
  54  | stretch_r_cornerlip_o | Horizontal displacement of right outer-lip corner                           | MW    | B      | right
  55  | lower_t_lip_lm_o      | Vertical displacement of midpoint between left corner and middle of top outer lip     | MNS | B | down
  56  | lower_t_lip_rm_o      | Vertical displacement of midpoint between right corner and middle of top outer lip    | MNS | B | down
  57  | raise_b_lip_lm_o      | Vertical displacement of midpoint between left corner and middle of bottom outer lip  | MNS | B | up
  58  | raise_b_lip_rm_o      | Vertical displacement of midpoint between right corner and middle of bottom outer lip | MNS | B | up
  59  | raise_l_cornerlip_o   | Vertical displacement of left outer-lip corner                              | MNS   | B      | up
  60  | raise_r_cornerlip_o   | Vertical displacement of right outer-lip corner                             | MNS   | B      | up
  101 | tx                    | Horizontal displacement along the x-axis                                    | MNS   | B      | left
  102 | ty                    | Vertical displacement along the y-axis                                      | MNS   | B      | up
  103 | tz                    | Depth displacement along the z-axis                                         | MNS   | B      | forward

FAP names may contain letters with the following meaning: l = left, r = right, t = top, b = bottom, i = inner, o = outer, m = middle. The sum of the two corresponding top and bottom eyelid FAPs must equal 1024 when the eyelids are closed. Inner lips are closed when the sum of the two corresponding top and bottom lip FAPs equals zero; for example, (lower_t_midlip + raise_b_midlip) = 0 when the lips are closed. All directions are defined with respect to the face and not with respect to the image of the face. The FAPs used by our analysis belong to the head pose, eyes, eyebrows and mouth groups.

OpenGL implementation

We use OpenGL to render the speaker's 3D head model and to simulate the video view acquired from the camera. OpenGL uses its own perspective-projection model, as illustrated in Figure VI-11. It is capable of rendering all elements that fall inside the volume defined by the znear plane, the zfar plane and the fovy angle. The objects are rendered on the viewport of the application, or rendering window, whose characteristics are determined by b=bottom, t=top, l=left and r=right. The viewport contains the image representation of the objects as if they had been focused on the znear plane. The fovy angle, the znear plane and the viewport are highly related: once two of them are set, the third one is uniquely determined.

We control the final projection settings using the OpenGL call glFrustum(l, r, b, t, znear, zfar), which directly establishes the volume that will be rendered on the viewport, the fovy angle being automatically deduced.

Figure VI-11. Perspective-projection model and reference system in the synthetic world generated by OpenGL. The objects are focused on the znear plane and rendered on the viewport. The viewport is determined by t=top, r=right, b=bottom and l=left, and takes the size that will be presented on the screen window, which must match the size characteristics of the video data.

Since the synthetic rendering must adjust to reality, the synthetic head representation must fit into the video input dimensions regardless of the dimensions of the 3D model itself; otherwise the initialization step, the pose tracking and the expression analysis would not be feasible. Synthetic worlds are expressed in generic units, here called w_u, that do not represent any specific magnitude by themselves; therefore, before doing any analysis, we need to relate those generic units to the real input we get from the camera. To match the synthetic world with real life, we consider the proposed standard acquisition conditions: a small camera situated in front of the speaker, on top of the monitor, around 75 cm away from him. To recreate the real world during the synthesis independently of the model characteristics, that is, its size, we rely on the following considerations.

First, we set an anthropometric generalization: we establish that the distance ES0 is equivalent to 10 cm for all head models, ES0 = ES · 1024 = 10 cm. Then, our real acquisition conditions can be summarized by setting fovy = 40°. This choice results in the following parameters, taking into account that the width of the input image is around 384 pels and that we consider a screen resolution of 32 pel/cm:

  r(w_u) = (w(pix)/2) · (1/32 (pel/cm)) · (ES0(w_u)/10 (cm))
  t(w_u) = (h(pix)/2) · (1/32 (pel/cm)) · (ES0(w_u)/10 (cm))
  l = −r ;  b = −t
  znear(w_u) = r · f, where f = cot(fovy/2) is the value obtained from the fovy
  zfar(w_u) = 20 · ES0, which sets a value large enough to ensure the complete visualization of the model.
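A small sketch of how these frustum parameters could be computed from the quantities just listed follows; the function mirrors the formulae above, and the example call uses the 384x288 input size together with a hypothetical ES0.

```python
import math

def frustum_parameters(w_pix, h_pix, es0_wu, screen_res=32.0, fovy_deg=40.0):
    """Derive the glFrustum(l, r, b, t, znear, zfar) arguments.

    w_pix, h_pix : size of the video input in pixels
    es0_wu       : eye separation ES0 of the model, in world units
    screen_res   : assumed screen resolution, in pixels per cm
    fovy_deg     : field of view matching the acquisition conditions
    """
    # Half-width and half-height of the near plane: pixels -> cm -> world
    # units, using the anthropometric convention ES0 = 10 cm.
    r = (w_pix / 2.0) * (1.0 / screen_res) * (es0_wu / 10.0)
    t = (h_pix / 2.0) * (1.0 / screen_res) * (es0_wu / 10.0)
    l, b = -r, -t
    # Near plane placed so that the frustum opens with the chosen fovy
    # (r * cot(fovy/2), as in the text).
    znear = r / math.tan(math.radians(fovy_deg) / 2.0)
    # Far plane simply large enough to contain the whole head model.
    zfar = 20.0 * es0_wu
    return l, r, b, t, znear, zfar

# Example: 384x288 input, model whose ES0 is 620 world units (hypothetical).
print(frustum_parameters(384, 288, 620.0))
```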

NOTE: The rendered data is accessed in memory like each video frame (Figure VI-5), but its format is RGB32 instead of BGR32.

VI.2.3 Description of the visual test-bed used and how a real implementation would differ

As illustrated in Figure VI-1 and already mentioned at the beginning of this section, the experimental setting for our tests consists of two parts: the video window and the rendering window. Let us describe their relevance.

Video Window: On this window, we plot the video input from the acquisition system. On top of it we draw the following:
  o Red/blue squares: these are the blocks used to perform the 2D block matching that tracks the face features used by the Kalman-based pose-tracking algorithm.
  o Wider blue/yellow rectangles: these are the feature ROIs for the eyes, eyebrows and mouth. They are obtained by projecting the 3D ROIs defined over the speaker's head model onto the video image plane.
  o White lines/points inside the ROIs: these are the result of drawing the output of the image-processing analysis algorithms used on each feature.
  o Green lines/points inside the ROIs: with them, we draw the final appearance of the projection of the feature 3D neutral-motion models after the motion parameters have been applied to them.
Theoretically, white and green lines should be drawn alike on the ROIs, thus indicating that the complete process (undoing pose and projection, plus motion analysis) has worked well.

Rendering Window: This window shows the synthesis of the 3D head used for the analysis. This model is only animated with the rigid-motion parameters predicted by the Kalman pose tracker. Exceptionally, an avatar is used during some of the tests for the eye-motion analysis; in that case, the avatar's eyes are also animated.

This experimental framework differs from what would be expected in a real-life application. Indeed, it has only been built for experimental purposes and no optimization has been applied. A real application would perform the analysis and synthesis required for the coding in background mode. This implies that there would be no need for a visual implementation related to the analysis or the synthetic results involved during the encoding of the motion parameters, unless we explicitly wanted to monitor the results of the analysis procedure. The only required visual implementation would be related to the receiver part. It would fundamentally include the rendering of the speaker's clone and of any virtual elements, such as other speakers and the common environmental space shared during the communication.

VI.3 Head-Pose Tracking Based on an Extended Kalman Filter

This section briefly reviews the algorithm utilized for the rigid-motion tracking in our system. The head tracker is the result of previous work at the Image Group. Some of the information presented comes from S. Valente's Ph.D. thesis (Valente, 1999).

VI.3.1 Theoretical review

To summarize the theoretical basis of Kalman filtering, let us consider that the complete procedure is about estimating the state Ψ_t of a certain system at instant t. Ψ_t is not directly accessible, but it is the cause of several observations s_t = h(Ψ_t). We also have a rough idea of the general system evolution along time, following Ψ_{t+1} = a(Ψ_t). The uncertainty related to the observation equation and to the evolution equation is added through the incorporation of two Gaussian white noises, v_t and w_t, respectively, whose covariances are R and Q:

(VI-1)
  s_t = h(Ψ_t) + v_t
  Ψ_{t+1} = a(Ψ_t) + w_t

The observation function and the dynamic evolution function are linearized around the a-priori estimation Ψ_{t/t−1} and the a-posteriori estimation Ψ_{t/t}, respectively, with

(VI-2)
  h(Ψ_t) ≈ h(Ψ_{t/t−1}) + H_t (Ψ_t − Ψ_{t/t−1})
  a(Ψ_t) ≈ a(Ψ_{t/t}) + A_t (Ψ_t − Ψ_{t/t})

where H_t = ∂h/∂Ψ evaluated at Ψ_t = Ψ_{t/t−1} and A_t = ∂a/∂Ψ evaluated at Ψ_t = Ψ_{t/t} are the Jacobians of the functions h() and a(). After some computations (Kay, 1993), the a-posteriori estimation of Ψ_t and the covariance matrix of the associated error are given by the following filter equations:

(VI-3)
  K_t = P_{t/t−1} H_t^T (R + H_t P_{t/t−1} H_t^T)^{−1}
  Ψ_{t/t} = Ψ_{t/t−1} + K_t (s_t − h(Ψ_{t/t−1}))
  P_{t/t} = (I − K_t H_t) P_{t/t−1}

and the a-priori estimation Ψ_{t+1/t} is given by the prediction equations, together with the error covariance:

(VI-4)
  Ψ_{t+1/t} = a(Ψ_{t/t})
  P_{t+1/t} = A_t P_{t/t} A_t^T + Q

We must recall that there is no guarantee for these estimations to be optimal after the linearization. Equations (VI-2) may not make sense in a practical system, or may be numerically unstable, depending on the applicability of the linearization. In our case, the system remains stable as long as the rigid movements of the head are smooth between consecutive frames.
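To make the flow of equations (VI-1)-(VI-4) concrete, here is a generic extended-Kalman-filter cycle written for arbitrary observation and evolution functions h() and a(); the Jacobians are approximated by finite differences and the noise covariances are placeholders, so this is only an illustration of the recursion, not the tracker's actual implementation.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Finite-difference Jacobian of func at x."""
    f0 = func(x)
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (func(x + dx) - f0) / eps
    return J

def ekf_step(psi_pred, P_pred, s, h, a, R, Q):
    """One correction + prediction cycle of equations (VI-3) and (VI-4)."""
    # --- correction (VI-3) ---
    H = numerical_jacobian(h, psi_pred)
    K = P_pred @ H.T @ np.linalg.inv(R + H @ P_pred @ H.T)
    psi_est = psi_pred + K @ (s - h(psi_pred))
    P_est = (np.eye(len(psi_pred)) - K @ H) @ P_pred
    # --- prediction (VI-4) ---
    A = numerical_jacobian(a, psi_est)
    psi_next = a(psi_est)
    P_next = A @ P_est @ A.T + Q
    return psi_est, P_est, psi_next, P_next

# Tiny example: 2-state constant-velocity system observed through its first
# component (toy functions, not the head tracker's h() and a()).
a_fun = lambda x: np.array([x[0] + x[1], x[1]])
h_fun = lambda x: np.array([x[0]])
out = ekf_step(np.array([0.0, 1.0]), np.eye(2), np.array([1.2]),
               h_fun, a_fun, R=np.eye(1) * 0.1, Q=np.eye(2) * 0.01)
print(out[0])
```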

VI.3.2 Use of the extended Kalman filter in our context

Within the context of our application, the Kalman filter is the central core of the head tracker, and it basically has three tasks:
1. it estimates the 3D location and orientation of the speaker from 2D regions tracked on the video input;
2. it predicts the 2D centers of interest of the points chosen on the face to be tracked, to help the block-matching procedure on the video; and
3. it ensures that the model will be at the same scale, location and orientation on the synthesized image as on the input video image, even though the image acquisition has been made with an uncalibrated camera.

Considering equations (VI-3), we see that the filter produces Ψ_{t/t} by rectifying the predicted state Ψ_{t/t−1} with the correcting term K_t (s_t − h(Ψ_{t/t−1})), which takes into account the difference between the observations s_t obtained at instant t and their prediction h(Ψ_{t/t−1}). Therefore, the action of the Kalman filter can be considered an iterative process that adjusts the system state Ψ_{t/t} to make it correspond, at the same time, to the observations s_t and to the dynamic evolution model of equation (VI-4).

This interpretation, which views the system as an iterative adjustment, helps to understand how the filter is able to align the synthetic model of the speaker with his head on the real image, by estimating the 3D location and orientation of the real head, and by ensuring that both objects are the same size, regardless of an uncalibrated camera whose focal length is unknown. This is achieved by using as observation model h() the geometrical transformation performed by the synthesis (in our case the OpenGL engine) to project the head model onto the image plane, and not the (undetermined, because there is no calibration) equations of the camera perspective projection. The filter will automatically align the model and the head on the synthetic reproduction by taking as state variables the location and the orientation of the synthetic model inside the

Euclidean space used by OpenGL, and as observations s_t = h(Ψ_t), the vector of 2D coordinates of the facial features being tracked on the model's synthetic image.

Dynamic evolution model

The 6 parameters that are needed to control the synthesis of the clone are

  ψ = (t_X, t_Y, t_Z, α, β, γ)^T,

which represent the 3 degrees of freedom of the model translation and the 3 degrees of freedom of its rotation, related to the x-, y- and z-axes of the synthetic world. These are concatenated with their first and second derivatives inside the filter state vector,

  Ψ = (ψ^T, ψ̇^T, ψ̈^T)^T,

which follows the system dynamics, based on the hypothesis that the system works under constant acceleration:

(VI-5)  ψ_{t+dt} = ψ_t + ψ̇_t dt + (1/2) ψ̈_t dt².

Observation model

Although the acquisition camera is not calibrated, we can consider that it makes a perspective projection of the real world and not an orthographic projection. The synthesis module must then mimic the perspective projection, of focal length F, assumed by the filter. This projection is the observation model utilized for the adaptation process of the feature motion analysis algorithms (IV-2), recalled in Figure VI-12. This requirement is needed because we want to use the predicted pose parameters to extend the usage of the expression analysis algorithms, developed to study a frontal view of the face, to any other pose observed on the video image.

OpenGL implementation of the model used

The observation model that is used during the analysis must be implemented in OpenGL to enable the synthetic rendering of the 3D head model to be the same size and to have the same pose as the speaker's head on each video frame. As seen in Section VI.2.2, OpenGL handles the projection and the manipulation of the 3D head model in a specific way. In Figure VI-13, we compare the reference system and the components of the observation model proposed for head-pose tracking and for the algorithmic extension of the expression analysis with the reference system and the components of its practical OpenGL implementation.

Figure VI-12. Schema of the reference system and camera model (of focal length F) used for the adaptation process. It establishes the relationship between a point (x_n, y_n, z_n)^T in the Euclidean space and its projected counterpart on the camera image plane:

  (x_p, y_p)^T = ( F·x_n / (F − z_n), F·y_n / (F − z_n) )^T.

The axis orientation is such that the camera only sees the negative part of the Z-axis.

Figure VI-13. Side view of the proposed observation model (head-pose tracking reference system, with focal length F and image half-dimensions w and h) and of its practical OpenGL implementation (with znear, zfar and the viewport bounds l and b). In both systems the reference frames and components slightly differ, but the pose and motion descriptions stay the same.

After taking the new reference into consideration, the final expressions of the 3D-2D relationship are x''_{p2D} = x_p^{OGL}/l and y''_{p2D} = y_p^{OGL}/b, obtained from the adapted version of system (V-2):

(VI-6)
  x''_{p2D} = (1/l) · znear · (N_x(x_n^{OGL}, y_n^{OGL}, z_n^{OGL}) + t_X^{OGL}) / (znear − backward·(N_z(x_n^{OGL}, y_n^{OGL}, z_n^{OGL}) + t_Z^{OGL}))
  y''_{p2D} = (1/b) · znear · (N_y(x_n^{OGL}, y_n^{OGL}, z_n^{OGL}) + t_Y^{OGL}) / (znear − backward·(N_z(x_n^{OGL}, y_n^{OGL}, z_n^{OGL}) + t_Z^{OGL}))

where N_x, N_y and N_z denote the components of the rotated model point, and where znear = F, l = w/2, b = h/2, t_X^{OGL} = t_X, t_Y^{OGL} = t_Y and t_Z^{OGL} = t_Z · backward.

VI.3.3 Influence of the tracking dynamics on the expression analysis

Valente (1999) made a complete and exhaustive analysis of the behavior of the extended Kalman filter of the head-pose tracker utilized. From this analysis, we would like to point out that even when the tracker works fine, the obtained results are slightly noisy, and the strongest artifacts appear in the presence of rapid transitions, thus indicating that the prediction model of the Kalman filter, which assumes constant acceleration, is not appropriate at those points (see Figure VI-14). The interference of the noise of the predicted pose parameters in the feature motion interpretation is comparable to the image-processing accuracy. Strong tracking artifacts, which are not common, mask any other effect and frequently lead to erroneous results.

Figure VI-14. Real and recovered Y position in a sample sequence.

To better understand the dynamic nature of the inaccuracy introduced by the filter during the tracking, we tried to reproduce the moment at which pose tracking starts. All pose parameters were set to zero and the head was facing the camera in its neutral state. The speaker remained idle for the first seconds of the sequence. The pose parameters predicted during the first 3 seconds are plotted in Figure VI-15. Although no rigid motion should have been detected, the system noise introduces fluctuations that alter the expected parameter values, which theoretically should stay at zero. The observed fluctuations have two different origins: on the one hand, we see the sinusoidal deviation caused by the filter itself (most noticeable during the first second); on the other hand, we observe that the noisy data acquired during the tracking is translated into more noise added to the final values (noticeable during the third second). Fortunately, after studying the magnitude of the error committed, we realize that the fluctuations introduced can be neglected during the feature image-processing if we compare their influence to that of the artifacts.

We refer the reader to Section 7 of Chapter V, where the theoretical evaluation of the influence of pose-tracking inaccuracy on the algorithmic extension of the feature processing has been developed. There, the reader will find the basis upon which we have been able to judge the Kalman interference in the complete analysis. This evaluation has allowed us to support our first assumption regarding the use of a pose tracker based on an extended Kalman filter for our coupled pose-expression analysis: if the acquisition is smooth, and the filter dynamics are adapted accordingly, the algorithmic extension of the image processing involved during feature analysis does not suffer much from the dynamic nature of the Kalman filter.

Figure VI-15. These two graphs ("Parameter evolution at initializing the neutral pose") show the fluctuations that the Kalman filter introduces in the pose values utilized for the head tracking and for the expression-analysis algorithmic extension: the upper plot shows the rotation parameters (alfa, beta, gamma, in radians) and the lower plot the translation parameters (tx, ty, tz, in model units) against the frame number. Head model dimensions: WIDTH = ; HEIGHT = 2871 m_u; DEPTH = m_u.

VI.4 Evaluating the Motion Template Extension

The practical implementation of the algorithms introduces a new source of inaccuracy: the double precision used during computer mathematical operations. Inside our system, errors coming from the mathematical computer manipulation are equivalent to errors coming from the inaccuracy of the data obtained during the image-processing manipulation of the frames. Therefore, they have the form of the expressions treated and studied in Section 7.2 of Chapter V:

  x̃_p = x_p + ε_x  and  ỹ_p = y_p + ε_y.

We recall that the multiplicative error and the additive error derived from this imprecision have a different impact on the final recovered values depending on the state of the system: pose parameters, focal length, etc. Under our analysis conditions, the computational inaccuracy translates visually into a ±1 pel difference between the original data analyzed and the data obtained after undoing and then redoing the projection of these data.

It is difficult to set up a quantitative method to evaluate the accuracy with which the algorithms perform after having been coupled with the pose. The use of real input is the best way to evaluate the real performance of the techniques, but it does not allow us to control beforehand the correct outcome of the analyzed expression. Moreover, it becomes hard to detect and understand the origin of an inaccuracy: does it come from pose-coupling inadequacy, from image-processing failure or from imprecision during the Kalman pose prediction? Nevertheless, we have tried to evaluate as concisely as possible how the behavior of the motion-template analysis is influenced by the adaptation. First, in Subsection VI.4.1, we discuss how the evolution of the area of analysis alters the processing. Then, in Subsection VI.4.2, we study as quantitatively as possible the limitations of the adaptation in terms of freedom of movement. Qualitative tests are easier to perform: the visual feedback obtained from the analyzed data can be plotted on the video input, allowing us to verify the performance of the algorithms. The results from the visual evaluation techniques are discussed in Subsection VI.4.3.
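The round trip mentioned above (undo the projection onto the 3D surface, then project again) can be written in a few lines for the simple pinhole model of Figure VI-12 with a planar surface approximation. The sketch below only shows the mechanics of that round trip with illustrative values for F and the plane depth; in the full pipeline the ±1 pel figure additionally includes the pose transformation and the quantization to pixel positions.

```python
# Round-trip check: image point -> 3D point on the planar surface
# approximation z = z0 -> image point again, using the pinhole model
# x_p = F * x_n / (F - z_n) recalled in Figure VI-12.
F = 900.0        # focal length, illustrative value
z0 = -600.0      # depth of the planar feature approximation, illustrative

def unproject(x_p, y_p, z_n=z0):
    # Invert the projection for a point known to lie on the plane z = z_n.
    s = (F - z_n) / F
    return x_p * s, y_p * s, z_n

def project(x_n, y_n, z_n):
    s = F / (F - z_n)
    return x_n * s, y_n * s

x_p, y_p = 37.2, -12.8
x_n, y_n, z_n = unproject(x_p, y_p)
x_back, y_back = project(x_n, y_n, z_n)
print("round-trip error (pel):", abs(x_back - x_p), abs(y_back - y_p))
```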

VI.4.1 Interference in the image-processing: deformation of the ROIs and introduction of artifacts from other features

The first interference in the deployment of the image-processing analysis comes from the adaptation of the theoretical ROIs to the physically rectangular nature of frames in video memory. For practical purposes, to make the deformed areas more suitable for image analysis, we enclose them in video analysis rectangles (x_top, y_top)-(x_bottom, y_bottom):

  (x_top, y_top) = (min(x)|_ROI, max(y)|_ROI);
  (x_bottom, y_bottom) = (max(x)|_ROI, min(y)|_ROI).

This ensures that the ROI and its feature are completely inside the analyzed area. Unfortunately, it also implies the inclusion of some artifacts coming from other facial features next to the one being analyzed. In some cases, as with hair and eyes when analyzing eyebrows, this can be taken into account during the image processing; otherwise these artifacts will be possible sources of error that the system will have to control. Figure VI-16 illustrates an example where the eye feature is largely included inside the eyebrow ROI. In this case, the algorithm resolves correctly for the right eye, but it is not able to recover the right shape for the left eye because that ROI is not framed well enough.

Figure VI-16. The eyebrow-motion analysis algorithm has been able to avoid the influence of the eye feature that is also covered by the eyebrow ROI when analyzing the right eyebrow (correct analysis). For the analysis of the left eyebrow, the inaccuracy of the ROI determination prevents the algorithm from properly detecting the eyebrow, and it detects the eye instead (incorrect analysis).
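A minimal sketch of the enclosing-rectangle computation just described follows; the projected corner coordinates used in the example are arbitrary values.

```python
def enclosing_rectangle(roi_points):
    """Axis-aligned rectangle that fully contains a projected ROI.

    roi_points is a list of (x, y) image coordinates of the projected 3D-ROI
    corners; the y axis is assumed to grow upwards, so the "top" corner takes
    the maximum y, as in the formulas above.
    """
    xs = [p[0] for p in roi_points]
    ys = [p[1] for p in roi_points]
    top_left = (min(xs), max(ys))
    bottom_right = (max(xs), min(ys))
    return top_left, bottom_right

# Example: a rotated eyebrow ROI projected onto the image plane.
print(enclosing_rectangle([(120, 85), (168, 97), (171, 76), (118, 66)]))
```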

VI.4.2 Influence of the surface linear approximation

Studying the effect of the surface linear approximation on the performance of the adapted algorithms is not an easy task, because the success of the analysis depends on several factors at the same time: correct pose prediction, right image-processing results, accurate surface approximation, etc. We were able to study the algorithmic performance of the template designs and of their related image-processing techniques independently of the pose when analyzing faces from a frontal perspective, but it is impossible to detach the influence of the algorithmic extension process from the algorithms' own performance once they work coupled with the pose.

Nevertheless, we have tried to set up a quantitative evaluation to estimate how important the influence of the surface design is in obtaining proper results from the coupled analysis. To do so, we have set up some experiments on the coupled eye-state tracking algorithm, using the realistic 3D head model of the speaker as the information source to determine the 3D-ROI. We have taken into account the data obtained from the analysis of the left and the right eye of an individual whose pose had been pre-established and kept fixed during the recording of the analyzed sequences. These data are represented in Figure VI-17 and determine the physical relationship between the extracted pupil location and the dimensions of the 3D-ROI on the linear surface approximation. Figure VI-18 depicts the exact 3D-ROI coordinate values used during the tests.

Figure VI-17. Data extracted during the study of the eye-state tracking algorithm: the pupil position (X, Y) relative to the ROI.

Table VI-3. Results for a face in neutral position (all FAPs set to 0): statistics (mean, max, min, stdev) of the X and Y pupil coordinates for the left and the right eye.

Figure VI-18. Coordinates extracted from the 3D head model of the speaker and used to build the ROIs for the eye-state tracking-algorithm adaptation. Right eye: (-491.7, 262.0), (-230.9, 278.0), (-513.4, 134.0), (-218.2, 134.0); left eye: (125.4, 302.0), (389.8, 294.0), (124.1, 180.0), (405.7, 148.0); together with the supporting-plane depth z_n and the constant M for each eye.

We present as reference data the results obtained when the technique is applied on the face in its neutral state (α = 0, β = 0, γ = 0, t_X = 0, t_Y = 0 and t_Z = 0), in Table VI-3. We have compared them against the results obtained from varying the translation and the rotation parameters of the speaker, one parameter at a time: the t_X evolution in Table VI-4, the t_Y evolution in Table VI-5, the t_Z evolution in Table VI-6, the α evolution in Table VI-7, and the β evolution in Table VI-8. A rotation around the z-axis (γ) of more than 10 degrees is a physical movement difficult to make with a human head. We have not provided data related to this rigid movement because, in this case, the surface approximation is parallel to the image plane, and the rotation only implies a simple surface rotation that alters neither the surface projection nor its size on the image.

After studying the different results obtained, we point out that there does not seem to be any correlation between the performance of the analysis technique and the pose of the surface at the moment of the coupling. This implies that the linear surface approximation works fine as long as the feature is completely visible on the image (as it is for the FAP values tested). Interestingly, we observe that the performance is better for the right eye than for the left eye. We refer the reader to Figure VI-19, where we have plotted the maximum error found in the means of the X and Y pupil components. This result suggests that the surface approximation selected for the right eye was better than the one for the left eye: under equal analysis conditions the left-eye analysis performed poorly compared to the right-eye analysis, and only the surface approximation differed.

The correct design of the surface approximation seems to be important, although not critical, for the success of the algorithms, because the behavior of the methods and of their 3D extension can be controlled beforehand. The experiments were carried out on the eye-state tracking algorithm; they could have been made on the eyebrows and the mouth as well. The most outstanding difference among the feature algorithmic extensions is the degree of complexity and the amount of visual information required for each. This is one of the

reasons why quantitative testing methods may help us to deduce weaknesses and improve the algorithms, but qualitative methods, where we can observe the dynamic evolution of the implemented analysis, are usually more helpful to establish the limitations of the proposed solution. The current study lets us consider the benefit of utilizing other surfaces, more conveniently adapted to the feature under analysis, in order to study the possibility of not being constrained by the projection of the feature movement onto a plane.

From our analysis results, we conclude that the speaker has almost complete freedom of movement regarding the translations along the x-, y- and z-axes; regarding rotations, we have found a limit of around π/4 rad. These results will also be corroborated by the visual inspection made during the qualitative evaluation.

Figure VI-19. Maximum error in the average X and Y components found during the study, expressed as a percentage of the 3D-ROI width (X) and height (Y) for the left (L) and right (R) eyes. The FAP magnitudes at which these maxima occurred are also indicated (between π/10 and π/4 for the rotations).

Table VI-4. Results for the evolution of the t_X parameter (FAP 101): statistics (mean, max, min, stdev) of the X and Y pupil coordinates for the left and right eyes at each tested magnitude.

Table VI-5. Results for the evolution of the t_Y parameter (FAP 102): same statistics as in Table VI-4.

Table VI-6. Results for the evolution of the t_Z parameter (FAP 103): same statistics as in Table VI-4.

Table VI-7. Results for the evolution of the α parameter (FAP 48), for magnitudes between -π/10 and π/4: same statistics as in Table VI-4.

Table VI-8. Results for the evolution of the β parameter (FAP 49), for magnitudes π/10, π/8 and π/6: same statistics as in Table VI-4.

VI.4.3 Qualitative evaluation of the motion-template algorithmic extension to 3D

To study the performance of the proposed algorithmic extension, we set up several experiments where visual feedback from the analysis was provided. We implemented the adaptation for the eyes and eyebrows as explained in Section VI.2. The mouth, whose motion is very complex to analyze, is a separate topic of research; it would require more detailed testing and possibly complementing the analysis with some speech processing.

Evaluating the eye-state tracking algorithm using an avatar

We applied the eye animation parameters (FAPs 19, 20, 21, 22, 23, 24, 25 & 26) to the head model Olivier (see Figure VI-20) to study the degree of naturalness that we could obtain from the eye-state tracking algorithm. First, we looked at the efficiency of the algorithm when it was applied alone, without taking the pose into account (VIDEO: frontal.2.avi). Results were encouraging and demonstrated that immediate understanding and replication of eye motion clearly deliver a fine sense of life to the avatar. Then, we performed the same kind of test after coupling the analysis algorithm with the rigid-motion information provided by the pose predictor based on an extended Kalman filter (VIDEO: eyecoupling.avi). The naturalness achieved by the coupling is outstanding. Eye and pose motion applied together give the avatar a natural feeling difficult to obtain with automatic standard facial animation techniques.

Regarding technical issues, the practical implementation of this test-bed allowed us to examine how well the method performs when utilizing 3D data extracted from a head model other than the speaker's clone.

Figure VI-20. Facial animation parameters for the eyes were extracted using the eye-state tracking algorithm and immediately applied and rendered on the Olivier avatar.

We conclude that not using

We conclude that not using the speaker's realistic head model was not a hard constraint for the eye analysis algorithm, because the latter is simple enough to be easily adapted to any available model. Nevertheless, using a different 3D head implies restrictions on the speaker's movements. The pose tracking algorithm could not recover and track back to the neutral head position after the speaker's head had rotated more than ±20 deg. The use of a realistic 3D representation of the speaker permits greater freedom of movement, as could be appreciated during the tests described in the next section.

Evaluating the performance of the eyebrow motion analysis algorithm

To visually check the performance of the eyebrow analysis algorithm after it had been coupled with pose information, we designed a graphical feedback on the video window of our test platform. The arch obtained from the analysis (the "current frame arch") was drawn on each video frame along with the arch (the "modeled arch") resulting from applying the motion parameters extracted during the current analysis to the arch extracted while the eyebrow was not moving (the "neutral arch"). These two arches were the basis of the quantitative analysis presented in Section 4.3 of Chapter IV, where we exposed the objective tests carried out to evaluate the algorithm on faces analyzed from a frontal perspective. In Figure VI-21, we can see a series of shots extracted from one of the final studied sequences. The current frame arch is plotted in white, the modeled arch in green. Ideally, if the algorithm works perfectly, both arches should be drawn very close to each other.

From the tests we have made, whether in the laboratory environment or away from it, we have been able to conclude that the coupled analysis can extract meaningful motion data as long as the head does not rotate more than ±π/4 rad. The analysis is tolerant to translations. During these experiments we used a realistic 3D head model of the speaker; this is the reason why it was possible to recover the neutral position from a wider range of rotations than when using an avatar. The algorithm was able to extract coherent motion parameters even when the eyebrow was not completely visible. Although the extracted data are not accurate in that case, they provide meaningful information: they represent the best approximation to the observed movement that we can obtain. The algorithm has been designed not to drag errors through the analysis of the sequence; if the process does not succeed in correctly analyzing one specific frame, the rest of the sequence does not suffer from this incident. We could observe this behavior when the processing recovered gracefully from an incorrect result.


Figure VI-21. Sequence of shots extracted from eye&eyebrowcoupled.avi. The right eyebrow has been extracted. We observe the evolution of the head rotation at the same time as the eyebrow moves upwards. The white line represents the arch extracted from the image processing analysis; the green line is the result of projecting the neutral motion model after the motion parameters have been applied to it. Ideally, both arches should have the same shape and location. The pupil tracking for the right eye is also plotted. The blue rectangles are the eye and eyebrow ROIs and the red squares are the blocks utilized during the pose tracking.

VI.5 Analyzing Real-time Capabilities

The algorithms developed and tested in this thesis aim at providing solutions to deploy virtual teleconferencing systems. When studying possible new telecom applications, it becomes important to evaluate the potential of the system to run in real time. The algorithmic analytical structure was kept simple and efficient to make real-time capabilities feasible. It has also been developed flexibly, so that the algorithmic complexity can be increased as the available computing power grows. Two more reasons led us to keep computing complexity a priority:

1. Kalman filtering for head tracking is a dynamic system highly dependent on the smoothness of the acquired video and on the time involved in the 2D feature tracking. Although smoothness can be simulated by recording video at 30 f/s and analyzing the images afterwards at a lower rate, the filter characteristics are set so that its dynamics fit the speed of the tracking. Analyzing very slowly would not simulate what would really happen to the filter in real-life communication;

2. if very slow analysis algorithms had been implemented, no study about the rendering of fap could have been made, because no subjective evaluation of the naturalness of the analysis results would have been possible.

The purpose of this section is to detect the key parts of the system in terms of computational speed. We want to find the bottleneck of the analysis-synthesis chain in order to establish the real-time viability of the proposed solution. We know that the algorithms are not environment-dependent because this was set as a premise; here we study whether they are practically deployable. The algorithms have successfully worked on line on the following computers:

- Bi-processor PC, Intel Pentium III 700 MHz each, with acquisition card
- Laptop, Intel 1.6 GHz, with FireWire camera
- PC, Intel Pentium 2.0 GHz, with acquisition card

VI.5.1 Time performance evaluation of the algorithms

Let us review the relevance of each of the modules that compose our system. One module contains the video processing involved during the facial analysis, from the pose tracking to the expression analysis algorithmic implementation, plus the video rendering itself. The other module contains the synthetic rendering of the 3D head model after it has been animated using the fap obtained from the analysis. The main processes involved are sequential: one video frame is analyzed, the fap obtained are applied on the model, and then the model is rendered. No new frame is analyzed until the synthetic rendering is finished; therefore, studying the evolution of the video frame rate becomes a simple way to evaluate the speed performance of each module. Although the code utilized for the tests has not been optimized, the studies performed helped us roughly determine the most important points to take into account for a practical implementation.

The block matching needed for head tracking was implemented to track blocks in parallel. The bi-processor computer utilized for the tests profited from this operation (see Table VI-9). The remaining processes were implemented sequentially. The main goal of a correct facial video analysis for online applications is to implement algorithms that perform faster than the video acquisition frame rate. At the same time, we need a synthetic rendering engine that renders facial animation at least as fast as the video is being acquired, otherwise a slow-down effect will appear on the synthesis.

Table VI-9. Computer characteristics for the tests
Processor:        Intel Pentium III, 700 MHz each (bi-processor PC)
Memory:           756 MB RAM
Acquisition card: Osprey 100 Video Capture Device
Video card:       nVidia GeForce2 MX/MX 400
OS:               MS Windows 2000

The first step was to establish the influence of the rendering inside the complete system. The rendering speed of facial animation is inversely proportional to the number of vertices that compose the head model. Several studies about the rendering performance of facial animation already exist (Breton, 2002). Our experiments were intended to find the model size that would not interfere in the evaluation of the computational performance of the video analysis. We used different versions of the same model, each one of a different size. Figure VI-22 plots the video frame rate versus the number of vertices of the rendered model. As expected, analyzing this graph we observe that the larger the number of vertices, the slower the system becomes. When we reduce the number of vertices to increase the performance, we reach a point after which no speed improvement is achieved (~10000 vertices). At this point, we can affirm the following:

- The rendering interference in the final system is almost minimal, and it therefore determines the proper conditions to evaluate the system's facial analysis rate.
- This head model size is not the optimal option in terms of realistic animation, as the visual aspect of the model becomes less pleasant as the number of vertices is reduced. Let us recall that face cloning requires very dense wireframes.

We chose the head model made of 2574 vertices for the facial expression tests. We considered it the best trade-off between having the fewest vertices and the best visual appearance.

In our second step, we divided the study of the video analysis block in two. First, we evaluated the influence of the pose-tracking algorithm (Figure VI-23) and then we investigated the time performance of each feature analysis procedure (Figure VI-24). After examining the graphs, we concluded that the pose-tracking algorithm, and more concretely the block matching required to track the face features on the video sequence, causes the bottleneck in the processing speed. In fact, although parallel programming has been used to speed up the processing, this characteristic is scarcely exploited because our computer has only two processors. Ideally, dedicated implementations of block matching should be used to improve the pose tracking performance. The influence of the feature analysis algorithms remains very marginal compared with the time constraint imposed by the pose tracking. We simply point out that the eye state analysis algorithm is computationally more demanding than the eyebrow motion analysis.
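The block matching identified above as the bottleneck is essentially an exhaustive sum-of-absolute-differences (SAD) search around each tracked point. The following sketch illustrates the kind of search involved; the block size, search radius, function name and use of Python/numpy are illustrative assumptions, not the thesis implementation.

    import numpy as np

    def sad_block_search(prev, curr, center, block=16, radius=8):
        """Exhaustive SAD block matching around `center` (illustrative sketch).

        prev, curr : grayscale frames as 2D numpy arrays
        center     : (row, col) of the block tracked in the previous frame
        Returns the displacement (dr, dc) minimizing the sum of absolute differences.
        """
        half = block // 2
        r, c = center
        ref = prev[r - half:r + half, c - half:c + half].astype(np.int32)
        best, best_d = None, (0, 0)
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                cand = curr[r + dr - half:r + dr + half,
                            c + dc - half:c + dc + half].astype(np.int32)
                if cand.shape != ref.shape:      # candidate falls outside the frame
                    continue
                sad = np.abs(ref - cand).sum()
                if best is None or sad < best:
                    best, best_d = sad, (dr, dc)
        return best_d

Each additional tracked point adds a full search of (2*radius+1)^2 candidate blocks per frame, which is why the number of tracked features drives the overall frame rate (Figure VI-23).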

Figure VI-22. Evolution of the system speed (f/s) versus the complexity (number of vertices) of the head model being rendered. Kalman pose tracking was done with 10 features.

Figure VI-23. Evolution of the system speed (f/s) versus the number of features utilized during the head tracking, with a trend line fitted to the measurements. The model used had 2574 vertices. No expression analysis was made.

[The numeric data points of both plots were lost in the transcription.]

Figure VI-24. Evaluation of the computing speed cost of the expression analysis for the configurations tested (1 eyebrow; 2 eyebrows; 2 eyebrows + 1 eye; 2 eyebrows + 2 eyes). Kalman pose tracking was done with 10 features and the model used had 2574 vertices. [Numeric data points lost in the transcription.]

VI.6 Conclusions

We have deployed an application that has allowed us to evaluate the performance of the coupling between expression analysis and pose tracking. The main purpose of the algorithms developed for the analysis and of the methodology used for the coupling is to come up with a global solution that is as flexible as possible and usable in any circumstance. Aiming at this goal, we presented our test-bed platform as a demo during ACM Multimedia 2002 (Andrés del Valle & Dugelay, 2002). Although the environmental conditions were unknown, the platform, which has been designed to work almost in real time, allowed us to verify that the proposed algorithmic solutions can also work in locations other than the laboratory.

Treating pose and expressions separately has many advantages. We have made the most of them by designing specific facial feature analysis techniques that aim at extracting the most interesting motion information without building overly complicated motion templates. We have proved that expression-pose coupling by extending the definition of motion templates to 3D is feasible and gives very good results. Replicating human facial motion from the analysis of visual data is one of the best ways to generate natural animation because it is a non-invasive method. The use of facial animation in communications that take place in virtual environments is necessary if we aim at creating a feeling of trust among speakers.

Tests that measure quantitatively the quality of the analysis for applications like ours are difficult to set up. Our tests were designed to give an objective perspective of the performance and of the improvement possibilities of the algorithms presented. From a subjective point of view, the test-bed platform allowed us to evaluate qualitatively our algorithms applied to eyes and eyebrows. The proposed solution has proved to be flexible and robust, and it has the potential to be extended to the analysis of other features like the mouth, wrinkles, etc. The main conclusions from our tests were:

- The pose tracker used for the coupling is critical. The stability of the head pose tracking directly influences the expression analysis results in the presented framework.
- The theoretical restriction on the head rotation derived during the analysis of the equations, ±π/2, is reduced in practice to ±π/4 due to the partial or complete hiding of the features to be analyzed from the image plane.


Conclusions and Future Work

Conclusions from main contributions

Two premises have driven research in videoconferencing. Technically, more efficient ways of coding and transmitting video information are sought; socially, new communication frameworks where visual information increases speaker interaction are investigated. New telecommunication trends consider achieving these goals by using synthetic data. In virtual teleconferencing, which is the framework of our research, speakers are substituted by 3D synthetic models: clones (realistic) or avatars (symbolic). Regular video data is replaced by a limited number of action parameters determining facial motion, thus reducing the bandwidth required for transmission. Furthermore, when synthesizing the models in a common interactive environment, a natural communication situation is recreated.

It is quite simple to identify an avatar; it is more difficult to determine when a realistic head representation of a person can be considered a clone. Cloning someone's face is not simply a matter of using a visually realistic 3D mesh representation of his head, but also the acquired capability of replicating each one of his movements and expressions in detail. Unfortunately, there exists no general rule establishing the difference between a realistically animated 3D head and a clone. The difference may be a subjective issue and its determination may fall in the domain of psychology and behavior learning more than engineering. Nevertheless, it remains clear that the best way to create the animation for a clone is from the analysis of media obtained by recording the individual in daily life. Of all the media available to generate facial animation (speech, magnetic sensors, etc.), monocular images/video are the most intensely used because of their availability and their non-invasive nature.

When using facial animation in telecommunications we transmit facial animation parameters from an fap generator and we render them at the receiver. Both sender and receiver must share the same syntax and semantics for the fap in order to set up coherent communication. Indeed, the coding techniques utilized during the signal processing must not alter the syntax or the semantics; otherwise communication will be disturbed significantly.

The loss of data or interference in fap streams might have much worse effects than current video disruptions due to frame dropping. The MPEG-4 standard has established some specific decoding issues to be used for facial animation. The standard specifies a common syntax to describe face behavior, thus permitting interoperability among different face animation systems. At this point of the evolution and deployment of applications compliant with MPEG-4, several concerns have appeared: Has the standard given a global solution that all specific face animation systems can adopt? Or does the syntax restrict the semantics of the possible achievable motion too much? No matter the answer, the existence of all these doubts shows that there is still a long way to go to master face animation and, more concretely, the automatic generation of realistic human-like face motion.

The understanding of facial non-verbal behavior, especially from eyes and eyebrows, from monocular images becomes critical to generate natural and coherent facial animation on synthetic head models when using classical teleconferencing input (monocular video). Specifically, facial expression analysis on monocular images has become a major key point to tackle in the following fields:

- Computer Graphics (CG): to create realistic animation;
- Image Processing (IP): for model-based coding;
- Computer Vision (CV): for expression analysis in image reproduction;
- Human-Computer Interaction (HCI): to make machines react to human behavior.

The different approaches taken to perform expression analysis in each field depend on two factors. First, they depend on the amount of detailed motion data needed; for instance, more motion information is needed in CG than in HCI. Second, the methods differ based on the level of understanding required in their application: in HCI we need to comprehend what kind of action has happened, by recognizing a feeling for example, whereas in CG or IP we only require replication. In all cases, it becomes crucial to control the influence of the pose on the final expression that appears on the face in the image.

Developing a video analysis scheme where head pose tracking and face feature expression analysis are treated separately permits the design of specialized image analysis algorithms adjusted to specific needs, feature characteristics, etc. Algorithms that are universally usable generally lack precision. Indeed, if no previous assumptions are made, making the analysis suitable for all cases implies a lot of computation and therefore the loss of real-time possibilities.

To compensate for this restriction, we may design less precise analysis algorithms (using simple motion models) while keeping in mind the possibility of improving the complexity of the system: as the computational requirements become less and less restrictive, a flexible analysis scheme will allow us to increase the complexity and to extract more detailed motion data.

This thesis has presented a system that aims at analyzing facial motion and expressions following the aforementioned ideas. It profits as much as possible from intra-feature constraints (like natural eye motion) and inter-feature constraints (specific eyelid motion deduced from eyebrow analysis). The algorithms have first been defined for a frontal position and then they have been extended to work integrated into a pose-tracking system. The obtained experimental results are very positive and encourage the author to keep developing the same strategy to analyze other facial features (mouth, wrinkles, ...) whose analysis and modeling seem a priori more complex. The inter-feature information will then be enriched and the obtained motion information more accurate.

The use of natural intra-feature constraints has allowed us to develop a gaze tracking algorithm capable of deducing eyelid motion from the understanding of pupil action. Muscular-based intra-feature restrictions have set the basis for the mathematical template models that analyze eyebrow and mouth movements. The correlation that exists among facial features is exploited to set inter-feature constraints that help to complement the analysis of these features. We have developed and tested algorithms that use inter-feature constraints; for instance, we improve eyelid motion extraction by adding eyebrow behavior information to the eye-state tracking-algorithm we have designed.

The facial feature motion analysis templates need to be adapted to extend their use from simply analyzing a frontal view of a face to studying facial motion from a head showing any pose on the image. Our approach to perform this algorithmic extension has been to redefine the designed motion templates in 3D space, over a realistic head representation of the speaker presented in its neutral position (facing the camera). Thanks to the rigid-motion data provided by a pose tracking algorithm that uses an extended Kalman filter, we are able to relate the 2D information extracted from each video frame to the 3D description of the motion templates and vice versa. The analysis process can be summarized as follows:

- All data from the motion templates needed to make the analysis, for instance the feature ROIs, are projected from 3D onto the video frame.
- Using image-processing techniques specific to each feature and already tested for a frontal perspective, we extract the data required to interpret the movement.
- Using the pose information, we undo the projection and the pose, and we interpret these data on the 3D motion template redefined on a frontal head model.

Undoing the projection and the pose is not a simple task. It is impossible to derive 3D information from simple 2D data. This is the reason why it is necessary to model the surface upon which the motion template will be redefined. To keep the analysis simple and the extension of the motion templates from 2D to 3D straightforward, we approximate the feature surface by the plane that best fits the feature. The use of a pose-tracking algorithm based on Kalman filtering introduces some noise in the extracted data. Nevertheless, its influence on the final result is minimal as long as the tracking is not lost. Our solution is able to analyze facial features and to track the head simultaneously for almost any pose not restricted by the theoretical limitation of the system (rotations greater than ±π/2 rad). The evolution of the ROI projections helps to control the performance of the analysis. After having implemented a practical scenario to test our algorithms, we have noticed that the algorithm's limitation is reduced to ±π/4 rad, because the analysis performance is controlled by the visual presence of the features: beyond this range, facial features are not completely present in the image. Our research has proved that this technique is flexible, robust and has the potential to be exploited for the analysis of other facial features.
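As an illustration of the projection / back-projection step summarized above, the following sketch assumes an ideal perspective camera with focal length f and a feature plane given in camera coordinates by a point and a normal (the current pose already applied to the plane defined on the neutral model). The names and the simplified camera model are assumptions of this example, not the exact formulation used in the thesis.

    import numpy as np

    def project(P, f):
        """Perspective projection of a 3D point P=(X, Y, Z) given in camera coordinates."""
        X, Y, Z = P
        return np.array([f * X / Z, f * Y / Z])

    def backproject_to_plane(p, f, plane_point, plane_normal):
        """Intersect the viewing ray of image point p=(u, v) with the feature plane.

        The plane approximates the feature surface (e.g. the eyebrow area) and stands in
        for the 3D surface on which the 2D motion template has been redefined.
        """
        u, v = p
        ray = np.array([u / f, v / f, 1.0])                    # direction of the viewing ray
        t = np.dot(plane_normal, plane_point) / np.dot(plane_normal, ray)
        return t * ray                                          # 3D point on the plane

The recovered 3D point can then be brought back to the neutral, frontal model by applying the inverse of the estimated pose (R^T (P - t)), where the motion template is finally interpreted.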

Future work

The proposed facial motion and expression analysis framework has been tested on the most active features; it can also be easily extended to analyze any other part of the face, for example wrinkles and furrows. A future challenge will appear at the moment when more facial features start being coupled with the proposed Kalman-based pose tracking system. As more features are susceptible to moving, fewer fixed facial tracking points will be available for the head tracking algorithm. At that point, studies about the robustness of the pose tracking versus the number of features and the freedom of movement of the speaker will have to be done. Complementary solutions might be used; we can cite, for example, the insertion of complete visual feedback from the clone, that is, getting visual feedback not only from the rigid motion but also from the facial expressions, as a complement for the Kalman-based pose tracker. We might even search for a possible head tracker substitute that, relying on the same geometrical characteristics, could also work in the present environment.

Motion template adaptation to 3D space has been done by approximating each feature surface with a plane. This was a simple, straightforward and robust solution, but it introduced some inaccuracies in the results. To reduce the imprecision that this approximation creates, templates could be redefined on surfaces that better fit the shape of the area to be analyzed. Regardless of the surface used, we should keep in mind that the utilization of monocular video is by itself a restriction. Features might be partially or completely occluded at some point of the analysis. To compensate for this lack of visual information, we could reinforce symmetry constraints to generate motion data for the missing features: we could use the analyzed parameters obtained from their symmetric counterparts.

The system presented in this thesis is the integration of several modules that can work independently from each other. We can profit from this fact by reutilizing the analysis modules separately in different contexts and applications. For instance, the proposed eye-state tracking-algorithm could be used on high-speed video recordings of eye sequences from medical patients in the search for correlated patterns of brain activity. Although the techniques herein described have been addressed as a solution to analyze facial expressions in order to obtain animation parameters for synthetic facial motion, they can also be extended to other scientific fields where knowledge of the instant actions of the person in front of the camera is desired, for instance in Human Computer Interaction analysis (Andrés del Valle & Dugelay, 2003).

People can understand facial actions even when faces are under very bad lighting or in the presence of disturbing objects over them. This is basically due to the fact that humans are able to automatically reduce the complexity of the analysis into different parts and to carry out this analysis progressively. First, we examine the conditions the face is under and we decide if further understanding is possible; then, we locate the head and get its rigid motion (its pose); and finally, we pay attention to the different details of the face that are interesting to us because they contain expression information. When humans are not able to perform an exhaustive analysis (the lighting is very bad, or a significant part of the face is occluded), they make up for the missing information (generally assuming standard human behavior) or they simply accept that they cannot understand the face motion they are observing. The presented framework is designed to perform facial motion and expression analysis on monocular images trying to replicate this natural and intuitive human behavior.

The interest in facial motion understanding is increasing and new research in this field is on its way. The new European Network of Excellence (NoE) Similar (2003) has joined the efforts of several institutions that aim at developing tools like the virtual teleconferencing system targeted by our research. Among other activities, this NoE will develop applications based on techniques similar to the ones presented here, will also do research to improve current algorithms and will set the basis for a global European network in multimodal human-computer interaction.

Publications derived from this research

Book chapter

Techniques for Face Motion & Expression Analysis on Monocular Images
Ana C. Andrés del Valle and Jean-Luc Dugelay
To appear in N. Sarris & M. G. Strintzis (Eds.), 3D Modeling and Animation: Synthesis and Analysis Techniques for the Human Body, Idea Group Publisher.

Journals

Efficient Ocular Expression Analysis for Synthetic Reproduction
A. C. Andrés del Valle and Jean-Luc Dugelay
Submitted to IEEE Transactions on Multimedia.

Analysis and Reproduction of Facial Expressions for Realistic Clones
Stéphane Valente, Ana C. Andrés del Valle and Jean-Luc Dugelay
The Journal of VLSI Signal Processing, August 2001, Vol. 29, Issue 1/2.

Conferences

Making Machines Understand Facial Motion Like Humans Do
A. C. Andrés del Valle & J.-L. Dugelay
Human Computer Interaction International, HCI International 2003, June 21st-23rd, 2003, Crete, Greece.

Eyebrow Movement Analysis over Real-time Video Sequences for Synthetic Representation
A. C. Andrés del Valle & J.-L. Dugelay
AMDO 2002, November 21st-23rd, 2002, Palma de Mallorca, Spain.

Facial Expression Analysis Robust to 3D Head Pose Motion
A. C. Andrés del Valle & J.-L. Dugelay
ICME 2002, August 26th-29th, 2002, Lausanne, Switzerland.

Eye State Tracking for Face Cloning
A. C. Andrés del Valle & J.-L. Dugelay
ICIP 2001, October 7th-10th, 2001, Thessaloniki, Greece.

Analysis-Synthesis Cooperation for MPEG-4 Realistic Clone Animation
J.-L. Dugelay & A. C. Andrés del Valle
Euroimage ICAV3D 2001, May 30th - June 1st, 2001, Mykonos, Greece.

Acquisition et Animation de Clones Réalistes pour les Télécommunications
A. C. Andrés del Valle & J.-L. Dugelay & E. Garcia & S. Valente
Compression et Représentation de Signaux Audiovisuels, CORESA 2000, October 19, 2000, Poitiers, France.

Tutorial

Human movement. Face recognition and animation: "Facial Animation. Analysis and Synthesis Methods to Replicate Human Communication"
A. C. Andrés del Valle & R. Mas & F. J. Perales
Second International Workshop on Articulated Motion and Deformable Objects, AMDO 2002, November 2002, Palma de Mallorca, Spain.

Technical Demo

Online Face Analysis: Coupling Head Pose-Tracking with Face Expression Analysis
A. C. Andrés del Valle & J.-L. Dugelay
ACM Multimedia, December 2002, Juan-les-Pins, France.

Technical Reports

Pose Coupling with Eye Movement Tracking
A. C. Andrés del Valle & J.-L. Dugelay
February 2002, Eurécom, France. RR

A Video Conference System under MPEG-4; Overview of Face Animation in MPEG-4; and Study of the Compliance Level of Eurécom's Face Animation-Teleconferencing System
A. C. Andrés del Valle & J.-L. Dugelay & D. Pelé
March 2001, Eurécom / France Télécom, France. RR ; FT/BD/DIH/HDM/11/DP




Appendices


Appendix I-A Camera Calibration

During camera calibration we must link the real-world reference frame to the image reference frame in order to find the relationship between the coordinates of points in 3D space and the coordinates of the same points in the image. To do so, we introduce the camera reference frame, because there is no direct relation between the previously mentioned reference frames. Then, we can find an equation linking the camera reference frame with the image reference frame (LinkI), and another equation linking the world reference frame with the camera reference frame (LinkE). Identifying LinkI and LinkE is equivalent to finding the camera characteristics, also known as the camera extrinsic and intrinsic parameters. We generally define these parameters as follows (Trucco & Verri, 1998):

- Extrinsic parameters are the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.
- Intrinsic parameters are the parameters necessary to link the pixel coordinates of an image point with the corresponding point in the camera reference frame.

Many calibration techniques have been reported in the past two decades. The developed methods can be roughly classified into two groups: photogrammetric calibration and self-calibration.

- Photogrammetric calibration: this type of calibration is performed by observing an object whose geometry in 3D space is known with very good precision. The calibration object usually consists of two or three planes orthogonal to each other. Sometimes, a plane undergoing a precisely known translation is also used. This type of approach requires an elaborate setup, but can be done very efficiently.
- Self-calibration: this method does not use a calibration object. By moving a camera in a static scene, the rigidity of the scene already provides two constraints on the camera parameters from one camera displacement, using image information alone. Three images taken by the same camera with fixed intrinsic parameters are sufficient to recover both the intrinsic and extrinsic parameters. This approach is very flexible; however, it is not as accurate as the photogrammetric one.
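As a minimal illustration of how LinkE and LinkI compose, the following sketch maps a world point to pixel coordinates with an ideal pinhole model; the rotation R, translation t and intrinsic values (fx, fy, cx, cy) stand for calibrated parameters and are placeholders in this example.

    import numpy as np

    def world_to_pixel(P_w, R, t, fx, fy, cx, cy):
        """Map a 3D world point to pixel coordinates.

        LinkE (extrinsic): P_c = R @ P_w + t
        LinkI (intrinsic): u = fx * Xc / Zc + cx,  v = fy * Yc / Zc + cy
        """
        P_c = R @ np.asarray(P_w, dtype=float) + t   # world frame -> camera frame
        Xc, Yc, Zc = P_c
        return np.array([fx * Xc / Zc + cx, fy * Yc / Zc + cy])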


Appendix I-B Illumination Models

There are two major categories of reflected light:

(i) Diffuse re-radiation (scattering): this occurs when the incident light penetrates the surface and is reflected equally in all directions. The light interacts strongly with the surface, so its color is affected by the surface color. This kind of illumination behavior predominates on unpolished surfaces.

(ii) Specular reflection: light does not penetrate the object, but is instead directly reflected from its outer surface. It makes the object look shiny and it has the same color as the light source. Mirrors are totally specular.

The total light reflected in a certain direction is the sum of the diffuse and the specular components in that direction. The intensity of the pixels that we get from the image of the face is the result of the light from the recorded scene (i.e. the face) scattered toward the camera lens. The nature of the reflection phenomenon requires the knowledge of some vector magnitudes (Figure App-1): the normal n to the surface at the point p being studied, the vector v from p to the camera, and the vector s from p to the light source.

Figure App-1. The reflected light that reaches the camera lens depends on the direction of the normal to the surface (n), the vector from the studied point to the light source (s) and the vector from the point to the camera lens (v). ϑ = ϕ for perfectly specular reflections.

A fairly extended approach to appreciate the result of lighting on faces is to analyze illumination by trying to synthetically reproduce it on the realistic 3D model of the user's head. The Phong reflection model is the 3D shading model most heavily used to assign shades to each individual pixel of the synthetic face.

It is characterized by simplifying second-order reflections through an ambient reflection term that simulates the sparse (diffuse) reflection coming from sources whose light has been so dispersed that it is very difficult to determine its origin. In a more or less simplified way we can express the final intensity of each of the pixels as:

    I = I_a r_a + \sum_{l} f_{att}^{l} I_l \left[ r_d (\vec{n} \cdot \vec{s}_l) + r_s (\vec{r}_l \cdot \vec{v})^{f} \right]    (0-1)

where each additional term represents the intensity contribution of each light source (l) being reflected (L) by the surface:

- I_a r_a: the scalar product of the intensity of the ambient light and the ambient reflection coefficient of the surface; it is a single value independent of the other light sources.
- f_{att}^{l}: the light attenuation coefficient. Energy from a point light source reaching a piece of surface falls off as the inverse square of the distance the light has traveled (d_l):

      f_{att}^{l} = \frac{1}{d_l^2}    (0-2)

  Real-world light sources are not points. Generally, the attenuation is then approximated by:

      f_{att}^{l} = \min\left( \frac{1}{c_1 + c_2 d_l + c_3 d_l^2},\; 1 \right)    (0-3)

  where c_1, c_2 and c_3 are some pre-established model values.
- I_l: the intensity magnitude of the light source (l).
- r_d (\vec{n} \cdot \vec{s}_l): the contribution of the diffuse re-radiation. I_l r_d (\vec{n} \cdot \vec{s}_l), also known as Lambert's law, states that if a surface is turned away from a light by some angle θ, the area subtended by the light is cos(θ) less than before, which implies that the brightness of the light source decreases by the same amount. cos(θ) is \vec{s}_l \cdot \vec{n} (normalized). r_d is the diffuse reflection coefficient of the surface.
- r_s (\vec{r}_l \cdot \vec{v})^{f}: the contribution due to the specular reflection, where \vec{r}_l is the mirror reflection of \vec{s}_l about the normal. f is the parameter that controls the shininess of the surface. Larger f values represent shinier surfaces, and will lead to smaller specular highlights. This is a simplified expression that tries to model the fact that the amount of light goes down as the angle ϕ between \vec{r}_l and \vec{v} goes up. The actual expression in ϕ is very complex.
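A compact numerical sketch of equation (0-1) for a single point light source is given below; the vector names follow Figure App-1, while all coefficient values are arbitrary placeholders.

    import numpy as np

    def phong_intensity(n, s, v, I_a=0.2, I_l=1.0, r_a=0.3, r_d=0.6, r_s=0.4,
                        f=10, c=(1.0, 0.1, 0.01), d_l=1.0):
        """Phong intensity of a surface point for one light source (illustrative values).

        n, s, v : normal, point-to-light and point-to-camera vectors.
        """
        n, s, v = (np.asarray(x, dtype=float) / np.linalg.norm(x) for x in (n, s, v))
        f_att = min(1.0 / (c[0] + c[1] * d_l + c[2] * d_l ** 2), 1.0)   # equation (0-3)
        r_vec = 2.0 * np.dot(n, s) * n - s           # mirror reflection of s about the normal
        diffuse = r_d * max(np.dot(n, s), 0.0)       # Lambert term
        specular = r_s * max(np.dot(r_vec, v), 0.0) ** f
        return I_a * r_a + f_att * I_l * (diffuse + specular)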

Appendix I-C Mathematical Morphology: the Watershed Transformation

Here we discuss the most extensively used segmentation algorithm based on mathematical morphology, the watershed, so as to give an idea of the strength of the mathematical tools proposed.

The watershed transformation

Principle: any grey-tone image can be considered as a topographic surface. If we flood this surface from its minima and prevent the merging of the water coming from different sources, we partition the image into two different sets: the catchment basins and the watershed lines. If we apply this transformation to the image gradient, the catchment basins should theoretically correspond to the homogeneous grey-level regions of this image. However, in practice, this transform produces an important over-segmentation due to noise or local irregularities in the gradient image.

Marker-controlled watershed

A major enhancement of the watershed transformation consists in flooding the topographic surface from a previously defined set of markers. Doing so, we prevent any over-segmentation.

Figure App-2. Markers of the blobs and of the background, and marker-controlled watershed of the gradient image.

The segmentation paradigm

Segmenting an image by the watershed transformation is therefore a two-step process:

1) Finding the markers and the segmentation criterion (the criterion or function which will be used to split the regions; it is most often the contrast or the gradient, but not necessarily).

2) Performing a marker-controlled watershed with these two elements.

The difficulty of the technique lies in how to determine the image characteristics that will permit an automatic marking process. In the case illustrated in Figure App-3, where coffee beans are detected, the criterion used to mark the correct areas is the distance function to the initial image.

Figure App-3. To count the coffee beans, the watershed marking decision follows a criterion based on the distance function to the initial image.

Figure App-4. This graph is a conceptual description of the complete segmentation process. The mathematical morphology tools provide the basis for the processing, but there must exist an intelligent decision part to evaluate the coherence and lead the analysis.
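For illustration only, a coffee-bean style marker-controlled watershed can be sketched with standard scientific Python tools (scipy and scikit-image). This is not the implementation discussed here, and the 0.6 threshold used to build the markers is an arbitrary choice.

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.segmentation import watershed

    def split_touching_blobs(binary):
        """Marker-controlled watershed on the distance function.

        binary : boolean image, True on the objects to be separated.
        Returns a label image, one label per catchment basin.
        """
        distance = ndi.distance_transform_edt(binary)               # segmentation criterion
        markers, _ = ndi.label(distance > 0.6 * distance.max())     # markers in the blob interiors
        # Flood the inverted distance map from the markers, restricted to the objects
        return watershed(-distance, markers, mask=binary)

The same two-step paradigm (markers first, flooding second) is what prevents the over-segmentation mentioned above.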

Hierarchical segmentation

The watershed transformation can also be used to define a hierarchy among the catchment basins. Starting from the initial watershed transformation of the gradient image, a mosaic image can be defined, and then its associated gradient. From this image, a new criterion function is built (based on the relative height of the walls separating the initial catchment basins). The watershed transformation applied to this image provides a higher level of hierarchy in the segmented image (thus suppressing much of the over-segmentation). Many other techniques and tools can be used to define a hierarchy on an image; most of them are based on a flooding process.

Feature-based face motion analysis can profit from mathematical morphology tools, as we see in (Ravyse, Sahli, Reinders, & Cornelis, 2000), where the authors do their eye gesture analysis with a mathematical morphology scale-space approach, forming spatio-temporal curves out of scale measurement statistics.

Figure App-5. Top: initial image (left) and initial watershed of the gradient (right). Bottom: mosaic image (left) and first level of hierarchy (right).


Appendix I-D Estimators

Linear

Let us call λ the vector of parameters obtained from the image analysis and µ the vector of FA parameters for the synthesis observed by λ. The usual way to construct the linear estimator L, which best satisfies µ = L λ on the training database, is to find a solution in the least-squares sense. We verify that this linear estimator is given by

    L = M \Lambda^T (\Lambda \Lambda^T)^{-1}    (0-4)

where M = [µ_1 ... µ_d] and Λ = [λ_1 ... λ_d] are the matrices obtained by concatenating all µ and λ vectors from the training set.

Neural networks

Neural networks are algorithms inspired by the processing structure of the brain. They allow computers to learn a task from examples. Neural networks are typically organized in layers. Layers are made up of a number of interconnected nodes which contain an activation function; see Figure App-6. Most artificial neural networks, or ANNs, contain some form of 'learning rule' that modifies the weights of the connections according to the input patterns that they are presented with. The most extensively used rule is the delta rule. It is often utilized by the most common class of ANNs, called 'back-propagational neural networks' (BPNN). Backpropagation is an abbreviation for the backward propagation of error. With the delta rule, as with other types of backpropagation, 'learning' is a supervised process that occurs with each cycle or 'epoch' (i.e. each time the network is presented with a new input pattern) through a forward activation flow of outputs and the backward error propagation of weight adjustments. More simply, when a neural network is initially presented with a pattern it makes a random 'guess' as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights. More graphically, the process looks like Figure App-7. (Information partially taken from the GMS lab of the University of Illinois at Urbana-Champaign; images are copyrighted.)
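Returning to the linear estimator of equation (0-4), a direct numpy sketch is given below; the use of a pseudo-inverse in place of the plain inverse, for numerical safety, is an assumption of this example.

    import numpy as np

    def fit_linear_estimator(Lambda, M):
        """Least-squares linear estimator L such that mu is approximately L @ lam.

        Lambda : matrix whose columns are the training analysis vectors lambda_1 ... lambda_d
        M      : matrix whose columns are the corresponding FA parameter vectors mu_1 ... mu_d
        """
        # L = M Lambda^T (Lambda Lambda^T)^{-1}, with pinv for robustness
        return M @ Lambda.T @ np.linalg.pinv(Lambda @ Lambda.T)

    # usage: mu_estimate = L @ lam for a new analysis vector lam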

Backpropagation performs a gradient descent within the solution's vector space towards a 'global minimum' along the steepest vector of the error surface. The global minimum is the theoretical solution with the lowest possible error. The error surface itself is a hyperparaboloid, but is seldom as 'smooth' as depicted in Figure App-8. Indeed, in most problems the solution space is quite irregular, with numerous 'pits' and hills which may cause the network to settle down in a 'local minimum', which is not the best overall solution.

Figure App-6. Patterns are presented to the network via the 'input layer', which communicates with one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output, as shown in the graphic below.

Since the nature of the error space cannot be known a priori, neural network analysis often requires a large number of individual runs to determine the best solution. Most learning rules have built-in mathematical terms to assist in this process, which control the 'speed' (beta coefficient) and the 'momentum' of the learning. The speed of learning is actually the rate of convergence between the current solution and the global minimum. Momentum helps the network to overcome obstacles (local minima) in the error surface and settle down at or near the global minimum.

Once a neural network is 'trained' to a satisfactory level, it may be used as an analytical tool on other data. To do this, the user no longer specifies any training runs and instead allows the network to work in forward propagation mode only. New inputs are presented to the input pattern, where they filter into and are processed by the middle layers as though training were taking place; however, at this point the output is retained and no backpropagation occurs. The output of a forward propagation run is the predicted model for the data, which can then be used for further analysis and interpretation.

Figure App-7. Note that within each hidden-layer node there is a sigmoidal activation function that polarizes network activity and helps stabilize it.

Figure App-8. Graphical interpretation of the search for a minimum.
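As a toy illustration of the delta rule with a learning rate (beta coefficient) and a momentum term, the following sketch trains a single sigmoid unit; the data layout, constants and function name are illustrative assumptions, not a full BPNN.

    import numpy as np

    def train_sigmoid_unit(X, y, epochs=100, beta=0.5, momentum=0.9):
        """Delta-rule training of one sigmoid output unit (toy example).

        X : (n_samples, n_inputs) input patterns, y : (n_samples,) targets in [0, 1].
        """
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.1, size=X.shape[1])
        dw_prev = np.zeros_like(w)
        for _ in range(epochs):
            out = 1.0 / (1.0 + np.exp(-X @ w))          # forward activation flow
            err = y - out                                # output error
            grad = X.T @ (err * out * (1.0 - out))       # delta-rule gradient
            dw = beta * grad + momentum * dw_prev        # speed and momentum of the learning
            w, dw_prev = w + dw, dw
        return w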


Appendix I-E Fuzzy Logic

It is important to point out the distinction between fuzzy systems and probability. Both operate over the same numeric range and have similar values: 0.0 representing False (or non-membership) and 1.0 representing True (or membership). However, we must differentiate: probability establishes the chance for a statement to be true, whereas fuzzy logic describes the degree of concreteness of the statement itself. For instance, the sentence "There is an 80% chance that Pablo is tall" corresponds in fuzzy terminology to "Pablo's degree of membership within the set of tall people is 0.80". The semantic difference is significant: the first view supposes that Pablo is either tall or not tall, and that we only have an 80% chance of knowing which set he is in. Fuzzy terminology supposes that Pablo is more or less tall, or some term corresponding to the value 0.80. Fuzzy systems try to gather mathematical tools to represent natural language, where the concepts of True and False are too extreme and intermediate or vaguer interpretations are needed. Let us state some formal definitions:

1. Let X be a set of objects, with elements noted as x. Thus X = {x}.
2. A fuzzy set A in X is characterized by a membership function mA(x) which maps each point in X onto the real interval [0.0, 1.0]. As mA(x) approaches 1.0, the grade of membership of x in A increases.
3. A is EMPTY iff for all x: mA(x) = 0.0.
4. A = B iff for all x: mA(x) = mB(x) [or, mA = mB].
5. mA' = 1 - mA (complement of A).
6. A is CONTAINED in B iff mA <= mB.
7. C = A UNION B, where: mC(x) = max(mA(x), mB(x)).
8. C = A INTERSECTION B, where: mC(x) = min(mA(x), mB(x)).

Besides the basic operations amongst sets, fuzzy systems permit the definition of hedges, or modifiers of fuzzy values. These operations are provided in an effort to maintain close ties to natural language and to allow for the generation of fuzzy statements through mathematical calculations. As such, the initial definition of hedges and of operations upon them is quite a subjective process and may vary from one application to another. Hedges mathematically model concepts such as "very", "somewhat", "sort of", and so on. For instance, m_very A(x) = mA(x)^2.
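A small sketch of these definitions in Python follows; the membership function chosen for "tall" is purely illustrative.

    def fuzzy_union(mA, mB):        # C = A UNION B
        return lambda x: max(mA(x), mB(x))

    def fuzzy_intersection(mA, mB): # C = A INTERSECTION B
        return lambda x: min(mA(x), mB(x))

    def fuzzy_complement(mA):       # mA'(x) = 1 - mA(x)
        return lambda x: 1.0 - mA(x)

    def very(mA):                   # hedge: "very A"
        return lambda x: mA(x) ** 2

    # illustrative membership function for "tall" (heights in metres)
    tall = lambda h: min(max((h - 1.60) / 0.30, 0.0), 1.0)
    print(very(tall)(1.78))         # degree to which 1.78 m is "very tall"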


Appendix I-F Hidden Markov Models

The problem arises when we want to probabilistically model a specific process and we can only count on its outputs to do so. We can take advantage of HMMs in two ways:

From outputs to states

We want to determine the set of internal states that most likely gave rise to a particular sequence of outputs. The Viterbi algorithm is the method used for solving this problem. It is clear that for any sequence of outputs and states the probabilistic weighting can be calculated without much difficulty. But for a given long sequence of outputs, there is an immense number of possible sequences of states to choose from in order to find the most probable one. The Viterbi algorithm helps to ease this search. If we consider the possibilities for the first n states, we retain not just the set of states with the highest weight, but also, for every other possible state at time n, the set of states with the highest weight in addition to the one in the set of states with overall highest weight. To obtain the set of states with overall highest weight for the n+1 states, and also the sets of states with the highest weight for any possible state at time n+1, we only need to consider possibilities involving the sets of states from time 1 to time n that we previously retained.

From outputs to model

This is the most complicated of the problems. We assume that the model has one or more variable parameters in its description, and we are looking for the values of those parameters that would make an observed sequence of outputs the most likely. Two major methods are used. One, the segmental K-means method, obtains an initial approximation of the model and involves assuming that a particular set of states accompanies the known outputs. The other, the Baum-Welch estimation algorithm, is used to obtain the best fit of the model to the output sequence considering all possible sequences of states that could have produced the known outputs.
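A compact sketch of the Viterbi recursion described above, assuming a discrete-output HMM with initial probabilities pi, transition matrix A and emission matrix B (all names are illustrative):

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most likely state sequence for an observed output sequence.

        obs : list of observation indices
        pi  : (N,) initial state probabilities
        A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
        B   : (N, M) emission probabilities,  B[i, k] = P(output k | state i)
        """
        N, T = len(pi), len(obs)
        delta = np.zeros((T, N))            # best path weight ending in each state
        psi = np.zeros((T, N), dtype=int)   # retained best predecessor
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A       # extend each retained path
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        # backtrack the retained choices
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]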


Appendix IV-G Original Data from Applying the Pupil-Search Algorithm on Video Sequence NEON

For each frame of in.avi, the table records the horizontal and vertical differences between the left and right pupil positions and whether they stay below a 3%, 5% or 10% threshold:

- Diffx_1 = X_L - X_R;  Diffy_1 = Y_L - Y_R
- Diffx = Diffx_1 / WIDTH;  Diffy = Diffy_1 / HEIGHT
- 0: the difference fails to be below the threshold; 1: the difference succeeds in being below the threshold
- TOTAL: frames where both the X and the Y differences succeed

Overall success rates recovered from the table: 83.67%, 87.69% and 89.69%, listed in the order of the 3%, 5% and 10% thresholds. [The per-frame numeric entries of this appendix were lost in the transcription.]

Appendix IV-H NEON Graphs Data

For each frame of in.avi and for each threshold (3%, 5%, 10%), the table lists the origin of the pupil estimate and its X and Y coordinates, together with extra data flags: "from previous results" when the estimate is carried over from the preceding frame, and "closed" when a closed-eye state is detected. Closed1 & Closed2 indicate at which point of the algorithm the closed state is detected. The table also marks incorrect states, as well as frames whose data correspond to a correct closed state although the algorithms did not detect it. [The per-frame numeric entries of this appendix were lost in the transcription.]


Appendix IV-I

TEST: Segmentation algorithm (I)

Legend:
m: male   f: female
c: Caucasian   a: African   h: Hispanic   as: Asian
OK: correct detection and correct segmentation
KO: neither correct detection nor correct segmentation
~: correct detection but not correct segmentation
(empty cell): not evaluated

Columns: Video sequence | Characteristics | Lips | Teeth | Darkness | Notes
[Empty "not evaluated" cells were lost in the transcription, so rows with fewer than three marks cannot be re-aligned to specific columns.]

1  f/c  OK OK
2  m/c  OK OK  not well focused
3  m/c  OK ~ OK
4  f/c  KO KO OK  tanned
5  f/c  ~ ~ OK  tanned / <50%
6  m/c  KO KO OK
7  m/h  OK ~ ~  beard
8  f/c  OK KO OK
9
10  f/c  KO ~ OK
11  f/c  ~ KO ~  tongue / bad quality
12  m/a  it is out of track
13  f/c  OK ~ OK
14  m/c  ~ KO OK
15  f/h  OK OK
16  m/c  OK OK ~
17  m/c  OK OK
18  f/h  OK OK
19  f/c  OK KO OK
20  f/c  ~ KO OK  tanned / pale lips
21  m/h  ~ ~
22  m/c  OK ~ OK
23  m/c  OK KO OK
24  m/c  ~ KO OK
25  f/c  ~ OK
26  f/c  OK ~ KO
27  f/c  OK ~ OK

28  m/c  OK ~ OK
29  m/c  OK ~ OK
30  m/c  ~ ~ OK
31  m/c  OK KO ~  beard
32  m/c  OK ~  beard
33  m/c  OK ~ ~  moustache
34  m/c  OK ~ OK
35  m/c  ~ KO ~  tanned / beard
36  f/c  OK KO ~  tanned
37  m/c  ~ KO ~  white beard
38  f/c  KO KO OK  tanned
39  f/c  ~ KO OK
40  m/a  KO KO KO
41  m/c  ~ KO  tanned
42  f/c  ~ KO OK
43  f/c  ~ KO OK
44  f/c  OK KO OK
45  f/c  OK KO OK
46  f/c  OK KO OK  tanned / moustache
47  m/c  ~ KO OK
48  f/c  OK OK
49  f/c  ~ KO OK
50  m/c  OK KO OK  moustache
51  m/c  OK OK
52  m/c  OK KO KO  beard
53  m/c  OK KO
54  f/c  KO KO OK
55  f/c  OK OK
56  f/c  OK OK
57/58  f/c  OK ~
59  m/c  OK KO
60  m/c  OK KO ~  beard
61  m/c  ~ KO ~  beard
62  m/c  ~ KO KO  moustache

TEST: Segmentation algorithm (II)

Columns: Video sequence | Characteristics | Lips | Teeth | Darkness | Notes

1  f/c  OK OK OK
2  m/c  OK OK OK  not well focused
3  m/c  OK OK OK
4  f/c  ~ OK OK  tanned
5  f/c  ~ OK OK  tanned / <50%
6  m/c  KO KO OK  white beard
7  m/h  OK ~ ~  beard
8  f/c  ~ OK OK
9  KO ~ OK
10  f/c  ~ OK OK
11  f/c  ~ OK OK  tongue / bad quality
12  m/a  it is out of track
13  f/c  OK OK OK
14  m/c  ~ OK OK
15  f/h  OK OK  very small
16  m/c  ~ ~ OK
17  m/c  OK OK
18  f/h  OK OK
19  f/c  OK OK ~  tanned
20  f/c  ~ OK ~  tanned / pale lips
21  m/h  ~ OK
22  m/c  OK OK OK
23  m/c  OK OK OK
24  m/c  OK OK OK
25  f/c  KO ~ ~  too small
26  f/c  OK OK ~
27  f/c  OK OK OK
28  m/c  OK OK OK
29  m/c  OK OK OK
30  m/c  OK ~ ~
31  m/c  OK OK OK  beard
32  m/c  OK OK OK  beard
33  m/c  ~ OK OK  moustache
34  m/c  OK OK OK

35  m/c  OK OK OK  tanned / beard
36  f/c  OK OK ~  tanned
37  m/c  KO ~ KO  white beard
38  f/c  KO OK OK  tanned
39  f/c  ~ OK OK
40  m/a  KO KO KO  very dark skin
41  m/c  ~ OK  tanned
42  f/c  ~ OK
43  f/c  KO ~ OK
44  f/c  OK OK OK
45  f/c  OK OK OK
46  f/c  OK OK OK  tanned / moustache
47  m/c  KO ~ OK
48  f/c  OK OK
49  f/c  OK OK
50  m/c  OK OK  moustache
51  m/c  OK OK
52  m/c  OK OK OK  beard
53  m/c  OK OK
54  f/c  KO KO KO
55  f/c  OK ~ OK
56  f/c  OK OK
57/58  f/c  OK ~
59  m/c  OK OK
60  m/c  OK OK OK  beard
61  m/c  ~ OK OK  beard
62  m/c  OK OK OK  moustache

Appendix VI-J BIFS Syntax for Face Animation

Face {
  exposedField SFNode fit NULL
  exposedField SFNode fdp NULL
  exposedField SFNode fap NULL
  exposedField SFNode ttsSource NULL
  exposedField MFNode renderedFace NULL
}

NOTE: For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics

The Face node is used to define and animate a face in the scene. In order to animate the face with a facial animation stream, it is necessary to link the Face node to a BIFS-Anim stream. The node shall be assigned a nodeID, through the DEF mechanism. Then, as for any BIFS-Anim stream, an animation mask is sent in the object descriptor of the BIFS-Anim stream (specificInfo field). The animation mask points to the Face node using its nodeID. The terminal shall then connect the facial animation decoder to the appropriate Face node.

The FAP field shall contain a FAP node, describing the facial animation parameters (FAPs). Each Face node shall contain a non-null FAP field. The FDP field, which defines the particular look of a face by means of downloading the position of face definition points or an entire model, is optional. If the FDP field is not specified, the default face model of the terminal shall be used. The FIT field, when specified, allows a set of FAPs to be defined in terms of another set of FAPs. When this field is non-null, the terminal shall use FIT to compute the maximal set of FAPs before using the FAPs to compute the mesh. The ttsSource field shall only be non-null if the facial animation is to determine the facial animation parameters from an audio TTS source (see ISO/IEC, section 6). In this case the ttsSource field shall contain an AudioSource node and the face shall be animated using the phonemes and bookmarks received from the TTS. renderedFace is the scene graph of the face after it is rendered (all FAPs applied).

FaceDefMesh

FaceDefMesh {
  field SFNode faceSceneGraphNode NULL
  field MFInt32 intervalBorders []
  field MFInt32 coordIndex []
  field MFVec3f displacements []
}

NOTE: For the binary encoding of this node see Document MPEG-4 NODES A.1.37.

Functionality and semantics

The FaceDefMesh node allows for the deformation of an IndexedFaceSet as a function of the amplitude of a FAP, as specified in the related FaceDefTable node. The FaceDefMesh node defines the piece-wise linear motion trajectories for vertices of the faceSceneGraphNode field, which shall contain an IndexedFaceSet node. This IndexedFaceSet node belongs to the scene graph of the faceSceneGraph field of the FDP node. The intervalBorders field specifies interval borders for the piece-wise linear approximation, in increasing order. Exactly one interval border shall have the value 0. The coordIndex field shall contain a list of indices into the Coordinate node of the IndexedFaceSet node specified by the faceSceneGraphNode field. For each vertex indexed in the coordIndex field, displacement vectors are given in the displacements field for the intervals defined in the intervalBorders field. There must be exactly (num(intervalBorders)-1)*num(coordIndex) values in this field.

In most cases, the animation generated by a FAP cannot be specified by updating a Transform node. Thus, a deformation of an IndexedFaceSet node needs to be performed. In this case, the FaceDefTable shall define which IndexedFaceSets are affected by a given FAP and how the coord fields of these nodes are updated. This is done by means of tables. If a FAP affects an IndexedFaceSet, the FaceDefMesh shall specify a table of the following format for this IndexedFaceSet:

Table 0-1. Vertex displacements
Vertex no. | 1st interval [I1, I2] | 2nd interval [I2, I3] | ...
Index 1    | Displacement D11      | Displacement D12      | ...
Index 2    | Displacement D21      | Displacement D22      | ...

Exactly one interval border I_k must have the value 0:
[I_1, I_2], [I_2, I_3], ..., [I_{k-1}, 0], [0, I_{k+1}], [I_{k+1}, I_{k+2}], ..., [I_{max-1}, I_max]

During animation, when the terminal receives a FAP which affects one or more IndexedFaceSets of the face model, it shall piece-wise linearly approximate the motion trajectory of each vertex of the affected IndexedFaceSets by using the appropriate table.

Figure 0-9. An arbitrary motion trajectory is approximated as a piece-wise linear one.

If P_m is the position of the m-th vertex of the IndexedFaceSet in the neutral state (FAP = 0), P'_m the position of the same vertex after animation with the given FAP, and D_{m,k} the 3D displacement in the k-th interval, the following algorithm shall be applied to determine the new position P'_m. Determine in which of the intervals listed in the table the received FAP is lying.

If the received FAP is lying in the j-th interval [Ij, Ij+1] and 0 = Ik ≤ Ij, the new vertex position P'm of the m-th vertex of the IndexedFaceSet is given by:

P'm = FAPU * ( (Ik+1 - 0) * Dm,k + (Ik+2 - Ik+1) * Dm,k+1 + ... + (Ij - Ij-1) * Dm,j-1 + (FAP - Ij) * Dm,j ) + Pm    (Eq. 1)

If FAP > Imax, then P'm is calculated by using equation Eq. 1 and setting the index j = max.
If the received FAP is lying in the j-th interval [Ij, Ij+1] and Ij+1 ≤ Ik = 0, the new vertex position P'm is given by:

P'm = FAPU * ( (Ij+1 - FAP) * Dm,j + (Ij+2 - Ij+1) * Dm,j+1 + ... + (Ik-1 - Ik-2) * Dm,k-2 + (0 - Ik-1) * Dm,k-1 ) + Pm    (Eq. 2)

If FAP < I1, then P'm is calculated by using equation Eq. 2 and setting the index j+1 = 1.
If for a given FAP and IndexedFaceSet the table contains only one interval, the motion is strictly linear:

P'm = FAPU * FAP * Dm,1 + Pm

EXAMPLE
FaceDefMesh {
  objectDescriptorID UpperLip
  intervalBorders [ -1000, 0, 500, 1000 ]
  coordIndex [ 50, 51 ]
  displacements [ 1 0 0, ..., ..., ..., ..., ... ]
}
This FaceDefMesh defines the animation of the mesh UpperLip. For the piece-wise linear motion function three intervals are defined: [-1000, 0], [0, 500] and [500, 1000]. Displacements are given for the vertices with the indices 50 and 51. The displacements for vertex 50 are (1 0 0), (...) and (...); the displacements for vertex 51 are (...), (...) and (2 0 0). Given a FAPValue of 600, the resulting displacement for vertex 50 would be:
displacement(vertex 50) = 500*(...)T + 100*(...)T = (...)T
If the FAPValue is outside the given intervals, the boundary intervals are extended to +I or -I, as appropriate.
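To make the piece-wise linear evaluation of Eq. 1 and Eq. 2 concrete, the following minimal sketch accumulates, for one vertex, the portion of every interval that lies between 0 and the received FAP value, which is exactly what both equations compute. It is an illustration rather than the normative algorithm; the names (apply_fap_to_vertex, Vec3, etc.) are mine, and the outermost intervals are treated as extending to infinity, as in the example above.

#include <math.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Sketch of the piece-wise linear FaceDefMesh evaluation (Eq. 1 / Eq. 2) for one vertex.
 * borders[0..numIntervals] : interval borders in increasing order (one of them is 0)
 * disp[k]                  : displacement of this vertex over [borders[k], borders[k+1]] */
static Vec3 apply_fap_to_vertex(Vec3 neutral, const int *borders, const Vec3 *disp,
                                size_t numIntervals, int fapValue, float fapu)
{
    float lo = fapValue < 0 ? (float)fapValue : 0.0f;   /* segment swept between 0 and the FAP value */
    float hi = fapValue < 0 ? 0.0f : (float)fapValue;
    Vec3 p = neutral;
    for (size_t k = 0; k < numIntervals; ++k) {
        /* The outermost intervals are extended to -inf / +inf, as noted in the example above. */
        float a = (k == 0) ? -INFINITY : (float)borders[k];
        float b = (k == numIntervals - 1) ? INFINITY : (float)borders[k + 1];
        float overlap = fminf(hi, b) - fmaxf(lo, a);    /* length of [lo, hi] inside interval k */
        if (overlap > 0.0f) {                           /* interval (partially) crossed by the FAP */
            p.x += fapu * overlap * disp[k].x;          /* P'm = Pm + FAPU * sum(width_k * Dm,k)   */
            p.y += fapu * overlap * disp[k].y;
            p.z += fapu * overlap * disp[k].z;
        }
    }
    return p;
}

With the UpperLip table above and a FAPValue of 600, the loop adds 500 times the displacement of the [0, 500] interval and 100 times the displacement of the [500, 1000] interval, which matches the worked example.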

FaceDefTables {
  field SFInt32 fapID 0
  field SFInt32 highLevelSelect 0
  exposedField MFNode faceDefMesh []
  exposedField MFNode faceDefTransform []
}
NOTE For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics
The FaceDefTables node defines the behavior of a facial animation parameter FAP on a downloaded face model in faceSceneGraph, by specifying the displacement vectors for moved vertices inside IndexedFaceSet objects as a function of the FAP fapID and/or specifying the value of a field of a Transform node as a function of the FAP fapID.
The FaceDefTables node is transmitted directly after the BIFS bitstream of the FDP node. The FaceDefTables lists all FAPs that animate the face model. The FAPs animate the downloaded face model by updating the Transform or IndexedFaceSet nodes of the scene graph in faceSceneGraph. For each listed FAP, the FaceDefTables node describes which nodes are animated by this FAP and how they are animated. All FAPs that occur in the bitstream have to be specified in the FaceDefTables node. The animation generated by a FAP can be specified either by updating a Transform node (using a FaceDefTransform), or as a deformation of an IndexedFaceSet (using a FaceDefMesh). The FAPUs shall be calculated by the terminal using the feature points that shall be specified in the FDP. The FAPUs are needed in order to animate the downloaded face model.

Semantics
The fapID field specifies the FAP for which the animation behavior is defined in the faceDefMesh and faceDefTransform fields. If fapID has value 1 or 2, the highLevelSelect field specifies the type of viseme or expression. In other cases this field has no meaning and shall be ignored. The faceDefMesh field shall contain a FaceDefMesh node. The faceDefTransform field shall contain a FaceDefTransform node.

FaceDefTransform {
  field SFNode faceSceneGraphNode NULL
  field SFInt32 fieldId 1
  field SFRotation rotationDef 0, 0, 1, 0
  field SFVec3f scaleDef 1, 1, 1
  field SFVec3f translationDef 0, 0, 0
}
NOTE For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics
The FaceDefTransform node defines which field (rotation, scale or translation) of a Transform node (faceSceneGraphNode) of faceSceneGraph (defined in an FDP node) is updated by a facial animation parameter, and how the field is updated. If the face is in its neutral position, the faceSceneGraphNode has its translation, scale, and rotation fields set to the neutral values (0,0,0)T, (1,1,1)T and (0,0,1,0), respectively.
The faceSceneGraphNode field specifies the Transform node for which the animation is defined. The node shall be part of faceSceneGraph as defined in the FDP node. The fieldId field specifies which field in the Transform node, specified by the faceSceneGraphNode field, is updated by the FAP during animation. Possible fields are translation, rotation and scale. If fieldId==1, rotation shall be updated using rotationDef and FAPValue. If fieldId==2, scale shall be updated using scaleDef and FAPValue. If fieldId==3, translation shall be updated using translationDef and FAPValue.

The rotationDef field is of type SFRotation. With rotationDef = (rx, ry, rz, θ), the new value of the rotation field of the Transform node faceSceneGraphNode is:
rotation := (rx, ry, rz, θ*FAPValue*AU)   [AU is defined in ISO/IEC FCD ]
The scaleDef field is of type SFVec3f. The new value of the scale field of the Transform node faceSceneGraphNode is:
scale := FAPValue*scaleDef
The translationDef field is of type SFVec3f. The new value of the translation field of the Transform node faceSceneGraphNode is:
translation := FAPValue*translationDef
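A minimal sketch of how a terminal might apply these three update rules is given below; the TransformNode structure and the function and constant names are illustrative, not part of the specification, and au stands for the angular FAP unit mentioned in the FDP semantics.

typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, angle; } Rotation;
typedef struct { Vec3 translation; Rotation rotation; Vec3 scale; } TransformNode;

enum { FIELD_ROTATION = 1, FIELD_SCALE = 2, FIELD_TRANSLATION = 3 };

/* Sketch of a FaceDefTransform update for one received FAP value. */
static void apply_face_def_transform(TransformNode *node, int fieldId, int fapValue,
                                     Rotation rotationDef, Vec3 scaleDef, Vec3 translationDef,
                                     float au)
{
    switch (fieldId) {
    case FIELD_ROTATION:      /* rotation := (rx, ry, rz, theta * FAPValue * AU) */
        node->rotation = rotationDef;
        node->rotation.angle = rotationDef.angle * (float)fapValue * au;
        break;
    case FIELD_SCALE:         /* scale := FAPValue * scaleDef */
        node->scale.x = fapValue * scaleDef.x;
        node->scale.y = fapValue * scaleDef.y;
        node->scale.z = fapValue * scaleDef.z;
        break;
    case FIELD_TRANSLATION:   /* translation := FAPValue * translationDef */
        node->translation.x = fapValue * translationDef.x;
        node->translation.y = fapValue * translationDef.y;
        node->translation.z = fapValue * translationDef.z;
        break;
    }
}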

FAP {
  exposedField SFNode viseme NULL
  exposedField SFNode expression NULL
  exposedField SFInt32 open_jaw I
  exposedField SFInt32 lower_t_midlip I
  exposedField SFInt32 raise_b_midlip I
  exposedField SFInt32 stretch_l_cornerlip I
  exposedField SFInt32 stretch_r_cornerlip I
  exposedField SFInt32 lower_t_lip_lm I
  exposedField SFInt32 lower_t_lip_rm I
  exposedField SFInt32 lower_b_lip_lm I
  exposedField SFInt32 lower_b_lip_rm I
  exposedField SFInt32 raise_l_cornerlip I
  exposedField SFInt32 raise_r_cornerlip I
  exposedField SFInt32 thrust_jaw I
  exposedField SFInt32 shift_jaw I
  exposedField SFInt32 push_b_lip I
  exposedField SFInt32 push_t_lip I
  exposedField SFInt32 depress_chin I
  exposedField SFInt32 close_t_l_eyelid I
  exposedField SFInt32 close_t_r_eyelid I
  exposedField SFInt32 close_b_l_eyelid I
  exposedField SFInt32 close_b_r_eyelid I
  exposedField SFInt32 yaw_l_eyeball I
  exposedField SFInt32 yaw_r_eyeball I
  exposedField SFInt32 pitch_l_eyeball I
  exposedField SFInt32 pitch_r_eyeball I
  exposedField SFInt32 thrust_l_eyeball I
  exposedField SFInt32 thrust_r_eyeball I
  exposedField SFInt32 dilate_l_pupil I
  exposedField SFInt32 dilate_r_pupil I
  exposedField SFInt32 raise_l_i_eyebrow I
  exposedField SFInt32 raise_r_i_eyebrow I
  exposedField SFInt32 raise_l_m_eyebrow I
  exposedField SFInt32 raise_r_m_eyebrow I
  exposedField SFInt32 raise_l_o_eyebrow I
  exposedField SFInt32 raise_r_o_eyebrow I
  exposedField SFInt32 squeeze_l_eyebrow I
  exposedField SFInt32 squeeze_r_eyebrow I
  exposedField SFInt32 puff_l_cheek I
  exposedField SFInt32 puff_r_cheek I
  exposedField SFInt32 lift_l_cheek I
  exposedField SFInt32 lift_r_cheek I
  exposedField SFInt32 shift_tongue_tip I
  exposedField SFInt32 raise_tongue_tip I
  exposedField SFInt32 thrust_tongue_tip I
  exposedField SFInt32 raise_tongue I
  exposedField SFInt32 tongue_roll I
  exposedField SFInt32 head_pitch I
  exposedField SFInt32 head_yaw I
  exposedField SFInt32 head_roll I
  exposedField SFInt32 lower_t_midlip_o I
  exposedField SFInt32 raise_b_midlip_o I
  exposedField SFInt32 stretch_l_cornerlip_o I
  exposedField SFInt32 stretch_r_cornerlip_o I
  exposedField SFInt32 lower_t_lip_lm_o I
  exposedField SFInt32 lower_t_lip_rm_o I
  exposedField SFInt32 raise_b_lip_lm_o I
  exposedField SFInt32 raise_b_lip_rm_o I
  exposedField SFInt32 raise_l_cornerlip_o I
  exposedField SFInt32 raise_r_cornerlip_o I
  exposedField SFInt32 stretch_l_nose I
  exposedField SFInt32 stretch_r_nose I
  exposedField SFInt32 raise_nose I
  exposedField SFInt32 bend_nose I
  exposedField SFInt32 raise_l_ear I
  exposedField SFInt32 raise_r_ear I
  exposedField SFInt32 pull_l_ear I
  exposedField SFInt32 pull_r_ear I
}
NOTE For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics
This node defines the current look of the face by means of expressions and FAPs and gives a hint to TTS controlled systems on which viseme to use. For a definition of the facial animation parameters see ISO/IEC , Annex C. The viseme field shall contain a Viseme node. The expression field shall contain an Expression node. The semantics for the remaining fields are described in ISO/IEC , Annex C and in particular in Table C-1. A FAP of value I shall be interpreted as indicating that the particular FAP is uninitialized.
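Since every FAP field defaults to the uninitialized value I, a decoder typically touches only the parameters that are actually set in the current frame. A small illustrative sketch; the sentinel constant and the apply_fap callback are assumptions, not part of the node semantics.

#include <limits.h>

#define FAP_UNINITIALIZED INT_MAX   /* stands in for the value "I" above; the real sentinel is terminal-specific */

/* Sketch: apply only the low-level FAPs (3..68) that are set for this frame.
 * apply_fap() stands for whatever FaceDefMesh / FaceDefTransform machinery the terminal uses. */
static void apply_fap_frame(const int fapValues[69], void (*apply_fap)(int fapNumber, int value))
{
    for (int fapNumber = 3; fapNumber <= 68; ++fapNumber) {
        int value = fapValues[fapNumber];
        if (value == FAP_UNINITIALIZED)
            continue;   /* a FAP of value I: leave the previous / neutral state untouched */
        apply_fap(fapNumber, value);
    }
}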

FDP {
  exposedField SFNode featurePointsCoord NULL
  exposedField SFNode textureCoords NULL
  exposedField SFBool useOrthoTexture FALSE
  exposedField MFNode faceDefTables []
  exposedField MFNode faceSceneGraph []
}
NOTE For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics
The FDP node defines the face model to be used at the terminal. Two options are supported:
1. If faceDefTables is NULL, calibration information is downloaded, so that the proprietary face of the terminal can be calibrated using facial feature points and, optionally, the texture information. In this case, the featurePointsCoord field shall be set. featurePointsCoord contains the coordinates of facial feature points, as defined in ISO/IEC , Annex C, Figure C-1, corresponding to a neutral face. If a coordinate of a feature point is set to I, the coordinates of this feature point shall be ignored. The textureCoords field, if set, is used to map a texture on the model calibrated by the feature points. The textureCoords points correspond to the feature points. That is, each defined feature point shall have corresponding texture coordinates. In this case, the faceSceneGraph shall contain exactly one texture image, and any geometry it might contain shall be ignored. The terminal shall interpret the feature points, texture coordinates, and the faceSceneGraph in the following way: feature points of the terminal face model shall be moved to the coordinates of the feature points supplied in featurePointsCoord, unless a feature point is to be ignored, as explained above. If textureCoords is set, the texture supplied in the faceSceneGraph shall be mapped onto the terminal's default face model. The texture coordinates are derived from the texture coordinates of the feature points supplied in textureCoords. The useOrthoTexture field provides a hint to the decoding terminal that, when FALSE, indicates that the texture image is best obtained by cylindrical projection of the face. If useOrthoTexture is TRUE, the texture image is best obtained by orthographic projection of the face.
2. A face model as described in the faceSceneGraph is decoded. This face model replaces the terminal's default face model in the terminal. The faceSceneGraph shall contain the face in its neutral position (all FAPs = 0). If desired, the faceSceneGraph shall contain the texture map of the face. The functions defining the way in which the faceSceneGraph shall be modified, as a function of the FAPs, shall also be decoded. This information is described by faceDefTables that define how the faceSceneGraph is to be modified as a function of each FAP. By means of faceDefTables, IndexedFaceSet and Transform nodes of the faceSceneGraph can be animated. Since the amplitude of FAPs is defined in units that are dependent on the size of the face model, the featurePointsCoord field defines the positions of facial features on the surface of the face described by faceSceneGraph. From the location of these feature points, the terminal computes the units of the FAPs.
Generally, only two node types in the scene graph of a decoded face model are affected by FAPs: IndexedFaceSet and Transform nodes. If a FAP causes a deformation of an object (e.g. lip stretching), then the coordinate positions in the affected IndexedFaceSet shall be updated. If a FAP causes a movement which can be described with a Transform node (e.g. FAP 23, yaw_l_eyeball), then the appropriate field in this Transform node shall be updated. It shall be assumed that this Transform node has its rotation, scale, and translation fields set to neutral values if the face is in its neutral position. A unique nodeID shall be assigned via the DEF statement to all IndexedFaceSet and Transform nodes which are affected by FAPs so that they can be accessed unambiguously during animation.
The featurePointsCoord field shall contain a Coordinate node that specifies feature points for the calibration of the terminal's default face.
The coordinates are specified in the point field of the Coordinate node in the prescribed order, such that a feature point with a lower label number is listed before a feature point with a higher label number.
EXAMPLE Feature point 3.14 is listed before feature point 4.1.
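As stated in the FDP semantics above, the terminal derives the FAP units (FAPU) from these neutral-face feature points. A minimal sketch, assuming the usual FAPU convention that each base distance measured on the neutral face is divided by 1024 and that the angular unit AU is 10^-5 rad; which feature-point pairs define each distance is specified normatively in Annex C and is simply passed in here.

typedef struct { float IRISD, ES, ENS, MNS, MW, AU; } Fapu;

/* Sketch: compute the FAP units from distances measured between the neutral-face
 * feature points supplied in featurePointsCoord (the exact point pairs are normative). */
static Fapu compute_fapu(float irisDiameter, float eyeSeparation, float eyeNoseSeparation,
                         float mouthNoseSeparation, float mouthWidth)
{
    Fapu u;
    u.IRISD = irisDiameter        / 1024.0f;   /* iris diameter unit         */
    u.ES    = eyeSeparation       / 1024.0f;   /* eye separation unit        */
    u.ENS   = eyeNoseSeparation   / 1024.0f;   /* eye-nose separation unit   */
    u.MNS   = mouthNoseSeparation / 1024.0f;   /* mouth-nose separation unit */
    u.MW    = mouthWidth          / 1024.0f;   /* mouth width unit           */
    u.AU    = 1.0e-5f;                         /* angular unit, in radians   */
    return u;
}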

The textureCoords field shall contain a Coordinate node that specifies texture coordinates for the feature points. The coordinates are listed in the point field of the Coordinate node in the prescribed order, such that a feature point with a lower label is listed before a feature point with a higher label.
The useOrthoTexture field may contain a hint to the terminal as to the type of texture image, in order to allow better interpolation of texture coordinates for the vertices that are not feature points. If useOrthoTexture is FALSE, the terminal may assume that the texture image was obtained by cylindrical projection of the face. If useOrthoTexture is TRUE, the terminal may assume that the texture image was obtained by orthographic projection of the face.
The faceDefTables field shall contain FaceDefTables nodes. The behavior of FAPs is defined in this field for the face in faceSceneGraph.
The faceSceneGraph field shall contain a Group node. In the case of option 1 (above), this may be used to contain a texture image as described above. In the case of option 2, this shall be the grouping node for the face model rendered in the compositor and shall contain the face model. In this case, the effect of facial animation parameters is defined in the faceDefTables field.

FIT {
  exposedField MFInt32 FAPs []
  exposedField MFInt32 graph []
  exposedField MFInt32 numeratorTerms []
  exposedField MFInt32 denominatorTerms []
  exposedField MFInt32 numeratorExp []
  exposedField MFInt32 denominatorExp []
  exposedField MFInt32 numeratorImpulse []
  exposedField MFFloat numeratorCoefs []
  exposedField MFFloat denominatorCoefs []
}
NOTE For the binary encoding of this node see Document MPEG-4 NODES A.

Functionality and semantics
The FIT node allows a smaller set of FAPs to be sent during a facial animation. This small set can then be used to determine the values of other FAPs, using a rational polynomial mapping between parameters. In a FIT node, rational polynomials are used to specify interpolation functions.
EXAMPLE The top inner lip FAPs can be sent and then used to determine the top outer lip FAPs. Another example is that only viseme and/or expression FAPs are sent to drive the face. In this case, low-level FAPs are interpolated from these two high-level FAPs.
To make the scheme general, sets of FAPs are specified, along with a FAP interpolation graph (FIG) between the sets that specifies which sets are used to determine which other sets. The FIG is a graph with directed links. Each node contains a set of FAPs.

Each link from a parent node to a child node indicates that the FAPs in the child node can be interpolated from the parent node. Viseme (FAP#1) or Expression (FAP#2) and their fields shall not be interpolated from other FAPs. In a FIG, a FAP may appear in several nodes, and a node may have multiple parents. For a node that has multiple parent nodes, the parent nodes are ordered as 1st parent node, 2nd parent node, etc. During the interpolation process, if this child node needs to be interpolated, it is first interpolated from the 1st parent node if all FAPs in that parent node are available. Otherwise, it is interpolated from the 2nd parent node, and so on.
An example of a FIG is shown in the figure below; its nodes are expression (1), lower_t_midlip (2), raise_b_midlip (3), top_inner_lip FAPs (4), top_outer_lip FAPs (5), bottom_inner_lip FAPs (6) and bottom_outer_lip FAPs (7). Each node has a nodeID. The numerical labels on each incoming link indicate the order of these links.
Figure. A FIG example.
The interpolation process based on the FAP interpolation graph is described using pseudo-C code as follows:
do {
  interpolation_count = 0;
  for (all Node_i) {   // from Node_1 to Node_N
    for (ordered Node_i parents Node_k) {
      if (FAPs in Node_i need interpolation and
          FAPs in Node_k have been interpolated or are available) {

        interpolate Node_i from Node_k;   // using the interpolation functions of this link
        interpolation_count++;
        break;
      }
    }
  }
} while (interpolation_count != 0);   // stable here
Each directed link in a FIG is a set of interpolation functions. Suppose F1, F2, ..., Fn are the FAPs in a parent set and f1, f2, ..., fm are the FAPs in a child set. Then, there are m interpolation functions denoted as:
f1 = I1(F1, F2, ..., Fn)
f2 = I2(F1, F2, ..., Fn)
...
fm = Im(F1, F2, ..., Fn)
Each interpolation function Ik() is in a rational polynomial form if the parent node does not contain viseme FAPs or expression FAPs:

I(F1, F2, ..., Fn) = ( Σ_{i=0..K-1} ci · Π_{j=1..n} Fj^lij ) / ( Σ_{i=0..P-1} bi · Π_{j=1..n} Fj^mij )

Otherwise, an impulse function is added to each numerator polynomial term to allow selection of the expression or viseme:

I(F1, F2, ..., Fn) = ( Σ_{i=0..K-1} δ(Fi - ai) · ci · Π_{j=1..n} Fj^lij ) / ( Σ_{i=0..P-1} bi · Π_{j=1..n} Fj^mij )

In both equations, K and P are the numbers of polynomial products, ci and bi are the coefficients of the i-th product, and lij and mij are the powers of Fj in the i-th product. An impulse function δ(Fi - ai) equals 1 when Fi = ai and equals 0 otherwise. Fi can only be viseme_select1, viseme_select2, expression_select1 or expression_select2. ai is an integer that ranges from 0 to 6 when Fi is expression_select1 or expression_select2, and from 0 to 14 when Fi is viseme_select1 or viseme_select2. The encoder shall send an interpolation function table, which contains K, P, ai, ci, bi, lij and mij, to the terminal.
To aid in the explanation below, it is assumed that there are N different sets of FAPs with indices 1 to N, and that each set has ni, i = 1, ..., N parameters. It is also assumed that there are L directed links in the FIG and that each link points from the FAP set with index Pi to the FAP set with index Ci, for i = 1, ..., L.
The FAPs field shall contain a list of FAP indices specifying which animation parameters form sets of FAPs. Each set of FAP indices is terminated by -1. There shall be a total of N + n1 + n2 + ... + nN numbers in this field, with N of them being -1. FAP#1 to FAP#68 are of indices 1 to 68. Fields of the Viseme FAP (FAP#1), namely viseme_select1, viseme_select2 and viseme_blend, are of indices 69 to 71. Fields of the Expression FAP (FAP#2), namely expression_select1, expression_select2, expression_intensity1 and expression_intensity2, are of indices 72 to 75.

When the parent node contains a Viseme FAP, three indices, 69, 70 and 71, shall be included in the node (but not index 1). When a parent node contains an Expression FAP, four indices, 72, 73, 74 and 75, shall be included in the node (but not index 2).
The graph field shall contain a list of pairs of integers, specifying a directed link between sets of FAPs. The integers refer to the indices of the sets specified in the FAPs field, and thus range from 1 to N. When more than one directed link terminates at the same set, that is, when the second value in the pair is repeated, the links have precedence determined by their order in this field. This field shall have a total of 2L numbers, corresponding to the directed links between the parents and children in the FIG.
The numeratorTerms field shall be a list containing the number of terms in the polynomial of the numerator of the rational function used to interpolate parameter values. Each element in the list corresponds to K in the equations above. Each link i (that is, the i-th integer pair) in the graph field must have nCi values specified, one for each child FAP. The order in the numeratorTerms list shall correspond to the order of the links in the graph field and the order in which the child FAPs appear in the FAPs field. There shall be nC1 + nC2 + ... + nCL numbers in this field.
The denominatorTerms field shall contain a list of the numbers of terms in the polynomial of the denominator of the rational function controlling the parameter values. Each element in the list corresponds to P in the equations above. Each link i (that is, the i-th integer pair) in the graph field must have nCi values specified, one for each child FAP. The order in the denominatorTerms list corresponds to the order of the links in the graph field and the order in which the child FAPs appear in the FAPs field. There shall be nC1 + nC2 + ... + nCL numbers in this field.
The numeratorImpulse field shall contain a list of impulse functions in the numerator of the rational functions, for links with the Viseme or Expression FAP in the parent node. This list corresponds to the δ(Fi - ai). Each entry in the list is a pair (Fi, ai).
The numeratorExp field shall contain a list of exponents of the polynomial terms in the numerator of the rational function controlling the parameter values. This list corresponds to lij. For each child FAP in each link i, nPi*K values need to be specified. The order in the numeratorExp list shall correspond to the order of the links in the graph field and the order in which the child FAPs appear in the FAPs field. NOTE K may be different for each child FAP.
The denominatorExp field shall contain a list of exponents of the polynomial terms of the denominator of the rational function controlling the parameter values. This list corresponds to mij. For each child FAP in each link i, nPi*P values need to be specified. The order in the denominatorExp list shall correspond to the order of the links in the graph field and the order in which the child FAPs appear in the FAPs field. NOTE P may be different for each child FAP.
The numeratorCoefs field shall contain a list of coefficients of the polynomial terms of the numerator of the rational function controlling the parameter values. This list corresponds to ci. The list shall have K terms for each child parameter that appears in a link in the FIG, with the order in numeratorCoefs corresponding to the order in graph and FAPs.

NOTE K is dependent on the polynomial, and is not a fixed constant.
The denominatorCoefs field shall contain a list of coefficients of the polynomial terms in the denominator of the rational function controlling the parameter values. This list corresponds to bi. The list shall have P terms for each child parameter that appears in a link in the FIG, with the order in denominatorCoefs corresponding to the order in graph and FAPs. NOTE P is dependent on the polynomial, and is not a fixed constant.
EXAMPLE Suppose a FIG contains four nodes and 2 links. Node 1 contains FAP#3, FAP#4, FAP#5. Node 2 contains FAP#6, FAP#7. Node 3 contains an expression FAP, which means it contains FAP#72, FAP#73, FAP#74 and FAP#75. Node 4 contains FAP#12 and FAP#17. The two links are from node 1 to node 2, and from node 3 to node 4. For the first link, the interpolation functions are:
F6 = (F3 + 2F4 + 3F5 + 4F3F4) / (5F5 + 6F3F4F5)
F7 = F4
For the second link, the interpolation functions are:
F12 = δ(F72 - 6)(0.6F74) + δ(F73 - 6)(0.6F75)
F17 = δ(F72 - 6)(1.5F74) + δ(F73 - 6)(1.5F75)
The second link simply says that when the expression is surprise (FAP#72 = 6 or FAP#73 = 6), for FAP#12 the value is 0.6 times the expression intensity FAP#74 or FAP#75; for FAP#17, the value is 1.5 times FAP#74 or FAP#75. After the FIT node given below, we explain each field separately.
FIT {
  FAPs [ ]
  graph [ ]
  numeratorTerms [ ]
  denominatorTerms [ ]
  numeratorExp [ ]
  denominatorExp [ ]
  numeratorImpulse [ ]
  numeratorCoefs [ ]
  denominatorCoefs [ ]
}
FAPs [ ]
Four sets of FAPs are defined, the first with FAP numbers 3, 4 and 5, the second with FAP numbers 6 and 7, the third with FAP numbers 72, 73, 74 and 75, and the fourth with FAP numbers 12 and 17.
graph [ ]

The first set is made to be the parent of the second set, so that FAP numbers 6 and 7 will be determined by FAPs 3, 4 and 5. Also, the third set is made to be the parent of the fourth set, so that FAP numbers 12 and 17 will be determined by FAPs 72, 73, 74 and 75.
numeratorTerms [ ]
The rational functions that define F6 and F7 are selected to have 4 and 1 terms in their numerators, respectively. Also, the rational functions that define F12 and F17 are selected to have 2 and 2 terms in their numerators, respectively.
denominatorTerms [ ]
The rational functions that define F6 and F7 are selected to have 2 and 1 terms in their denominators, respectively. Also, the rational functions that define F12 and F17 are selected to both have 1 term in their denominators.
numeratorExp [ ]
The numerator selected for the rational function defining F6 is F3 + 2F4 + 3F5 + 4F3F4. There are 3 parent FAPs and 4 terms, leading to 12 exponents for this rational function. For F7, the numerator is just F4, so there are three exponents only (one for each FAP). Values for F12 and F17 are derived in the same way.
denominatorExp [ ]
The denominator selected for the rational function defining F6 is 5F5 + 6F3F4F5, so there are 3 parent FAPs and 2 terms and hence 6 exponents for this rational function. For F7, the denominator is just 1, so there are three exponents only (one for each FAP). Values for F12 and F17 are derived in the same way.
numeratorImpulse [ ]
For the second link, all four numerator polynomial terms contain the impulse functions δ(F72 - 6) or δ(F73 - 6).
numeratorCoefs [ ]
There is one coefficient for each term in the numerator of each rational function.
denominatorCoefs [ ]
There is one coefficient for each term in the denominator of each rational function.
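As an illustration of the rational polynomial form above, the following minimal sketch evaluates one child FAP from the parent FAP values of a single link, given the term counts, coefficients and exponents of its interpolation function. It is not the normative parsing of the flattened FIT arrays; the function and type names (fit_interpolate, Impulse) are illustrative, and the impulse argument is only used for links whose parent set contains the Viseme or Expression FAP.

#include <math.h>
#include <stddef.h>

typedef struct { int parentIndex; int a; } Impulse;   /* one (Fi, ai) pair per numerator term */

/* Sketch: evaluate one FIT interpolation function
 *   I(F1..Fn) = sum_{i=0..K-1} c[i] * prod_j F[j]^l[i][j]
 *             / sum_{i=0..P-1} b[i] * prod_j F[j]^m[i][j]
 * F    : the n parent FAP values
 * l, m : exponents stored row-major (K*n and P*n entries)
 * impulse: pass NULL when the parent set contains no viseme/expression FAP. */
static double fit_interpolate(const double *F, size_t n,
                              const double *c, const int *l, size_t K,
                              const double *b, const int *m, size_t P,
                              const Impulse *impulse)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < K; ++i) {
        double term = c[i];
        for (size_t j = 0; j < n; ++j)
            term *= pow(F[j], (double)l[i * n + j]);
        if (impulse)   /* delta(Fi - ai): the term only counts when the selector matches */
            term *= (F[impulse[i].parentIndex] == (double)impulse[i].a) ? 1.0 : 0.0;
        num += term;
    }
    for (size_t i = 0; i < P; ++i) {
        double term = b[i];
        for (size_t j = 0; j < n; ++j)
            term *= pow(F[j], (double)m[i * n + j]);
        den += term;
    }
    return num / den;   /* the interpolated child FAP value */
}

For the first link of the example (F6), one would call it with n = 3 parents (F3, F4, F5), K = 4 numerator terms with coefficients (1, 2, 3, 4), P = 2 denominator terms with coefficients (5, 6), the corresponding exponent rows, and impulse = NULL.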


320 ooo Appendix VI-K Table K-2 FAP DEFINITIONS, GROUP ASSIGNMENTS AND STEP SIZES # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 1 Vieme Set of value determining the mixture of two vieme for thi frame (e.g. pbm, fv, th) na na na 1 na 1 vieme_ blend: 63 vieme _blend: expreion A et of value determining the mixture of two faial expreion 3 open_jaw Vertial jaw diplaement (doe not affet mouth opening) 4 lower_t_midlip Vertial top middle inner lip diplaement 5 raie_b_midlip Vertial bottom middle inner lip diplaement 6 treth_l_ornerli p 7 treth_r_ornerli p Horizontal diplaement of left inner lip orner Horizontal diplaement of right inner lip orner 8 lower_t_lip_lm Vertial diplaement of midpoint between left orner and middle of top inner lip 9 lower_t_lip_rm Vertial diplaement of midpoint between right orner and middle of top inner lip Na na na 1 na 1 exprei on_inte nity1, exprei on_inte nity2: 63 exprei on_inte nity1, exprei on_inte nity2: -63 MNS U down MNS B down MNS B up MW B left MW B right MNS B down MNS B down

321 ppp # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 10 raie_b_lip_lm Vertial diplaement of midpoint between left orner and middle of bottom inner lip 11 raie_b_lip_rm Vertial diplaement of midpoint between right orner and middle of bottom inner lip 12 raie_l_ornerlip Vertial diplaement of left inner lip orner 13 raie_r_ornerlip Vertial diplaement of right inner lip orner MNS B up MNS B up MNS B up MNS B up thrut_jaw Depth diplaement of jaw MNS U forw ard hift_jaw Side to ide diplaement of jaw MW B right puh_b_lip Depth diplaement of bottom middle lip 17 puh_t_lip Depth diplaement of top middle lip MNS B forw ard MNS B forw ard depre_hin Upward and ompreing movement of the hin (like in adne) MNS B up loe_t_l_eyelid Vertial diplaement of top left eyelid 20 loe_t_r_eyelid Vertial diplaement of top right eyelid IRIS D IRIS D B down B down

322 qqq # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 21 loe_b_l_eyelid Vertial diplaement of bottom left eyelid 22 loe_b_r_eyelid Vertial diplaement of bottom right eyelid IRIS D IRIS D B up B up yaw_l_eyeball Horizontal orientation of left eyeball 24 yaw_r_eyeball Horizontal orientation of right eyeball 25 pith_l_eyeball Vertial orientation of left eyeball 26 pith_r_eyeball Vertial orientation of right eyeball AU B left 3 na AU B left 3 na AU B down 3 na AU B down 3 na thrut_l_eyeball Depth diplaement of left eyeball 28 thrut_r_eyeball Depth diplaement of right eyeball ES B forwa rd ES B forwa rd 3 na na dilate_l_pupil Dilation of left pupil 30 dilate_r_pupil Dilation of right pupil 31 raie_l_i_eyebrow Vertial diplaement of left inner eyebrow 32 raie_r_i_eyebrow Vertial diplaement of right inner eyebrow 33 raie_l_m_eyebro w 34 raie_r_m_eyebro w Vertial diplaement of left middle eyebrow Vertial diplaement of right middle eyebrow IRIS D IRIS D B B growi ng growi ng ENS B up ENS B up ENS B up ENS B up

323 rrr # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 35 raie_l_o_eyebrow Vertial diplaement of left outer eyebrow 36 raie_r_o_eyebro w 37 queeze_l_eyebro w 38 queeze_r_eyebro w Vertial diplaement of right outer eyebrow Horizontal diplaement of left eyebrow Horizontal diplaement of right eyebrow 39 puff_l_heek Horizontal diplaement of left heek 40 puff_r_heek Horizontal diplaement of right heek 41 lift_l_heek Vertial diplaement of left heek 42 lift_r_heek Vertial diplaement of right heek 43 hift_tongue_tip Horizontal diplaement of tongue tip 44 raie_tongue_tip Vertial diplaement of tongue tip 45 thrut_tongue_tip Depth diplaement of tongue tip 46 raie_tongue Vertial diplaement of tongue 47 tongue_roll Rolling of the tongue into U hape 48 head_pith Head pith angle from top of pine ENS B up ENS B up ES B right ES B left ES B left ES B right ENS U up ENS U up MW B right MNS B up MW B forw ard MNS B up AU U ona ve upwar d 6 3, AU B down 7 na

324 # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 49 head_yaw Head yaw angle from top of pine 50 head_roll Head roll angle from top of pine 51 lower_t_midlip _o Vertial top middle outer lip diplaement 52 raie_b_midlip_o Vertial bottom middle outer lip diplaement 53 treth_l_ornerli p_o 54 treth_r_ornerli p_o Horizontal diplaement of left outer lip orner Horizontal diplaement of right outer lip orner 55 lower_t_lip_lm _o Vertial diplaement of midpoint between left orner and middle of top outer lip 56 lower_t_lip_rm _o Vertial diplaement of midpoint between right orner and middle of top outer lip 57 raie_b_lip_lm_o Vertial diplaement of midpoint between left orner and middle of bottom outer lip 58 raie_b_lip_rm_o Vertial diplaement of midpoint between right orner and middle of bottom outer lip AU B left 7 na AU B right 7 na MNS B down MNS B up MW B left MW B right MNS B down MNS B down MNS B up MNS B up

325 ttt # FAP name FAP deription unit Uni-or Bidir Po motion Grp FDP ubgrp num Quant tep ize Min/Max I- Frame quantized value Min/Max P- Frame quantized value 59 raie_l_ornerlip_ o 60 raie_r_ornerlip _o Vertial diplaement of left outer lip orner Vertial diplaement of right outer lip orner 61 treth_l_noe Horizontal diplaement of left ide of noe 62 treth_r_noe Horizontal diplaement of right ide of noe 63 raie_noe Vertial diplaement of noe tip 64 bend_noe Horizontal diplaement of noe tip 65 raie_l_ear Vertial diplaement of left ear 66 raie_r_ear Vertial diplaement of right ear 67 pull_l_ear Horizontal diplaement of left ear 68 pull_r_ear Horizontal diplaement of right ear MNS B up MNS B up ENS B left ENS B right ENS B up ENS B right ENS B up ENS B up ENS B left ENS B right

Table K-3 FAP GROUPING
Group                                             Number of FAPs
1: visemes and expressions                         2
2: jaw, chin, inner lowerlip, cornerlips, midlip  16
3: eyeballs, pupils, eyelids                      12
4: eyebrow                                         8
5: cheeks                                          4
6: tongue                                          5
7: head rotation                                   3
8: outer lip positions                            10
9: nose                                            4
10: ears                                           4

Table K-4 VALUES FOR EXPRESSION_SELECT
expression_select  Expression name  Textual description
0   na        na
1   joy       The eyebrows are relaxed. The mouth is open and the mouth corners pulled back toward the ears.
2   sadness   The inner eyebrows are bent upward. The eyes are slightly closed. The mouth is relaxed.
3   anger     The inner eyebrows are pulled downward and together. The eyes are wide open. The lips are pressed against each other or opened to expose the teeth.
4   fear      The eyebrows are raised and pulled together. The inner eyebrows are bent upward. The eyes are tense and alert.
5   disgust   The eyebrows and eyelids are relaxed. The upper lip is raised and curled, often asymmetrically.
6   surprise  The eyebrows are raised. The upper eyelids are wide open, the lower relaxed. The jaw is opened.

Table K-5 VALUES FOR VISEME_SELECT
viseme_select  Phonemes    Example
0              none        na
1              p, b, m     put, bed, mill
2              f, v        far, voice
3              T, D        think, that
4              t, d        tip, doll
5              k, g        call, gas
6              tS, dZ, S   chair, join, she
7              s, z        sir, zeal
8              n, l        lot, not
9              r           red
10             A:          car
11             e           bed
12             I           tip
13             Q           top
14             U           book



Bibliographical References


332 III A--G Ahlberg, J. (2002). An ative model for faial feature traking. Euraip Journal on Applied Signal Proeing, Vol. 6, pp Al-Qayedi, A. M.,& Clark, A. (2000). Contant-rate eye traking and animation for modelbaed-oded video. Proeeding of the IEEE International Conferene on Aouti Speeh and Signal Proeing. André del Valle, A. C& Dugelay, J.-L.(2002) Online fae analyi: oupling head poe-traking with fae expreion analyi [Tehnial demo] ACM Multimedia. André del Valle, A. C& Dugelay, J.-L.(2003) Making mahine undertand faial motion and expreion like human do. 10 th International Conferene in Human-Computer Interation,, HCI Theory and Pratie (Part II), Vol. 2, pp Avaro, Bao, Caner, Civanlar, Gentri, Herpel, et al. (2001) RTP payload format for MPEG-4 tream [work in progre]. IETF: draft-gentri-avt mpeg4-multisl-03.txt Bartlett, M. S. (2001). Fae image analyi by unupervied learning. Boton: Kluwer Aademi Publiher. Bartlett, M. S., Braathen, B. LittleWort-Ford, G., Herhey, J., Fael, I., Mark, T., Smith, E., Sejnowki, T. J., & Movellan, J. R. (2001). Automati analyi of pontaneou faial behavior: A final projet report. (Teh. Rep. No ). San Diego, CA: Univerity of California, San Diego, MPLab. Bale B., & Blake A. (1998). Separability of poe and expreion in faial traking and animation. Proeeding of the 6th International Conferene on Computer Viion, Benede, O. (1999). Watermarking of 3-D polygon baed model with robutne againt meh implifiation. Proeeding of SPIE: Seurity and Watermarking of Multimedia Content, pp Blak, M. J., & Yaoob, Y. (1997). Reognizing faial expreion in image equene uing loal parameterized model of image motion. International Journal of Computer Viion, 25(1), Breton, G. (2002). Animation de viage 3D parlant pour nouveaux IHM et ervie de téléommuniation [Animation of 3D talking head for new HCI and teleom ervie]. (Dotoral diertation. Univerité de Renne 1, 2002). Brulé, J. F. (1985). Fuzzy ytem A tutorial. Retrieved Otober, 1, 2002, from autinlink.om/fuzzy/tutorial

333 IV Chen, K., & Kambhamettu, C. (1997). Real-time faial animation through the Internet. (Teh. Rep. No ). DE: Univerity of Delaware, Department of Computer and Information Siene. Chen, L. S., & Huang, T. S. (2000). Emotional expreion in audioviual human omputer interation. Proeeding of IEEE the International Conferene on Multimedia and Expo. Chen, T. (January, 2001). Audioviual peeh proeing. Lip reading and lip ynhronization. IEEE Signal Proeing Magazine,pp Chou, J.-C., Chang, Y.-J., & Chen, Y.-C. (2001). Faial feature point traking ad expreion analyi for virtual teleonferening ytem. Proeeding of the International Conferene on Multimedia and Expo, pp Cordea, M. D., Petriu, E. M., Georgana, N. D., Petriu, D. C., & Whalen, T. E. (2001). 3D head poe reovery for interative virtual reality avatar. Proeeding of the IEEE Intrumentation and Meaurement Tehnology Conferene Curinga, S. (1998). Ue of a tatitial model for lip ynthei. IEEE. Cyberware Home Page (2003). Retrieved July, 3 rd 2003 from: Debeve, P., Hawkin, T., Thou, C., Duiker, H.-P., Sarokin, W., & Sagar, M. (2000). Aquiring the refletane field of a human fae. Proeeding of SIGGRAPH 2000, ACM Pre/ACM SIGGRAPH/Addion Weley Longman. DeCarlo, D., & Metaxa, D. (1996). The integration of optial flow and deformable model with appliation to human fae hape and motion etimation. Proeeding of the IEEE Conferene on Computer Viion and Pattern Reognition, DiretX Home Page (2003). Mirooft webite about DiretX. Retrieved June, 16 th 2003 from: Dubu, C., & Budreau, D. (2001, Deember). The deign and imulated performane of a mobile video telephony appliation for atellite third-generation wirele ytem. IEEE Tranation on Multimedia, 3(4). Dugelay, J.-L., & André del Valle, A. C. (2001) Analyi-ynthei ooperation for MPEG-4 realiti lone animation. Proeeding of the ICAV3D.

334 V D--I Dugelay, J.-L., Fintzel, K., & Valente, S. (1999). Syntheti Natural Hybrid video proeing for virtual teleonferening ytem. Piture Coding Sympoium Dugelay, J.-L., Garia, E., & Mallauran, C. (2002). Protetion of 3D objet uage through texture watermarking. Proeeding of the XI European Signal Proeing Conferene Eiert, P., & Girod, B. (1998). Analyzing faial expreion for virtual onferening. Proeeding of the IEEE Computer Graphi & Appliation, Eiert, P., & Girod, B. (2002). Model-baed enhanement of lighting ondition in image equene. Proeeding of the Viual Communiation and Image Proeing Ekman, P., & Frieen, W. V. (1978). The faial ation oding ytem. Palo Alto, Ca.: Invetigator Guide, Conulting Pyhologit Pre. Ea, I., Bau, S., Darrel, T., & Pentland, A. (1996). Modeling, traking and interative animation of fae and head uing input from video. Proeeding of Computer Animation. Eveno, N., Caplier, A. & Coulon, P. Y. (2002) Key point baed egmentation of lip. IEEE International Conferene on Multimedia and Expo. Eveno, N., Caplier, A., & Coulon, P. Y. (2001). A new olor tranformation for lip egmentation. Workhop on Multimedia Signal Proeing. Eye movement in HCI. (2003) Regular eion at the 5 th Int. Conf. on Engineering Pyhology and Cognitive Ergonomi. 10 th International Conferene in Human-Computer Interation, Vol. 3. Ezzat, T., & Poggio, T. (1996a). Faial analyi and ynthei uing image baed model. Proeeding of the Seond International Conferene on Automati Fae and Geture Reognition. Ezzat, T., & Poggio, T. (1996b). Faial analyi and ynthei uing image baed model. Proeeding of the Workhop on the Algorithm Foundation of Roboti. Fellenz, W. A., Taylor, J. G., Cowie, R., Dougla-Cowie, E., Piat, F., Kollia, S., Orova, C., & Apolloni, B. (2000). On emotion reognition of fae and of peeh uing neural network, fuzzy logi and the ASSESS ytem. Proeeding of the IEEE-INNS-ENNS International Joint Conferene on Neural Network. Fidaleo, D., & Neumann, U. (2002). Co-Art: Co-artiulation region analyi for ontrol of 2D harater. Proeeding of IEEE Computer Animation,

335 VI Garau, M., Slater, M., Bee, S., & Sae, M. A. (2001). The impat of eye gaze on ommuniation uing humanoid avatar. Proeeding of the SIG-CHI Conferene on Human Fator in Computing Sytem, Garia, C., & Tzirita, G. (1999, September). Fae detetion uing quantized kin olor region merging and wavelet paket analyi. IEEE Tranation on Multimedia. 1(3), Gemmell, J., Zitnik, L., Kang, T., & Toyama, K. (2000). Software-enabled gaze-aware videoonferening. IEEE Multimedia, 7(4), Goto, T., Eher, M., Zanardi, C., & Magnenat-Thalmann, N. (1999). MPEG-4 baed animation with fae feature traking. In Computer Animation and Simulation. Goto, T., Khiragar, S., & Magnenat-Thalmann, N. (2001, May). Automati fae loning and animation. IEEE Signal Proeing Magazine Haverlant, V., & Dax, P. (1999). Portage de VRENG ur RTP [Porting VRENG over RTP]. (Diploma thei, Teleom Pari, September 1999). Holbert, S., & Dugelay, J.-L. (1995). Ative ontour for lip-reading: ombining nake with template. Quinzième olloque GRETSI, Huang, C.-L., Huang, Y.-M. (1997, September). Faial expreion reognition uing modelbaed feature extration and ation parameter laifiation. Journal of Viual Communiation and Image Repreentation, 8(3), Huang, F. J., & Chen, T. (2000). Traking of multiple fae for human-omputer interfae and virtual environment. Proeeding of the IEEE International Conferene and Multimedia Expo. Huang, Y. S., Tai, Y. H., & Shieh, J. W. (2001). Robut fae reognition with light ompenation. Proeeding of the Seond IEEE Paifi-Rim Conferene on Multimedia Huntberger, T. L., Roe, J., & Ramaka, A. (1998). Fuzzy-Fae: A hybrid wavelet/fuzzy elforganizing feature map ytem for fae proeing. Journal of Biologial Sytem Image of mule and bone of the head. (2002). Retrieved: Deember, 2002, from ICA (2003) Independent Component Analyi. Retrieved: Otober, 2003, from INTERFACE IST (1999) IST-European Projet. Retrieved May, 13, 2003, from

336 VII I--N ISO/IEC MPEG-4. (1998, November). Part 1: Sytem. Atlanti City ISO/IEC MPEG-4. (1999, Deember). Part 2: Viual. Maui IST-European Projet: INTERFACE IST Retrieved from: Jone, M. J., & Rehg, J. M. (1999). Statitial olor model with appliation to kin detetion. Proeeding of the Computer Viion and Pattern Reognition Kalra, P., Mangili, A., Magnenat-Thalmann, N., & D. Thalmann. (1992). Simulation of faial mule ation baed on rational free-form deformation. Eurographi. Kampmann, M. (2002) Automati 3-D fae model adaptation for model-baed oding of videophone equene. IEEE Tranation on Ciruit and Sytem for Video Tehnology, 12(3): Kay, S. M. (1993). Chapter 13 in Fundamental of tatitial ignal proeing etimation theory (pp ). Englewood Cliff, NJ: PTR Prentie Hall. King, S. A., & Parent, R. E. (2001). A parametri tongue model for animated peeh. Journal of Viualization and Computer Animation, 12(3), Kouvela, I., Hardman, V., & Waton, A. (1996). Lip Synhronization for Ue over the Internet: Analyi and Implementation. Proeeding of the IEEE Globeom Lee, C. (Produer). (2001) Final Fantay the pirit within [Motion piture]. United State: Square Piture, In. Columbia Tritar Interative Leroy, B., & Herlin, I. L. (1995). Un modèle deformable paramétrique pour la reonnaiane de viage et le uivi du movement de lèvre [A parametri deformable model for fae reognition and lip motion traking]. Quinzième olloque GRETSI, Li, H., Roivainen, P., & Forhheimer, R. (1993). 3-D motion etimation in model-baed faial image oding. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 15(6), Liévin, M. & Luthon, F. (2000) A hierarhial egmentation algorithm for fae analyi. Proeeding of the IEEE International Conferene in Multimedia and Expo, Vol. 2, pp

337 VIII Liévin, M., Delma, P., Coulon, P. Y., Luthon, F., & Fritot, V. (1999) Automati lip traking: Bayeian egmentation and ative ontour in a ooperative heme. Proeeding of the IEEE Int. Conf. on Multimedia Computing and Sytem, Vol. 1, pp Luettin, J., Thaker, N. A., & Beer, S. W. (1996). Statitial lip modeling for viual peeh reognition. Proeeding of the VIII European Signal Proeing Conferene. Luong, Q.-T., & Faugera, O. D. (1997). Self-alibration of a moving amera from point orrepondene and fundamental matrie. International Journal of Computing Viion, 22(3), Luong, Q.-T., Fua, P., & Leler, Y. (2002, January). The radiometry of multiple image. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 24(1), Maliu, M., & Prêteux, F. (2001). MPEG-4 ompliant traking of faial feature in video equene. Proeeding of the International Conferene on Augmented Virtual Environment and 3-D Imaging, Metaxa, D. (1999). Deformable model and HMM-baed traking, analyi and reognition of geture and fae. Proeeding of the International Workhop on Reognition, Analyi, and Traking of Fae and Geture in Real-Time Sytem. Morihima, S. (2001, May). Fae analyi and ynthei for dupliation expreion and impreion. IEEE Signal Proeing Magazine Morihima, S., Ihikawa, T., & Terzopoulo, D. (1998). Phyi model baed very low bit rate 3D faial image oding. Very Low Bit Video Workhop. Moe, Y., Reynard, D., & Blake, A. (1995). Determining faial expreion in real time. Proeeding of the International Workhop on Automati Fae and Geture Reognition, Motion apture webite (2002): / MPEG-4. (2000, January) Signal Proeing: Image Communiation. Tutorial Iue on the MPEG-4 Standard, Vol. 15, No Nikolaidi, A., & Pita, I. (2000). Faial feature extration and poe determination. The Journal of the Pattern Reognition Soiety, 33,

338 IX O--S Odiio, M., Eliei, F., Bailly, G., & Badin, P. (2001). Clone parlant 3D vidéo-réalite: Appliation à l analye de meage audioviual. [Video-realiti 3D talking-lone: applied to the analyi of audioviual meage]. Proeeding of Compreion et Répréentation de Signaux Audioviuel Otermann, J., & Millen, D. (2000, Augut). Talking Head and yntheti peeh: an arhiteture for upporting eletroni ommere. Proeeding of the IEEE International Conferene on Multimedia and Expo Otermann, J., Rurainky, J., & Civanlar, R. (2001). RTP payload format for phoneme/faial animation parameter (PFAP) [Expired April 2002]. RTF: draft-ietf-avt-rtp-pfap-00.txt Otermann, J., Rurainky, J., & Civanlar, R. (2002). Real-time treaming for the animation of talking fae in multiuer environment. Proeeding of the IEEE International Sympoium on Ciruit and Sytem. Pahor, V. & Carrato, S. (1999). A fuzzy approah to mouth orner detetion. Proeeding of the IEEE International Conferene on Image Proeing, pp Pandzi, I. S., & Forhheimer, R. (Ed.). (2002). MPEG-4 Faial Animation. The Standard, Implementation and Appliation. England: John Wiley & Son Ltd. Panti, M., & Rothkrantz, L. J. M. (2000, Deember). Automati analyi of faial expreion: the tate of the art. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 22(12), Pardà, M., & Bonafonte, A. (2002). Faial animation parameter extration and expreion reognition uing Hidden Markov Model. Euraip Signal Proeing: Image Communiation, 17(9), Parke, F. I. (1974). A parametri model for human fae. (Report No. UTEC-CS ). Univerity of Utah: Computer Siene. Pentland, A., Moghaddam, B., & Starner, T. (1994). View-baed and modular eigenpae for fae reognition. Proeeding of the International Conferene on Computer Viion and Pattern Reognition. Piat, F., & Tapatouli, N. (2000). Exploring the time oure of faial expreion with a fuzzy ytem. Proeeding of the International Conferene on Multimedia and Expo. Pighin, F., Heker, J., Lihinki, D., Szeliki, R., & Salein, S. (1998). Syntheizing realiti faial expreion from photograph. Proeeding of ACM SIGGRAPH 98, Pighin, F., Szeliki, R., & Salein, D. H. (1999). Reyntheizing faial animation through 3D model-baed traking. Proeeding of the International Conferene on Computer Viion.

339 X Platon (2000). Frenh government projet RNRT. Retrieved May, 26 th 2003 from: Rabiner, L. R. (1989). A tutorial on hidden Markov model and eleted appliation in peeh reognition. Proeeding of the IEEE, 77(2), RAT & VIC. (2002). Robut Audio Tool and Videoonferening Tool. [Computer oftware, manual and projet data]. Retrieved from and Ravye, I., Sahli, H., Reinder, M. J. T., & Corneli, J. (2000). Eye ativity detetion and reognition uing morphologial ale-pae deompoition. Proeeding of the 15 th International Conferene on Pattern Reognition, Vol. 1, pp Sahbi, H., & Boujemaa, N. (2000). From oare to fine kin and fae detetion. Proeeding of ACM Multimedia 2000, Sahbi, H., Geman, D., & Boujemaa, N. (2002). Fae detetion uing oare-to-fine upport vetor laifier. Proeeding of the IEEE International Conferene on Image Proeing Sarri, N., & Strintzi, M. G. (2001, July-September). Contruting a video phone for the hearing impaired uing MPEG-4 tool. IEEE Multimedia, 8(3). Shulzrinne, H., Caner, S., Fredderik, R., & Jaobon, V. (1996). RTP: a tranport protool for real-time appliation. IETF: RFC Serra, J. (Ed.). (1982). Image analyi and mathematial morphology. London: Aademi Pre. Serra, J. (Ed.). (1988). Image analyi and mathematial morphology. Volume 2: Theoretial advane. London: Aademi Pre. Shimizu, I., Zhang, Z., Akamatu, S., & Deguhi, K. (1998). Head poe determination from one image uing a generi model. Proeeding of the Third International Conferene on Automati Fae and Geture Reognition, Similar (2003). Network of Exellene. Sixth European Reearh Framework. Information retrieved July, the 15 th 2003, from Simunek, M. (2003). Viualization of talking human head. (2003). Eletroni verion retrieved July, the 15 th, 2003, from:

340 XI S--Z Spor, S., & Rabetein, R. (2001). A real-time fae traker for olor video. Proeeding of the International Conferene on Aouti Speeh and Signal Proeing Ström, J., Jebara, T., Bau, S., & Pentland, A. (1999). Real time traking and modeling of fae: and EKF-baed analyi by ynthei approah. Proeeding of the Modelling People Workhop at ICCV 99. Sturman, D. J. (1994). A brief Hitory of Motion Capture for Charater Animation. Proeeding of SIGGRAPH Sum, K. L., Lau, W. H., Leung, S. H., Liew, A. W. C., & Te, K. W. (2001) A new optimization proedure for extrating the point-baed lip ontour uing ative hape model. Proeeding of the Int. Conf. Aouti Speeh and Signal Proeing. Talking Head Webite. (2002) Tang, L., & Huang, T. S. (1994). Analyi-baed faial expreion ynthei. Proeeding of the IEEE International Conferene on Image Proeing Terzopoulo, D., & Water, K. (1993, June). Analyi and ynthei of faial image equene uing phyial and anatomial model. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 15(6). Thalmann, D. (1996, November). The Complexity of Simulating Virtual Human Superomputing Review. EPFL SCR No 8 Tian, Y., Kanade, T., & Cohn, J. F. (2001, February). Reognizing ation unit for faial expreion analyi. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 23(2), Truo, E., & Verri, A. (1998). Introdutory to tehnique for 3D omputer viion. Prentie Hall Turk, M., & Pentland, A. (1991). Eigenfae for reognition. Journal of Cognitive Neuroiene, 3(1). UIB (2002). Univeritat de le Ille Balear. Mathemati and Computer Siene Department. Computer Graphi and Viion Group. Information retrieved from: Valente, S. & Dugelay, J.-L. (2000, February). Fae traking and realiti animation for teleommuniant lone. IEEE Multimedia Magazine,

341 XII Valente, S. (1999). Analye, ynthèe et animation de lone dan un ontext de téléréunion virtuelle [Analyi, ynthei and animation of lone within a virtual teleonferene framework]. (Dotoral diertation. Éole Polytehnique de Lauane, Int. Euréom, 1999). Valente, S., & Dugelay, J.-L. (2001). A viual analyi/ynthei feedbak loop for aurate fae traking. Signal Proeing: Image Communiation, 16(6), Valente, S., André del Valle, A. C., & Dugelay, J.-L. (2001). Analyi and reprodution of faial expreion for realiti ommuniating lone. Journal of VLSI and Signal Proeing, 29, Valente, S., Dugelay, J.-L. (2000, February). Fae traking and realiti animation for teleommuniant lone. IEEE Multimedia Magazine, Varaklioti, S., Otermann, J., & Hardman, V. (2001). Coding of animated 3-D wireframe model for Internet treaming appliation. Proeeding of the IEEE International Conferene on Multimedia and Expo. Video Cloning (1999). Video Cloning and Virtual Teleonferening. Image and Video Group for Multimedia Communiation and Appliation. Intitut Euréom. Retrieved May, 26 th 2003 from VRML (2003). VRML tandard information provided by the Web3D Conortium. Retrieved June, 13 th 2003 from Water, K. (1987, July). A mule model for animating three-dimenional faial expreion. ACM Computer Graphi, 21(4). Wikott, L. (2001, July 11). Optial Flow Etimation. Retrieved September, 26, 2002, from Yaoob, Y., & Davi, L. (1994). Computing patio-temporal repreentation of human fae. Proeeding of Computer Viion and Pattern Reognition Conferene, Zhang, Z. (2000). A flexible new tehnique for amera alibration. IEEE Tranation on Pattern Analyi and Mahine Intelligene, 22(11), Zhenyun, P., Wei, H., Luhong, L., Guangyou X., & Hongjian, Z. (2001). Deteting faial feature on image equene uing ro-verifiation mehanim. Proeeding of the Seond IEEE Paifi-Rim Conferene on Multimedia..



344 RESUME ETENDU EN FRANÇAIS Note de l auteur : Tout d abord, je voudrai m exuer pour mon niveau de françai. Enuite, je voudrai ajouter que le but prinipal de e réumé et de fournir une ompilation de point le plu important de ma thèe. Meri pour votre ompréhenion.


346 Réumé étendu en Françai I Introdution 1 Motivation Le lonage de viage et devenu un beoin pour beauoup d'appliation multimédia pour lequelle l'interation humaine ave le environnement virtuel et augmenté ameilleure l'interfae. Son futur prometteur dan différent eteur tel que la téléphonie mobile et l'internet l'a tranformé en ujet important de reherhe. La preuve de et intérêt et l'apparition roiante de ompagnie offrant à leur lient la réation de viage ynthétique adapté aux beoin du lient et le outien gouvernemental omme le projet européen INTERFACE (1999). Nou pouvon laer le viage ynthétique en deux groupe prinipaux : le avatar et le lone. Le avatar ont généralement une repréentation approximative ou ymbolique de la peronne. Leur apet n'et pa trè préi. Il ont indépendant de louteur pare que leur animation uit de règle générale indépendamment de la peronne qu'on uppoe qu'il repréentent. La plupart de viage ynthétique ommeriaux atuel tombent dan ette atégorie. Le lone ont plu réalite et leur animation tient ompte de la nature de la peronne : il ont dépendant du louteur. Motivé par le avantage et le amélioration multiple qu'utilier le aratère virtuel réalite pourrait fournir aux téléommuniation, nou voulon étudier la pratiabilité de le employer dan le ytème traditionnel de vidéoonférene, en utiliant uniquement un améra. Cette diertation ouvre la reherhe développée ur la réation de nouveaux algorithme faiaux d'analye de mouvement et d'expreion afin de replier le mouvement humain ur le modèle réalite de viage qui eront employé dan de appliation de téléommuniation. Le développement omplet de notre adre d'analye et baé ur l'hypothèe qu'un modèle 3D réalite du louteur qui et devant la améra et diponible. Nou royon que de mouvement réalite peuvent eulement être reproduit ur de modèle réalite et, en e a, le modèle 3D et déjà diponible au ytème. L'information la plu préie obtenue à partir de équene viuelle monoulaire prie dan de environnement tandard (ave un élairage inonnu ; auun marqueur ;...), peut eulement être obtenue i quelque donnée ur la géométrie de l'utiliateur ont onnue, par exemple, en employant on lone réalite, omme le faion nou. 2 Contribution Nou propoon de nouveaux algorithme d'analye d'image pour le trait péifique du viage (oeil, ouril et bouhe) qui eaient de profiter autant que poible de la phyionomie et de l'anatomie du viage du loteur. D'abord, e tehnique ont été définie et examinée pour une poition frontale : réumé - 3

347 Réumé étendu en Françai Suivi de l'état d'oeil : Nou avon développé de algorithme indépendant de l élairage pour évaluer le mouvement d'œil. Il emploient de ontrainte anatomique d'intra-trait naturelle pour obtenir le regard et le omportement de la paupière à partir de l'analye de la ditribution d'énergie ur la région de l'oeil. Nou avon également examiné la poibilité d'employer l'information de ouleur pendant l'analye. Nou interpréton le réultat d'analye en terme de quelque unité péifique d'ation que nou aoion aux état temporel. En uivant un diagramme d'état temporel qui emploi de ontrainte d'intertrait pour plaer la onordane entre le deux yeux, nou rapporton no réultat d'analye aux paramètre finaaux qui dérivent le mouvement de l'oeil. Analye de Mouvement de Souril : Pour étudier le omportement de ouril de équene viuelle, nou utilion une nouvelle tehnique d'analye d'image baée ur un modèle anatomique-mathématique de mouvement. Cette tehnique onçoit le ouril omme un objet inurvé imple (ar) qui et ujet à la déformation due aux interation muulaire. Le modèle d'ation définit le déplaement 2D (vertiaux et horizontaux) implifié de l ar. Notre algorithme d'analye viuelle réupère le donnée néeaire de la repréentation de l ar pour déduire le paramètre qui ont déformé le modèle propoé. L'analye oulaire d'expreion omplète et obtenue aprè appliation de quelque ontrainte d'inter-trait parmi le yeux et le ouril. Cei nou permet d'enrihir la quantité d'information de mouvement obtenue à partir de haque trait, en le omplétant ave l'information provenant de autre. La Bouhe : C'et la aratéritique faiale la plu diffiile à analyer ; don, nou royon qu'une tratégie hybride pour dériver on mouvement devrait être utiliée : voix et image onjointement. Notre analye et baée ur le fait uivant : le mouvement de la bouhe peut exiter même i auun mot n'et prononé et le ation an parole de la bouhe ont importante pour exprimer l'émotion dan de ommuniation. Cette thèe préente le premier réultat obtenu à partir d une tehnique d'analye onçue pour étudier le apet viuel du omportement de la bouhe. Nou déduion e que ont le aratéritique de la bouhe fournie par le viage, le plu utile lorque le ondition d illumination ne ont pa onnue, et omment e aratéritique peuvent être analyée onjointement pour extraire l'information qui ommandera le modèle de mouvement muulaire propoé pour on analye. La ontribution prinipale de notre travail vient de l'étude de la onnetion de e algorithme ave l'information de la poe du viage extraite du ytème de uivi du mouvement rigide du viage. La tehnique préentée permet à l'utiliateur plu de liberté de mouvement pare que nou pouvon employer aui e algorithme indépendamment de l'endroit où e trouve l'orateur omme poible. Analye d'expreion Faiale Robute au Mouvement de la Poe 3D du Viage : Le filtrage de Kalman et ouvent utilié dan le ytème de uivi de viage pour réumé - 4

348 Réumé étendu en Françai deux but différent : d abord, il lie temporellement hor de paramètre globaux prinipaux etimé, enuite, il onverti le poition de obervation 2D de trait faiaux en évaluation 3D et en prédition de la poition et de l'orientation prinipale du viage. Dan notre appliation, le filtre de Kalman et le noeud entral de notre ytème de uivi : il réupère la poition et l'orientation globale prinipale, il prévoit le poition 2D de point de trait pour l'algorithme d appariement, et 'et le point exploité pour de appliation de téléommuniation, il fait au modèle ynthétié avoir la même éhelle, poition, et orientation que le viage du louteur dan la vraie vue, en dépit d avoir fait une aquiition par une améra non alibré. Aprè avoir déjà développé et teté poitivement de algorithme d analye de trait de viage pour de tête étudiée depui une perpetive frontale, nou avon beoin d adapter e algorithme à n'importe quelle poe. La olution que nou propoon définit le région de trait à analyer et le paramètre de modèle de mouvement de haque trait en 3D, audeu du modèle prinipal dan a poition neutre. Le proédé omplet peur e réumer : (i) (ii) Nou définion et formon le eteur à analyer ur l'image. Pour faire aini, nou projeton le ROI 3D défini au-deu du modèle du viage ur l'image en employant le paramètre de poe prédit, de e fait obtenant le ROI 2D. Nou appliquon l'algorithme d'analye d'image du trait ur e eteur extrayant le donnée demandée. (iii) Nou interpréton e donnée depui une perpetive tridimenionnelle en inverant la projetion et le tranformation due à la poe (paage de donnée de 2D à 3D). En e moment, nou pouvon omparer le réultat aux paramètre d'analye du trait déjà prédéfini ur le lone en poition neutre et déider quelle ation a été faite. La tehnique que nou employon diffère d'autre approhe préédente puique nou employon expliitement le donnée du lone pour définir l'algorithme d'analye en 3D. Le avantage prinipaux de notre olution ont la ommande omplète de l'endroit et de la forme de la région d'intérêt (ROI), et la réutiliation d'algorithme d'analye d'image de viage déjà examiné qui ont robute ur de viage qui regardent frontalement la améra. D'autre ontribution : La thèe ontient de analye et de diuion au ujet du rôle de l'animation faiale dan le téléommuniation. Nou avon également donné une deription formelle de e qu et l animation faiale en utiliant le modèle ynthétique en terme de génération et ompréhenion de paramètre de mouvement. Cette expliation théorique permet la laifiation de ytème d'animation faiale omplet en omparant leur exéution onernant le degré de réalime qu'il permettent. Il dérit également un adre pour omprendre le niveau de l'interopérabilité parmi différent ytème d'animation faiale. réumé - 5

I Facial Image Analysis Techniques and Their Related Fundamental Principles

Many video coders perform motion analysis to find motion information that will help compression. The concept of the motion vector, first conceived when the first video coding techniques were developed, is intimately linked to motion analysis. These first analysis techniques help regenerate the video sequence as an exact or approximate reproduction of the original sequence, using motion compensation on neighbouring images. They can compensate but cannot understand the actions of the objects moving in the video; therefore they cannot reconstruct the motion of an object from a different point of view, or immersed in a three-dimensional scenario.

Faces play an essential role in human communication. Consequently, they were the first objects whose motion was studied in order to recreate animation on synthesized models, or to be interpreted for later use. Figure I-1 illustrates the basic flowchart of systems devoted to facial expression and motion analysis on monocular images. The video, or still images, are first analyzed to detect, control and deduce the location of the face on the image and the environmental conditions under which the analysis will be done (head pose, illumination conditions, face occlusions, etc.). Then, motion and expression analysis algorithms extract the specific data that are finally interpreted to produce the face motion synthesis.

[Figure I-1 block diagram: Video / Images -> Pre-motion analysis (camera calibration, illumination analysis, head detection, pose determination) -> Face motion image analysis (optical flow, PCA, snakes, segmentation, deformable models) -> Motion interpretation (face feature modelling, parameter estimation) -> Face synthesis]
Figure I-1. The input image is analyzed in search of global face characteristics: motion, lighting, etc. At that point image processing is performed to obtain useful data, which can then be interpreted to obtain the face animation synthesis.

Each of the modules can be more or less complex depending on the goal of the analysis (i.e., from the understanding of general behaviour to the extraction of exact 3D motion). If the analysis is intended for the subsequent animation of facial expressions, the type of Facial Animation (FA) synthesis often determines the methodology used during the expression analysis.

Some systems may skip the first or the last stages, and others may merge those stages into the main motion and expression image analysis. Systems lacking the pre-motion analysis stage are the most likely to be limited by environmental constraints such as special lighting situations or a predetermined head pose. Systems that do not perform motion interpretation do not focus on delivering any information with which to perform the face animation synthesis afterwards. A system that is meant to analyze video in order to produce face animation data in a robust and efficient way must develop all three modules. The approaches currently studied in research, and those presented in this section, clearly perform facial motion and expression analysis and carry out motion interpretation so as to be able to animate. Nevertheless, many of them lack a strong pre-motion analysis stage to ensure robustness during the subsequent analysis.

This chapter reviews current techniques for the analysis of single images to derive animation. These methods can be classified according to different criteria:

1. the nature of the analysis: global versus feature-based, real-time oriented, etc.;
2. the complexity of the information sought: general expression generation versus specific face motion;
3. the tools used during the analysis, for instance, the cooperation of a 3D head model;
4. the degree of realism obtained from the face animation (FA) synthesis; and
5. the environmental conditions during the analysis: controlled or uniform lighting, independence from the head pose.

In this section, the systems are presented in three main categories, classified by the relationship existing between the image analysis and the intended FA synthesis, namely:

Methods that look for emotion information: these are systems whose motion and expression analysis aims at understanding face motion in a general way. These techniques evaluate actions in terms of expressions: sadness, happiness, fear, joy, etc. These expressions are sometimes measured and then interpreted by FA systems, but the analysis techniques are not concerned with FA itself.

Methods that obtain parameters related to the FA synthesis in use: this includes the methods that apply image analysis techniques over the images in search of specific measurements directly related to the animation synthesis.

Methods that use explicit face synthesis during the image analysis: some techniques use the explicit synthesis of the animated 3D model to compute node displacements of the mesh, generally through a feedback loop.

Regardless of the category they belong to, many of the methods that perform facial analysis on monocular images to generate animation share some image processing techniques and mathematical tools.

II Realistic Facial Animation Cloning

Evaluating facial animation systems is an ambiguous task because predefined quality criteria do not exist. Most of the time, the degree of realism and naturalness of the synthetic facial reproduction is determined from subjective judgment. This chapter contains the definition of some theoretical concepts related to facial animation. We have tried to formalize the notion of realism in the context of our research work. We aim to provide a conceptual basis in which the notions of avatar and clone are clearly stated. This formal framework allows us to describe the interaction existing between facial motion generation and its synthesis from a global perspective. We conclude the chapter with some considerations about face cloning seen from a moral perspective.

III The FA Framework Studied for Telecommunications

1 Introduction

The expected demand for systems deploying highly realistic facial animation is broad. Facial animation can be useful in video when it is used to communicate over newer and more flexible links such as the Internet or mobile telephony, which do not have high bit-rate capacity and cannot guarantee optimal quality of service. Next-generation mobile communications already contemplate the possibility of face-to-face conversation. E-commerce, which uses virtual salespeople, increases contact with customers through human-computer interfaces. The game industry can also benefit from using clones of the players instead of simple avatars. Finally, some advanced communication systems involving several people (virtual and video teleconferencing systems) could be designed to reduce the feeling of distance between the participants by recreating, within artificial but realistic environments, some of the elements that exist in real meetings. In this sense, our research has been carried out with the development of more advanced teleconferencing settings in mind. It is important to note that, so far, all the applications mentioned above have preferred to use avatars rather than to animate insufficiently realistic artificial faces. This justifies the great effort and resources put into face cloning research, and the relevance of this thesis to present-day telecommunications.

To create a clone we need a highly realistic 3D model of the speaker. Unlike avatars or talking heads (even realistic ones), face cloning implies that the complete animation-generation system is speaker dependent. This field falls into the larger category of virtualized reality, as opposed to virtual reality, since the realism of the rendering is not achieved from scratch by advanced computer vision techniques but is inspired and constrained by real audiovisual data from the speaker. Face cloning is a good example of the recent phenomenon of convergence between different research domains: image analysis (i.e., signal processing), image synthesis (computer graphics), and telecommunications.

The modelled synthetic faces are animated according to the actions derived from the interpretation of some animation parameters. Generating animation parameters becomes a difficult task if it is done manually; therefore automatic or semi-automatic parameter generation systems have been developed. These systems extract face motion information from speech, from images, or from both. Visual Text-To-Speech synthesizers (visual TTS), that is, TTS systems that also provide face synthesis, generate their animation parameters from the input text given to the TTS (a toy sketch of the phoneme-to-viseme mapping such systems rely on is given below).
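As a toy illustration of this text-driven path, the sketch below maps phonemes to viseme classes. The grouping is a common simplification assumed for the example; it is neither the MPEG-4 viseme table nor the mapping used in the thesis.

```python
# Toy phoneme-to-viseme lookup illustrating the many-to-one mapping used by
# visual TTS systems.  The grouping below is an assumption for the example.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar",
    "a": "open", "e": "mid", "i": "spread", "o": "rounded", "u": "rounded",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence that drives the mouth synthesis."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(phonemes_to_visemes(["b", "o", "n", "j", "u", "r"]))
# unknown phonemes fall back to the 'neutral' viseme
```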

The visual TTS analyzes the text and provides the corresponding phonemes. These phonemes have their synthetic mouth-motion representation, matched with visemes, which can then be synthesized. TTS-based systems have several advantages: they are the simplest analysis systems for producing face animation parameters, they do not need human interaction and they can produce quite precise mouth motion. For these reasons, some of the available face animation products use this technique. We can also use the phoneme-viseme duality to derive animation from speech; in that case, the speech is analyzed to deduce the phonemes. Whether we extract the phonemes from text or from speech, the main drawback is that they only produce automatic motion for the mouth, so another source of action generation is needed to complete the face animation. They give acceptable results when animating non-realistic characters (cartoons, animals, etc.) but, since the information they generate is not personal, they hardly give a natural human feeling.

Besides making generic facial animation more realistic, we also need motion analysis techniques to study the speaker's actions at any given moment. When using FA in communications, the application environment demands non-invasive, real-time analysis methods to generate animation parameters; therefore most of the approaches adopted to customize animation systems are no longer suitable for communications.

2 System Overview

Figure III-1 illustrates the system we propose for facial motion and expression cloning. As the speaker is analyzed (green part), information, mainly visual although it could also be of a different origin, is obtained and used to reproduce the facial behaviour (denoted λ) on a highly realistic 3D model of the face. The generated parameters could have been coded and sent to be interpreted directly; instead, it is preferable to simulate the decoding and the synthesis during the image analysis. By adding this synthesis feedback, we can control the error committed and adjust the analyzed parameters to fit a more precise motion. The final data (μ) must be understandable by the facial animation engine of the decoder at the remote site (orange part), following specific semantics or perhaps after having been adapted to a standard. Using a highly realistic head model of the speaker gives us not only a convenient and exploitable visual feedback, but also knowledge of anthropometric data that can likewise be used during the analysis.
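The analysis/synthesis cooperation can be caricatured with a scalar example: render a synthetic view from the current parameter estimate, measure the discrepancy with the real frame, and adjust. The rendering stand-in, the simulated "real" frame and the update rule below are assumptions for the example only, not the thesis' procedure.

```python
import numpy as np

# Schematic sketch of the analysis/synthesis cooperation loop of Figure III-1,
# reduced to one scalar animation parameter.  render() stands in for the clone
# synthesis and the "real" frame is simulated; everything here is illustrative.
def render(param):
    return np.full((8, 8), param)                 # trivial stand-in for clone synthesis

real_frame = render(0.7)                          # pretend the speaker's true action is 0.7
param = 0.4                                       # raw analysis estimate (lambda)

for _ in range(20):
    synthetic = render(param)                     # simulate decoding + synthesis
    error = float(np.mean(real_frame - synthetic))   # visual feedback
    if abs(error) < 1e-3:
        break
    param += 0.5 * error                          # adjust the parameter toward the real view

final_fap = param                                 # refined value (mu) streamed to the decoder
print(round(final_fap, 3))
```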

[Figure III-1 schematic: real input images -> image analysis with analysis/synthesis cooperation (decoding simulation, λ) -> transcription (μ: pose & expression, possibly MPEG-4) -> low-bit-rate parameter stream to the remote site -> image synthesis of the realistic, speaker-dependent 3D face model (geometry & texture), with possible additional inputs such as text or speech]
Figure III-1. When using clone animation for communications, there are two main active parts. The face animation parameter generator (green part), which is included in the coding/transmission side and does the heavy image processing; and the face animation engine (orange part), which is placed at the receiver and whose task is to regenerate the facial motion on the speaker's clone by interpreting FAPs. The framework presented above uses the synthetic clone-image feedback to improve the image analysis and to produce more precise motion information.

Fundamentally, the complete development of this system comprises four main blocks: (i) acquisition or creation of the realistic, speaker-dependent 3D synthetic head model; (ii) video analysis of a speaker recorded in a normal environment to extract parameters for the animation; (iii) compression and transmission of the parameters between the encoder and the decoder; (iv) synthesis of the 3D model and of its animation from the received parameters. The core of the research work presented in this thesis, the pose-expression coupling strategy for facial motion analysis, has been developed to satisfy the requirements of block (ii).
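For block (iii), a minimal per-frame packing of animation parameter values could look like the sketch below. The byte layout is a hypothetical fixed format chosen for illustration; it is not the MPEG-4 FAP bitstream syntax nor the coding used in the thesis.

```python
import struct

# Hypothetical fixed-layout packing of one frame of animation parameters for
# transmission between the encoder and decoder sites (illustration only).
def pack_fap_frame(frame_index, fap_values):
    return struct.pack(f"<IH{len(fap_values)}h", frame_index,
                       len(fap_values), *[int(v) for v in fap_values])

def unpack_fap_frame(payload):
    frame_index, count = struct.unpack_from("<IH", payload)
    values = struct.unpack_from(f"<{count}h", payload, 6)   # 6 = size of the header
    return frame_index, list(values)

pkt = pack_fap_frame(42, [3, -12, 0, 55])
print(unpack_fap_frame(pkt))    # (42, [3, -12, 0, 55])
```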

IV Facial Motion Analysis from a Frontal Perspective

1 Introduction

The developed feature-motion models use not only anatomical (intra-feature) constraints to derive the feature actions; they also use standard human motion constraints to produce realistic facial animation, exploiting the correlation existing between the eyes, and between the eyes and the eyebrows (inter-feature). The proposed image processing techniques globally try to minimize the influence of unknown error sources and to improve the overall behaviour understanding by imposing some standard human motion restrictions on the data obtained from the analysis of each feature. Together they form an analysis strategy that aims at providing a logical understanding of the motion, capable of producing reliable animation data to synthetically reproduce the facial feature expressions of the analyzed visual input.

The developed algorithms assume that the location and delimitation of the region of interest (ROI) of the feature (eyes, eyebrows or mouth) are known. This hypothesis is realistic in the present context because, as explained in Chapter V, the procedure that extends the use of these algorithms to any other pose also takes care of the tracking and of the definition of the feature ROI.

[Figure IV-1 diagram: pose & 3D model -> intra-feature constraints (eye) and inter-feature constraints (eyebrow, mouth) -> motion model parameters]
Figure IV-1. General diagram of the proposed analysis framework; the parts related to facial expression analysis have been highlighted.

2 Eye-State Analysis Algorithm

The importance of gaze for human communication is significant. "Gaze is a richly informative behaviour in face-to-face interaction. It serves at least five distinct functions (...): regulating conversation flow, providing feedback, communicating emotional information, communicating the nature of interpersonal relationships and avoiding distraction by restricting visual input" (Garau et al., 2001). When developing new telecommunication systems for videoconferencing, the correct understanding and reproduction of eye motion becomes necessary. One example of this is Microsoft's research project "GazeMaster", a tool aiming to provide videoconferencing with corrected gaze (Gemmell, Zitnick, Kang and Toyama, 2000).

Because of the vast number of applications where eye motion obtained by image analysis is useful (detection of eye closure while driving, model-based coding in telecommunications, human action recognition in HCI, etc.), many techniques exist for studying eye activity on monocular images. It is not the goal of this chapter to revisit all the possible methods that can be found in different research fields, but we will look at a few approaches that relate to our work in visual communications. Two main techniques have been used to analyze eye motion on images: PCA and deformable templates (motion modelling); we refer the reader to Chapter I for the theoretical details of these techniques. PCA has been widely studied for analyzing facial motion, mostly coupled with the use of optical flow as the source of motion data (Valente, 1999). Most recent work prefers to perform this analysis through ICA (independent component analysis) rather than PCA (Fidaleo and Neumann, 2002). In both cases, their main drawback is that performance depends on the environmental conditions of the analysis, fundamentally on the lighting. The use of motion templates seems to be the solution chosen to retrieve eye actions robustly (Goto, Escher, Zanardi and Magnenat-Thalmann, 1999; Tian, Kanade and Cohn, 2001). Generally, these motion templates consist of ellipses and circles, representing the eye shape, which are extracted from the images. If lighting independence is sought, optical flow cannot be used, and other image processing tools are employed: analysis using mathematical morphology, non-linear filtering, etc.

Working towards flexible conditions leads researchers to look for solutions where incorrect results in the analysis can be compensated or minimized, for instance, by studying the temporal behaviour of the eye actions.
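In this lighting-tolerant spirit, the following naive sketch localises the pupil from intensity ("energy") profiles inside a pre-delimited eye ROI, the kind of measurement our own analysis, described in the next paragraphs, builds on. It is an illustrative operator with made-up thresholds, not the exact lighting-independent algorithm of the thesis.

```python
import numpy as np

# Naive pupil localisation inside a pre-delimited eye ROI: the pupil is taken
# as the maximum of the smoothed horizontal/vertical "energy" (inverted
# intensity) profiles.  Thresholds and the crude opening test are assumptions.
def locate_pupil(eye_roi):
    """eye_roi: 2-D numpy array of grey levels.  Returns a (row, col) estimate."""
    inverted = eye_roi.max() - eye_roi.astype(float)   # dark pupil -> high energy
    col_profile = inverted.sum(axis=0)                 # energy per column
    row_profile = inverted.sum(axis=1)                 # energy per row
    col = int(np.argmax(np.convolve(col_profile, np.ones(3) / 3, mode="same")))
    row = int(np.argmax(np.convolve(row_profile, np.ones(3) / 3, mode="same")))
    return row, col

def eye_state(row, roi_height, closed_ratio=0.75):
    """Crude opening test: a 'pupil' found very low in the ROI suggests a closed eye."""
    return "closed" if row > closed_ratio * roi_height else "open"

roi = np.full((20, 30), 200); roi[8:12, 14:18] = 30    # synthetic bright eye, dark pupil
r, c = locate_pupil(roi)
print(r, c, eye_state(r, roi.shape[0]))
```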

Ravyse, Sahli, Reinders and Cornelis (2000) perform the eye motion analysis using a mathematical measurement approach over the morphological scale space, forming spatio-temporal curves out of the scale-measurement statistics. The resulting curves provide a direct measure of the eye gesture, which can then be used as an eye animation parameter. Although in their article they only consider the opening and closing of the eyes, they already show the potential of using the temporal evolution of eye motion for its analysis.

Our approach follows the same analysis philosophy as the one presented by Ravyse et al. It differs in the image processing involved: we propose deducing the motion through the study of the pupil location, because it provides the eye gaze as well as the opening and closing information. Instead of a statistical analysis, we present a temporal state diagram based on standard human motion behaviour, which constrains the actions using some intra-feature restrictions. In telecommunications, it is very important to generate non-disturbing facial expressions. As already discussed by Al-Qayedi and Clark (2000), knowledge of standard human behaviour can be useful for animating the eyes.

3 Eyebrow Motion Analysis Algorithm

Historically, eyebrow motion analysis has been studied less than the analysis techniques for the other features (the eyes and the mouth). In the literature we find that the first attempts to analyze eyebrow behaviour (Huang, C.-L. and Huang, Y.-M., 1997) were concerned with retrieving expression information. More recently, Goto, Kshirsagar and Magnenat-Thalmann (2001) have also presented a method for analyzing eyebrow motion in order to extract facial animation parameters. The analysis methodology followed is rather heuristic, and the proposed approaches do not address the influence of environmental conditions. Kampmann (2002) proposes a technique that can detect the eyebrows even if they are partially covered by hair. In general, we have not found any motion analysis technique that formally relates the results of the image analysis processing to the generation of motion parameters.

In this section we describe an eyebrow motion analysis technique in which the image processing adapts the analysis to the user's characteristics and to the environmental conditions. We relate the results of this image analysis directly to a motion model. To study the visual behaviour of the eyebrows, we use a new image analysis technique based on an anatomical-mathematical motion model. This technique models the eyebrow as a simple curved object (an arc) that is subject to deformation due to muscular interactions. The action model defines the simplified 2D (vertical and horizontal) displacements of the arc. Our analysis algorithm recovers the data needed from the arc representation to deduce the parameters that deformed the proposed model.
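A minimal sketch of the arc idea follows: fit a parabolic arc to eyebrow pixel coordinates and read simplified vertical and horizontal displacements against the neutral-frame arc. The parabolic form and the two displacement measures are assumptions made for the example, not the exact anatomical-mathematical model of the thesis.

```python
import numpy as np

# Minimal sketch of the eyebrow "arc" idea: fit a parabola y = a*x^2 + b*x + c
# to detected eyebrow points and compare its apex with the neutral-frame arc
# to read simplified vertical/horizontal displacements (illustrative choices).
def fit_arc(xs, ys):
    """Least-squares parabola through the eyebrow points (image coordinates)."""
    A = np.vstack([xs ** 2, xs, np.ones_like(xs)]).T
    coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coeffs                                   # (a, b, c)

def arc_displacements(neutral, current):
    """Vertical lift of the arc apex and horizontal shift of the apex position."""
    def apex(c):
        a, b, cc = c
        x0 = -b / (2 * a)
        return x0, a * x0 ** 2 + b * x0 + cc
    x_n, y_n = apex(neutral)
    x_c, y_c = apex(current)
    return y_n - y_c, x_c - x_n                     # (raise amount, sideways shift)

xs = np.linspace(0, 40, 9)
neutral = fit_arc(xs, 0.02 * (xs - 20) ** 2 + 30)   # synthetic neutral eyebrow
raised  = fit_arc(xs, 0.02 * (xs - 22) ** 2 + 24)   # raised / shifted eyebrow
print(arc_displacements(neutral, raised))           # ~ (6.0, 2.0)
```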

4 Spatial Correlation between the Eyes and the Eyebrows: Study of Extreme Expressions

Generally, the most complex motion models are the least robust, at analysis time, to unexpected environmental conditions. Our analysis algorithms are robust thanks to their simplicity. This simplicity limits the individual motion of the eye and of the eyebrow to normal and logical movements; these constraints are suitable for human-to-human communications, but they may unfortunately filter out some details that add strength to the expression, above all in the presence of extreme emotions (joy, anger, etc.). To partially compensate for this limitation, we also propose to exploit the correlation existing between the motion of the eye and of the eyebrow, so as to enrich the overall ocular expression coming from the individual analysis of each feature.

When the eyes are closed, the eyelids can behave in two different ways: they can be closed without tension if the eyebrows are neutral or pulled upwards; or they can be strongly shut if the eyebrows are lowered. When the eyes are open, the height of the eyebrow indicates the degree of opening of the eyelid. Figure IV-3 illustrates this evident correlation between the eyelid and the eyebrow. The extreme actions of the eyebrow determine and refine the eye motion by:

(i) extending the information inside the temporal state diagram of the eye to include the inter-feature constraints derived from the eyebrow analysis. For example, a strong downward eyebrow action will most surely result in an eye-closing action, even if the eye data are not reliable (Figure IV-2);

(ii) deriving the final behaviour of the synthetic eyelid by adding, to the position obtained from the pupil location, a supplementary movement limited by the strength of the eyebrow motion:

(IV-1)   y_eyelid_new = y_eyelid_former + mu * fap * eta

with mu = (y_eyelid_MAX - y_eyelid_former) / (y_eyelid_MAX - y_eyelid_0) and eta = 1 / (fap_MAX - fap_0).
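The two inter-feature rules can be pictured with the short sketch below. The thresholds, value ranges and variable names are assumptions made for the example; only the structure of rule (i) and of equation (IV-1) follows the text.

```python
# Illustrative sketch of the two eye/eyebrow inter-feature rules above.
FAP_0, FAP_MAX = 0.0, 100.0            # assumed range of the eyebrow-lowering FAP
Y_0,   Y_MAX   = 0.0, 1.0              # assumed eyelid-closure range (0 = open, 1 = closed)

def refine_eye_state(eye_state, eye_data_reliable, eyebrow_fap):
    """Rule (i): a strong downward eyebrow action forces an eye-closing state."""
    if eyebrow_fap > 0.8 * FAP_MAX and not eye_data_reliable:
        return "closed"
    return eye_state

def refine_eyelid(y_former, eyebrow_fap):
    """Rule (ii), equation (IV-1): extra eyelid closure limited by the eyebrow force."""
    mu = (Y_MAX - y_former) / (Y_MAX - Y_0)    # remaining eyelid travel, normalized
    eta = 1.0 / (FAP_MAX - FAP_0)              # eyebrow FAP normalization
    return y_former + mu * eyebrow_fap * eta

print(refine_eye_state("open", False, 90.0))   # -> closed
print(round(refine_eyelid(0.4, 50.0), 2))      # -> 0.7
```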

[Figure IV-2 flowchart: the left- and right-eye states S_L_t and S_R_t, each a function of the measured pupil box (x, y, w, h), are compared; if they agree the common state is kept, otherwise the measurements are averaged and checked for temporal continuity with S_(t-1); very low pupil positions, contradictory horizontal positions, or a downward eyebrow state lead to the closed state, and unreliable cases fall back to the previous state]
Figure IV-2. The basic temporal state diagram applied to the eye analysis, built only on inter-eye constraints, can be completed to take into account data obtained from the eyebrow analysis.

Figure IV-3. When the eye is closed (lower row), the change of the eyelid due to the eyebrow action can be taken as a specific animation. When the eye is open (upper row), it must be taken into account to change the standard vertical motion of the eyelid.

5 Mouth and Lip Motion Analysis

5.1 Introduction

Mouth motion analysis has been studied for a long time in different domains. It has become a broad research field because several of the techniques studied aim at providing useful tools for everyday life, for example, automatic "lip-reading" for the deaf.

Following the philosophy of our research work, we concentrate on those algorithms that help transmit information more efficiently in video communication systems, by substituting traditional video with the animation of 3D copies of the speakers. Indeed, mouth analysis plays an important role in this scenario because the accuracy of the mouth motion and the synchronization of the mouth actions with the acoustics produced during the conversation are crucial to obtaining pleasant and natural communications.

We can consider the overall mouth motion to be the result of two factors: M_TOTAL = M_speech + M_expression, where M_speech represents the normal motion linked to the articulation of sounds and phonemes while speaking, and M_expression is the part of the motion that reflects the emotional expression and the personal behaviour of the individual. It is easy to distinguish the motion component coming from expression when no speech is present. It is harder to deduce how actions of the two natures interact with one another. Examining this question from the inverse perspective, that is, separating the mouth motion components according to their nature (speech or expression) during the analysis, is also a current research direction in the facial animation community. When creating automatic motion to be synthesized on 3D models (usually avatars), we combine phonetic motion information with expression motion data. This combination must be done in such a way that the resulting facial behaviour acts naturally. In most cases the phonetic and expression interactions do not lead to pleasant and natural results. Knowledge of the muscular interactions and of normal facial behaviour must be used to deduce the right motion and to adapt the generated animation after analyzing the visual aspect of the mouth actions.

To develop a complete analysis framework, we have studied the advantages and drawbacks of most methods found in the literature. We have derived an approach that suits our scenario by developing a simple motion model, so as to ensure that its action parameters will be detected reliably during the analysis, independently of the environmental conditions.
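A toy illustration of the M_TOTAL decomposition is given below: speech-driven and expression-driven displacements are combined on a per-parameter basis. The parameter names, the additive blend and the clamping range are assumptions for the example; the text only states the decomposition, not this exact combination rule.

```python
import numpy as np

# Toy combination of the two mouth-motion components
# M_TOTAL = M_speech + M_expression, per parameter, clamped to a plausible range.
MOUTH_PARAMS = ["jaw_open", "lip_stretch", "corner_raise"]

def combine_mouth_motion(m_speech, m_expression, limit=1.0):
    """Add speech-driven and expression-driven displacements, clamped to [-limit, limit]."""
    total = {}
    for p in MOUTH_PARAMS:
        total[p] = float(np.clip(m_speech.get(p, 0.0) + m_expression.get(p, 0.0),
                                 -limit, limit))
    return total

speech = {"jaw_open": 0.6, "lip_stretch": 0.2}          # e.g. coming from a viseme
smile  = {"lip_stretch": 0.5, "corner_raise": 0.7}      # e.g. coming from expression analysis
print(combine_mouth_motion(speech, smile))
# {'jaw_open': 0.6, 'lip_stretch': 0.7, 'corner_raise': 0.7}
```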

V Extending the Use of the Frontal Motion Models to Any Other Pose

1 Introduction

In the literature we have found two main approaches for adapting the frontal facial motion and expression analysis algorithms:

1. Creating one feature template per pose: after motion templates have been developed and tested on frontal faces, they are redefined for different predetermined poses of the face. This is, for instance, the solution given by Tian, Kanade and Cohn (2001). They overcome the pose limitation in their analysis by defining a "multiple-state face model", where different models are used for different head states (looking left, right, downwards, etc.). This analysis strategy is limited: the complexity of the solution grows with the number of states, which will be large if much freedom of movement is sought.

2. Rectifying the input image: the image to be analyzed is transformed to obtain an approximation of the face seen from a frontal perspective. Then, the image processing algorithms defined for frontal faces analyze this new image to obtain the corresponding feature template (Chang et al., 2000). This solution works well for slight rigid motions. Significant rotations and translations cannot be compensated with simple image transformations because: the appearance of each facial feature depends not only on the projection due to the pose but also on its 3D shape, so a 2D rectification done without acknowledging the 3D nature of the feature cannot be accurate; the rectified image may be missing some areas that are occluded in the original image; and the 2D rectification may change the perceived lighting and the anatomical shape of the features, which are very important in feature-based facial image analysis.

We propose a different approach to adapting the frontal motion analysis. Our solution uses the knowledge of the head pose and of the user's physiognomy to interpret the expressions in 3D space instead of processing the image.

2 Adaptation of the Feature Templates

The algorithmic adaptation procedure follows these steps:

(a) We first redefine the motion model, the region of interest (ROI) and the image processing parameters associated with each feature template in 3D, assuming that the head faces the camera in its neutral pose.

(b) Then, we use the information about the rigid motion of the head on the analyzed frame to project the 3D-defined ROI and the other analysis constraints of each feature onto the video image. We then apply the image processing to extract the data.

(c) Finally, we invert the projection and the pose transformation of these data to obtain their 3D equivalents, which are ready to be compared with the motion model we had already defined in 3D.

Figure V-1 gives a graphical interpretation of the adaptation procedure applied to the eye-feature analysis. For the adapted analysis we must define:

(i) an observation model. To develop the adaptation, we consider our analysis scenario: a 3D object (the head) in front of a camera that acquires the video images to be analyzed. We establish the neutral pose of the head as the pose in which the face is fully contained in the image and looks statically towards the center of the camera. The observation model mathematically describes the relationship between the coordinates of the head object in its neutral pose and the final view of the face on the video. This mathematical model allows us to interpret data associated with the modelled 3D face in the image (2D) space and vice versa.

(ii) a 3D model of the head. The motion-template analysis techniques defined for a frontal view rely on knowing the location of the facial features on the image. Likewise, during the adaptation we must know the physiognomy of the person facing the camera, so as to be able to locate the features exactly in 3D space. We use a highly realistic 3D representation of the person to determine the position of the ROI of each feature.

(iii) an approximation of the surface of each feature. The analysis models are originally defined to analyze information in the image plane. We can easily adapt these motion models by mapping each of them directly onto a surface parallel to this image plane and located at the determined position of the feature on the 3D head in its neutral pose. To obtain the most appropriate parallel plane, we develop the linear approximation of the surface that covers the motion region of each feature.
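To give item (iii) a concrete shape, here is a minimal least-squares plane fit over the 3D vertices covering a feature's motion region. The vertex values and the fitting choice are illustrative assumptions; the text only requires a linear approximation of the feature surface.

```python
import numpy as np

# Minimal least-squares plane fit z = a*x + b*y + c over the 3D vertices that
# cover a feature's motion region, giving the "linear surface approximation"
# onto which the frontal motion model is mapped.  The vertex data are made up.
def fit_feature_plane(vertices):
    """vertices: (N, 3) array of clone-mesh points around the feature (neutral pose)."""
    x, y, z = vertices.T
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    normal = np.array([-a, -b, 1.0])
    normal /= np.linalg.norm(normal)
    return (a, b, c), normal          # plane coefficients and unit normal

eye_region = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.1], [0.0, 1.0, 0.9],
                       [1.0, 1.0, 1.0], [0.5, 0.5, 1.0]])
coeffs, n = fit_feature_plane(eye_region)
print(np.round(coeffs, 3), np.round(n, 3))
```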

[Figure V-1 diagram: the Kalman pose parameters drive the 2D/3D mapping between the neutral frontal eye model and the image, producing the eye state parameters]
Figure V-1. This diagram illustrates the general adaptation procedure applied to the eye analysis algorithm. First, the vertices that define the 3D ROI on the linear surface model are projected onto the image plane. The image processing algorithm then searches for the desired information by analyzing the interior of the delimited area. To understand the motion of the feature, the data are interpreted in 3D space, over the motion model that was defined on the linear surface approximation of the eye feature seen from a frontal perspective. Once the motion is interpreted it can be reproduced on a synthetic head model.
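The ROI round trip of Figure V-1 can be sketched under an assumed weak-perspective observation model (rotation R built from the pose angles, uniform scale s, 2D translation t). This camera model and the numeric values are assumptions for the example, not the exact observation model developed in the thesis.

```python
import numpy as np

# Sketch of the ROI round trip of Figure V-1 under an assumed weak-perspective
# observation model: project neutral-pose 3D ROI vertices to the image, then
# invert the pose and projection to interpret 2D results back in neutral 3D.
def rotation(rx, ry, rz):
    cx, sx, cy, sy, cz, sz = np.cos(rx), np.sin(rx), np.cos(ry), np.sin(ry), np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(points3d, R, s, t):
    """Neutral-pose 3D ROI vertices -> 2D image points (weak perspective)."""
    return s * (points3d @ R.T)[:, :2] + t

def back_project(points2d, depths, R, s, t):
    """Invert pose and projection: 2D analysis results + known depths -> neutral 3D."""
    cam = np.column_stack([(points2d - t) / s, depths])   # undo scale and translation
    return cam @ R                                         # R^-1 = R^T, applied on row vectors

R = rotation(0.1, 0.3, -0.05)
s, t = 120.0, np.array([160.0, 120.0])
roi3d = np.array([[-0.02, 0.01, 0.05], [0.02, 0.01, 0.05], [0.0, -0.01, 0.05]])
roi2d = project(roi3d, R, s, t)                    # step (b): delimit the 2D ROI
depths = (roi3d @ R.T)[:, 2]                       # depths kept from the clone geometry
print(np.allclose(back_project(roi2d, depths, R, s, t), roi3d))   # step (c): True
```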


More information

Bringin. Andrew. Cambridge, CB3. once the. sensing multiple touch. the entire hand. of fine-grained. to approach hides.

Bringin. Andrew. Cambridge, CB3. once the. sensing multiple touch. the entire hand. of fine-grained. to approach hides. Bringin ng Phyi to the Surfae Andrew D. Wilon 1, Shahram Izadi 2, Otmar Hillige 2, Armando Garia-Mendoza 2, David Kirk 2 1 Mirooft Reearh 2 Mirooft Reearh Cambridge One Mirooft Way 7 JJ Thomon Avenue Redmond,

More information

Stress-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling

Stress-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling Stre-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling Menter F.R. ANSYS Germany GmbH Introduction It i oberved in many CFD imulation that RANS model how inherent technology limitation

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauer and Charles A. Bouman Department of Eletrial Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 Shool of

More information

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights Shortet Path Problem CS 6, Lecture Jared Saia Univerity of New Mexico Another intereting problem for graph i that of finding hortet path Aume we are given a weighted directed graph G = (V, E) with two

More information

Image authentication and tamper detection using fragile watermarking in spatial domain

Image authentication and tamper detection using fragile watermarking in spatial domain International Journal of Advanced Reearch in Computer Engineering & Technology (IJARCET) Volume 6, Iue 7, July 2017, ISSN: 2278 1323 Image authentication and tamper detection uing fragile watermarking

More information

Supplementary Material: Geometric Calibration of Micro-Lens-Based Light-Field Cameras using Line Features

Supplementary Material: Geometric Calibration of Micro-Lens-Based Light-Field Cameras using Line Features Supplementary Material: Geometri Calibration of Miro-Lens-Based Light-Field Cameras using Line Features Yunsu Bok, Hae-Gon Jeon and In So Kweon KAIST, Korea As the supplementary material, we provide detailed

More information

SPH3UW Unit 7.1 The Ray Model of Light Page 2 of 5. The accepted value for the speed of light inside a vacuum is c m which we usually

SPH3UW Unit 7.1 The Ray Model of Light Page 2 of 5. The accepted value for the speed of light inside a vacuum is c m which we usually SPH3UW Unit 7. The Ray Model of Light Page of 5 Note Phyi Tool box Ray light trael in traight path alled ray. Index of refration (n) i the ratio of the peed of light () in a auu to the peed of light in

More information

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition A Coarse-to-Fine Classifiation Sheme for Faial Expression Reognition Xiaoyi Feng 1,, Abdenour Hadid 1 and Matti Pietikäinen 1 1 Mahine Vision Group Infoteh Oulu and Dept. of Eletrial and Information Engineering

More information

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications DAROS: Ditributed Uer-Server Aignment And Replication For Online Social Networking Application Thuan Duong-Ba School of EECS Oregon State Univerity Corvalli, OR 97330, USA Email: duongba@eec.oregontate.edu

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is Today Outline CS 56, Lecture Jared Saia Univerity of New Mexico The path that can be trodden i not the enduring and unchanging Path. The name that can be named i not the enduring and unchanging Name. -

More information

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment Int. J. Communication, Network and Sytem Science, 0, 5, 90-97 http://dx.doi.org/0.436/ijcn.0.50 Publihed Online February 0 (http://www.scirp.org/journal/ijcn) Increaing Throughput and Reducing Delay in

More information

View-Based Tree-Language Rewritings

View-Based Tree-Language Rewritings View-Baed Tree-Language Rewriting Lak Lakhmanan, Alex Thomo Univerity of Britih Columbia, Canada Univerity of Vitoria, Canada Importane of tree XML Semi-trutured textual format are very popular.

More information

CleanUp: Improving Quadrilateral Finite Element Meshes

CleanUp: Improving Quadrilateral Finite Element Meshes CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)

More information

Machine Vision. Laboratory Exercise Name: Student ID: S

Machine Vision. Laboratory Exercise Name: Student ID: S Mahine Vision 521466S Laoratory Eerise 2011 Name: Student D: General nformation To pass these laoratory works, you should answer all questions (Q.y) with an understandale handwriting either in English

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

Building a Compact On-line MRF Recognizer for Large Character Set using Structured Dictionary Representation and Vector Quantization Technique

Building a Compact On-line MRF Recognizer for Large Character Set using Structured Dictionary Representation and Vector Quantization Technique 202 International Conference on Frontier in Handwriting Recognition Building a Compact On-line MRF Recognizer for Large Character Set uing Structured Dictionary Repreentation and Vector Quantization Technique

More information

Hassan Ghaziri AUB, OSB Beirut, Lebanon Key words Competitive self-organizing maps, Meta-heuristics, Vehicle routing problem,

Hassan Ghaziri AUB, OSB Beirut, Lebanon Key words Competitive self-organizing maps, Meta-heuristics, Vehicle routing problem, COMPETITIVE PROBABIISTIC SEF-ORGANIZING MAPS FOR ROUTING PROBEMS Haan Ghaziri AUB, OSB Beirut, ebanon ghaziri@aub.edu.lb Abtract In thi paper, we have applied the concept of the elf-organizing map (SOM)

More information

Course Updates. Reminders: 1) Assignment #13 due Monday. 2) Mirrors & Lenses. 3) Review for Final: Wednesday, May 5th

Course Updates. Reminders: 1) Assignment #13 due Monday. 2) Mirrors & Lenses. 3) Review for Final: Wednesday, May 5th Coure Update http://www.phy.hawaii.edu/~varner/phys272-spr0/phyic272.html Reminder: ) Aignment #3 due Monday 2) Mirror & Lene 3) Review for Final: Wedneday, May 5th h R- R θ θ -R h Spherical Mirror Infinite

More information

Kinematics Programming for Cooperating Robotic Systems

Kinematics Programming for Cooperating Robotic Systems Kinematic Programming for Cooperating Robotic Sytem Critiane P. Tonetto, Carlo R. Rocha, Henrique Sima, Altamir Dia Federal Univerity of Santa Catarina, Mechanical Engineering Department, P.O. Box 476,

More information

Boundary Correct Real-Time Soft Shadows

Boundary Correct Real-Time Soft Shadows Boundary Corret Real-Time Soft Shadows Bjarke Jakobsen Niels J. Christensen Bent D. Larsen Kim S. Petersen Informatis and Mathematial Modelling Tehnial University of Denmark {bj, nj, bdl}@imm.dtu.dk, kim@deadline.dk

More information