Carnegie Mellon University


Carnegie Mellon University
CARNEGIE INSTITUTE OF TECHNOLOGY

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

TITLE: Pose Robust Video-Based Face Recognition

PRESENTED BY: Xiaoming Liu

Abstract

Researchers have been working on human face recognition for decades. Face recognition is hard due to different types of variations in face images, such as pose, illumination and expression, among which pose variation is the hardest one to deal with. To improve face recognition, this thesis presents an integrated approach to performing pose robust video-based face tracking and recognition by using a face mosaic model. We approximate a human head with a 3D ellipsoid model, where each face image is a projection of the 3D ellipsoid at a certain pose. In our approach, both training and test images are projected back to the surface of the 3D ellipsoid, according to their estimated poses, to form the texture maps. Thus the recognition can be conducted by comparing texture maps instead of the original images, as done in traditional face recognition. In addition, by representing the texture map as an array of local patches, we can train a probabilistic model for comparing corresponding patches. With multiple training images under different views, we are able to obtain a statistical mosaic model as well as a geometric deviation model, which not only reduces the blurring effect in the mosaic model, but also serves as an indication of how much the actual human face's geometry deviates from the 3D ellipsoid model. Furthermore, we apply the face mosaic model to video-based face recognition. The mosaic model is able to simultaneously track, register, and recognize human faces from video sequences. Finally, we also apply the updating-during-recognition scheme in using the mosaic model. This scheme allows the mosaic model to be updated during the test stage in order to enhance the modeling and recognition over time.

Declarations

Some parts of the work presented in this thesis have been published in the following articles:

Xiaoming Liu, Tsuhan Chen and Susan M. Thornton, Eigenspace Updating for Non-Stationary Process and Its Application to Face Recognition, Pattern Recognition, special issue on Kernel and Subspace Methods for Computer Vision, Volume 36, Issue 9, pp , September

Xiaoming Liu, Tsuhan Chen and B.V.K. Vijaya Kumar, Face Authentication for Multiple Subjects Using Eigenflow, Pattern Recognition, special issue on Biometric, Volume 36, Issue 2, pp , February

Xiaoming Liu and Tsuhan Chen, Pose Robust Face Recognition Based on Mosaicing: An Example Usage of Face In Action (FIA) Database, Demo session of the IEEE International Conference on Computer Vision and Pattern Recognition 2004, Washington, DC, 27th June - 2nd July,

Xiaoming Liu and Tsuhan Chen, Geometry-assisted Statistical Modeling for Face Mosaicing, Proceeding of the IEEE International Conference on Image Processing 2003, Vol.2, pp , Barcelona, Spain,

Xiaoming Liu and Tsuhan Chen, Video-Based Face Recognition Using Adaptive Hidden Markov Models, Proceeding of the IEEE International Conference on Computer Vision and Pattern Recognition 2003, pp , Madison, Wisconsin, June 16-22,

Xiaoming Liu and Tsuhan Chen, Shot Boundary Detection Using Temporal Statistics Modeling, Proceeding of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2002, vol.4, pp , Orlando, Florida, May 13-17,

Xiaoming Liu, Tsuhan Chen and B.V.K. Vijaya Kumar, On Modeling Variations For Face Authentication, Proceedings of the International Conference on Automatic Face and Gesture Recognition 2002, pp , Washington D.C., May 20-21,

Tsuhan Chen, Yu-Feng Hsu, Xiaoming Liu, and Wende Zhang, Principal component analysis and its variants for biometrics, Proceedings of the IEEE International Conference on Image Processing 2002, Vol.1, pp , Rochester, NY, Sep

Acknowledgments

I am greatly indebted to my advisor, Prof. Tsuhan Chen, who guided me through my whole PhD study. I am deeply thankful for his continuous support, encouragement and great enthusiasm. His insights have been very helpful in guiding me to find the right direction and make progress in my research. He has always encouraged me to stay focused on my research and persevere with my ideas, especially when the going was tough. He himself is a great teacher and speaker. Through our group meetings and many interactions, I have learned many lessons on how to be a good speaker and presenter, which will be very valuable for my future career. Being his student, I have also learned a lot from his enthusiasm for research, work, and life. I believe all of these will benefit not only my career but also my life. I have to say that I am really fortunate to have had him as my advisor.

I would like to thank my second advisor, Prof. B. V. K. Vijaya Kumar, for his encouragement during my study. His insights have greatly enabled me to have a deep understanding of my research problems and to explore new ideas. I would also like to thank Dr. Jie Yang and Dr. Zhengyou Zhang for their interest in my work, their patience and their valuable time in being part of my thesis committee. I am grateful to them for their many insights and comments that have led to the improvement of this work.

I am thankful to all my group mates and friends during my study at Carnegie Mellon University. My group mates, Ta-chie Li, Deepak Turaga, Fu Jie Huang, Trista Chen, Howard Leung, Jessie Hsu, Cha Zhang, Claire Fang Fang, Ed Lin, Sam Che, Wende Zhang, Simon Lucey, Kate Shim, Jack Yu, Avinash Baliga, Michael Kaye, David Liu, Kubota Akira, and Todd Stephenson, have spent time on my work and provided valuable feedback. I thank my friends Depeng Wu, Jingfeng Liu, Hongwei Song, Yi Su, Ji Lu, Haotian Zhang, Chachai, and Liu Ren for bringing the happy moments.

Finally, I would like to express my earnest gratitude to my wife and parents for their love and support. In particular, I am deeply indebted to my wife, Bing Yu. Without her sacrifice, support and encouragement, there would never have been any chance for this thesis to happen.

List of Figures

Figure 1 Experimental results of FRVT 2002 on the recognition rate of different variations
Figure 2 Generating a statistical face mosaic model from multiple images with different poses
Figure 3 Geometric mapping: the correspondence between one pixel on the texture map and one point on the surface of the ellipsoid
Figure 4 Geometric mapping: rotate the ellipsoid and obtain the corresponding pixel on the image plane
Figure 5 Triangle representation for speeding up the texture mapping
Figure 6 Scan-line algorithm: finding the corresponding pixels of one line in the destination triangle is equivalent to scanning one line in the image plane, whose slope is determined by the affine transformation parameters between these two triangles
Figure 7 Geometry-assisted face recognition: all training and test images are converted into texture maps, and the distance measure is calculated based on the overlap area between two texture maps
Figure 8 Patch representation for the texture map: a texture map is evenly decomposed into an array of local patches
Figure 9 Sample images of one subject from the PIE database: the image in the first row is the training image, while all the others are test images
Figure 10 2D map of the similar values of one patch (around the right eye) between and pose c29 and c
Figure 11 Gaussian approximation: each figure has two histograms (solid and broken curves) and two Gaussian approximations (dotted curves); four figures are from the two distributions of the same patch (around the right eye) with four different poses, c29, c11, c14, c
Figure 12 Probabilistic modeling for patches: the first four columns are plots of $\mu^{same}_{i,j}$, $\mu^{diff}_{i,j}$, $\sigma^{same}_{i,j}$, $\sigma^{diff}_{i,j}$ for all eight test poses; the last column is the Fisher ratio of the two distributions for all eight poses; each row corresponds to the statistical information of one test pose, namely c34, c14, c11, c29, c05, c37, c02, c22 from top to bottom
Figure 13 Recognition performances of four algorithms on the CMU PIE database based on one front training image
Figure 14 Applying optical flow on images with different expressions: on the left-hand side are two images from the same subject, and on the right-hand side are two images from different subjects. Different randomness patterns can be observed in the two resulting optical flows
Figure 15 Applying optical flow on images with registration errors: on the left-hand side are two images from the same subject, and on the right-hand side are two images from different subjects. Different randomness patterns can be observed in the two resulting optical flows
Figure 16 Five expression images used for training an individual eigenflow for this subject
Figure 17 The first three eigenflows trained from expression images of one subject: some prominent movements of facial features, such as mouth corners, eyebrows, and nasolabial furrows, can be seen in them
Figure 18 Labeled facial features: up to 25 feature points are labeled on each training image
Figure 19 Mapping and averaging the position of key points: the positions of all key points in the training texture maps (2nd row) that correspond to the same facial feature, such as the left eye corner, are averaged to give the position in the final model (bottom row)
Figure 20 Computation of a patch's deviation flow: each non-key patch falls into at least one triangle; the deviation of a non-key patch is interpolated from the key patch deviations of one triangle
Figure 21 Trained geometric deviation model (top: mean, left: 1st eigenvector, right: 2nd eigenvector)
Figure 22 Training process of the appearance model for one patch: the deviation indicates where to find the corresponding patch in each of the training texture maps; all corresponding patches are treated as samples for training a statistical model
Figure 23 The mean of two universal mosaic models (left: without the modeling of geometric deviation, right: with the modeling of geometric deviation)
Figure 24 Computing the map-to-patch distance: the deviation map builds up the patch correspondence between the model and the test texture map; the distance measures from corresponding patches are fed into the Bayesian framework to generate a probabilistic distance measurement
Figure 25 Marked feature points for three views
Figure 26 3D feature points and fitted ellipsoid
Figure 27 Mean images of three individual mosaic models
Figure 28 Recognition performances of three algorithms on the CMU PIE database based on three training images
Figure 29 Importance sampling: represent a non-Gaussian PDF using a set of samples and corresponding weights (indicated by the size)
Figure 30 Two steps for density PDF propagation: the prediction step estimates the new position of each sample, and the weight assignment step assigns a weight to each sample based on the observation density function
Figure 31 Monte Carlo method: a PDF is approximated by generating a set of samples with uniform weights
Figure 32 Tracking via the Levenberg-Marquardt algorithm: the mapping parameter is iteratively adjusted in order to minimize the distance between the texture map and the mosaic model
Figure 33 Basics of video-based recognition
Figure 34 The difference between close-set and open-set recognition using the 2D condensation method
Figure 35 Propagation steps for video-based recognition
Figure 36 FIA capturing scenario: multiple cameras capture faces while the subject mimics going through an airport passport check
Figure 37 The design of the camera cart: six cameras are grouped into three pairs and mounted on a height-adjustable arm
Figure 38 Camera cart and lights: 3 light bulbs are used to create an ambient lighting environment
Figure 39 System configuration: six cameras are connected to two IEEE-1394 buses on the computer; the SYNC unit synchronizes the two buses
Figure 40 A sample snapshot from 6 cameras: top images are from cameras with longer focal length; bottom images are from cameras with short focal length; each column shows images from a pair of neighboring cameras
Figure 41 Sample images of one sequence in the FIA database: substantial pose variation can be observed in this database
Figure 42 Tracking results based on the patch-PCA mosaic model: horizontal and vertical lines indicate the estimated pose in two directions
Figure 43 9 training images from one subject in the FIA database
Figure 44 The means of the individual models in three methods (left: individual PCA trained from 9 images, middle: mosaic model trained from 9 images, right: mosaic model trained from 1 image)
Figure 45 The mean of the mosaic model being updated during the test stage of one subject

List of Tables

Table 1 Comparison of methods in initializing the mosaic model
Table 2 Recognition error rate of different algorithms
Table 3 Recognition performance of updating the mosaic model

Table of Contents

1. Introduction
   1.1 Our approaches
   1.2 Thesis structure
2. Background
   2.1 Template based face recognition
   2.2 Pose robust face recognition
   2.3 Video-based face recognition
3. Face Recognition Using Geometry-Assisted Probabilistic Modeling
   3.1 Geometrical mapping
   3.2 Geometry-assisted face recognition
   3.3 Probabilistic modeling for patches
   3.4 Probabilistic geometry-assisted face recognition
   3.5 Experimental results
   3.6 Conclusions
4. Face Mosaicing For Recognition
   4.1 Flow representation for face recognition
   4.2 Modeling the geometric deviation
   4.3 Modeling the appearance
   4.4 Face recognition using the statistical mosaic model
   4.5 Determining the mapping parameters for training images
   4.6 Experimental results
   4.7 Conclusions
5. Video-based Face Recognition
   Face tracking using the mosaic model
   Face recognition
   Face-In-Action video database
   Experimental results
   Face tracking
   Face recognition
   Conclusions
6. Face Recognition via Updating Mosaic Model
   Eigenspace updating with decaying memory
   Updating based on the covariance matrix
   Updating based on the inner-product matrix
   Updating eigenspace with missing data
   Experimental results of updating the mosaic model
   Conclusions
7. Summary and Future Directions
Bibliography

1. Introduction

For decades human face recognition has been an active topic in the field of object recognition. A general statement of this problem can be formulated as follows: given still or video images of a scene, identify one or more persons in the scene using a stored database of faces [10]. A system that performs face recognition has many applications, such as nonintrusive identification and authentication for credit card usage, nonintrusive access control to buildings, and identification for law enforcement. Comprehensive surveys of human and machine recognition techniques can be found in [10][1][20][79].

Many algorithms have been proposed to deal with image-to-image, or image-based, recognition, where both the training and test sets consist of still face images. There are two basic kinds of face recognition algorithms: one is based on feature matching, such as Elastic Graph Matching [40]; the other is based on template matching, such as the eigenface approach [72] and Linear Discriminant Analysis (LDA) [2]. Among the latter, the eigenface approach, which applies Principal Component Analysis (PCA) in the pixel domain, plays a fundamental role. It is widely considered the baseline for many face recognition algorithms. It has the advantages of fast computation and stable performance for frontal face recognition with constraints on illumination, expression variations, etc. However, with existing approaches, the performance of face recognition systems in practice is affected by different types of variations, for example, expression, illumination, and pose.

At least two observations have been made from the previous extensive studies. First, face recognition is about dealing with variations. Researchers have studied how face recognition is affected by different kinds of variations, such as expression [74][55][50], illumination [1], pose [78][58], aging [41], and sunglasses [80]. Among these, pose variation is the hardest one to model and therefore contributes most of the recognition error [20][61], because pose variation results not only in shape variation, but also in appearance variation due to the changing relation between the illumination source and the face. For example, as shown in Figure 1, one of the results from the Face Recognition Vendor Test (FRVT) 2002, the recognition rate under pose variation is much lower than that under illumination variation.

Second, face registration is the key to face recognition. This observation is a direct consequence of the first one. In dealing with different variations, if we could register face images into a canonical model, the recognition task would be simpler. In traditional image-based face recognition, the face area is normally cropped before being fed into the recognition module. The importance of face registration has been overlooked in the literature. However, in video-based face recognition, the face portion has to be registered from the video frame before any recognition can take place.

Figure 1 Experimental results of FRVT 2002 on the recognition rate of different variations (categories: Indoor Illumination, Outdoor Illumination, Pose).

1.1 Our approaches

In this thesis, we propose an integrated approach to performing video-based pose robust face tracking and recognition using a face mosaic model. Motivated by the research on video mosaics [68] and fingerprint mosaicing [37], we propose to model the facial appearance by constructing a mosaic model from multiple faces at various poses. Traditionally, pose variation is very difficult to model. We propose to use the geometry of a face to improve the mosaicing result. By approximating a human head with a 3D ellipsoid, each face image is the result of projecting a certain portion of the ellipsoid onto the image plane. Given a number of face images under various poses, as shown in Figure 2, we map the face portion of each frame onto the surface of the ellipsoid using the geometric mapping algorithm. Unwrapping the surface of the ellipsoid results in a texture map, which has an α and β coordinate system. Meanwhile, instead of one single texture map, a statistical model composed of a mean image and a number of eigen-images is trained using the unwrapped texture maps.

In this thesis, we first present how to perform pose robust face recognition using geometry-assisted probabilistic modeling. In our approach, all training and test images are projected onto the surface of a 3D ellipsoid by estimating the optimal pose and position information, and are represented as texture maps. The distance measure is calculated in the overlap area between the texture maps of the training and test images. Representing a texture map as an array of local patches also enables us to develop a probabilistic model for comparing corresponding patches from a face database with pose variation. We study how the discriminative power of corresponding patches varies for different poses. Eventually, we are able to utilize the Bayesian framework to evaluate the distance measure of corresponding patches.
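As a rough illustration of the Bayesian evaluation of patch distances mentioned above, the sketch below models the distance between two corresponding patches with one Gaussian for same-subject pairs and another for different-subject pairs, then applies Bayes' rule. The function names, the equal prior, and all numeric parameters are assumptions chosen for illustration; they are not taken from the thesis.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_same(distance, mu_same, sigma_same, mu_diff, sigma_diff, prior_same=0.5):
    """P(same subject | patch distance) via Bayes' rule with Gaussian likelihoods.

    Assumes the two likelihoods do not both underflow to zero.
    """
    like_same = gaussian_pdf(distance, mu_same, sigma_same) * prior_same
    like_diff = gaussian_pdf(distance, mu_diff, sigma_diff) * (1.0 - prior_same)
    return like_same / (like_same + like_diff)


# Hypothetical parameters for one patch at one pose (illustrative values only).
params = dict(mu_same=0.2, sigma_same=0.1, mu_diff=0.6, sigma_diff=0.15)

p_close = posterior_same(0.25, **params)  # small distance between patches
p_far = posterior_same(0.65, **params)    # large distance between patches
```

With these made-up parameters, a small patch distance yields a posterior well above one half and a large distance yields one close to zero. In a patch-based scheme, one such pair of Gaussians would be kept per patch and per pose, so that each patch contributes according to its discriminative power.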

Second, we combine multiple images with pose variation to build a statistical mosaic model, which is used for face recognition. In our mosaic method, combining multiple images is essentially combining multiple texture maps. Since the same facial feature, such as the mouth corner, found in multiple maps might not correspond to the same coordinate on the single texture map, a blurring effect would be observed when we combine multiple texture maps. To reduce such blurring, one key idea in our approach is to allow a patch to move locally toward better correspondence across multiple maps, use the flow representation to model the amount of movement, and train on the flow representation to form a geometric deviation model via PCA. The benefit of this approach is that while we obtain a less blurred facial appearance model from multiple views, we also form a geometric deviation model, which models how the actual geometry of each individual subject deviates from the 3D ellipsoid model. It is important to use two models, one for appearance variations and another for geometric deviation, especially when a rough 3D ellipsoid model is used as the face geometry. We show that both the appearance and the geometric model are useful for face recognition.

Third, since our face mosaic model is a simple statistical model combining both appearance and geometric information, we apply it to perform face tracking and recognition simultaneously. Given a test face sequence, we can track faces using the condensation method [28] or the Levenberg-Marquardt algorithm [63], based on a face mosaic model. Both algorithms try to estimate the optimal mapping parameter in order to minimize the distance measure between the test image and the model. Face tracking and recognition can be performed simultaneously by using the condensation framework.

Fourth, due to a limited number of training images, usually we cannot train a face mosaic model containing enough statistical information. To deal with this issue, we apply the updating-during-recognition scheme [51] in video-based face recognition. That is, by taking more images from a test sequence, which contains pose variation or expressions that have not been seen, as the training data, our mosaic model can be enhanced, eventually resulting in a better recognition system.

Figure 2 Generating a statistical face mosaic model from multiple images with different poses (texture map axes: α and β).

In the literature, a few papers propose techniques similar to face mosaicing [11][42]. Compared to them, our method has a number of novelties. First, instead of using the cylindrical projection, we use the ellipsoidal projection, which works more naturally with head motion in both horizontal and vertical directions. Second, while traditional mosaic algorithms usually result in only one texture map image, our method generates one statistical model with both the mean image and a number of eigen-images, which provides a more sophisticated statistical model to represent different types of variations, compared to using only one template image. Third, we represent the mosaic as a set of patches and learn a probabilistic model for the similarity measure between the corresponding patches of the mosaic model and the test image. Fourth, while traditional mosaic algorithms assume a planar relation among multiple images, we use the ellipsoid model together with the deviation modeling, which results in better matching among multiple texture maps.

Compared to other approaches in pose robust face recognition, we can imagine a dimension measuring how much a face recognition algorithm relies on geometric information. The approaches of modeling dynamics and using mapping functions are at one extreme end of this dimension since they do not use any geometric information, while Blanz and Vetter's work [5] is at the other extreme end because it uses a very sophisticated shape model. Kanade and Yamada's work [32] uses a little geometric information because it registers the face based on three facial features. Our approach uses less geometric information than Blanz and Vetter's work, but more than Kanade and Yamada's work.

Researchers often assume that better modeling leads to better recognition performance. However, the price we have to pay for more sophisticated modeling is that the model fitting becomes much more difficult. For example, in [5], both the training and test images are manually labeled with 6 to 8 feature points. On the other hand, we believe that unlike rendering applications in computer graphics, face recognition applications might not need a very sophisticated geometric model. The benefit of a simpler geometric model is that the fitting tends to be easier and automatic, which is the goal of our approach. Although our approach uses more geometric information than [32], which needs to track three feature points for any test image, we consider that our approach can fit the model more reliably since we use the appearance information of the whole face portion to register the face according to the model. Compared to AAM [16][56], our mosaic model can model much larger pose variations, because in AAM none of the meshes can be occluded during the model fitting, which greatly constrains the possible pose variation. Compared to Zhou et al.'s work [82], our approach uses a sophisticated face mosaic model which can handle pose variation better than the traditional PCA model used in [82]. In addition, we utilize the idea of statistical learning in updating the face mosaic model, which greatly improves face recognition, especially when there are very few training images to begin with.

1.2 Thesis structure

The remainder of the thesis is organized as follows. Chapter 2 introduces the background of human face recognition. We survey the previous work on three major problems in face recognition: template based face recognition, pose robust face recognition, and video based face recognition.

In Chapter 3, we introduce how to generate a texture map from a face image based on a known mapping parameter. We present a method of learning a probabilistic model for comparing corresponding patches from a face database with pose variation, and show how to apply it to pose robust face recognition. In the experiments, we show that the probabilistic model can improve pose robust face recognition. Compared with the baseline algorithm, we observe a significant improvement when performing experiments on the CMU PIE database.

Chapter 4 presents a method of combining multiple training images for training a statistical mosaic model. A geometric deviation model is trained in order to have a better matching among patches from multiple texture maps. We show the improvement from using the deviation model in pose robust face recognition experiments.

Chapter 5 introduces mosaic based face tracking and recognition from video sequences. Given a face mosaic model and a test sequence, we introduce two methods of performing face tracking: the condensation method and the Levenberg-Marquardt algorithm. We also present our effort in collecting a face video database. Experimental results of both tracking and recognition from video sequences are shown.

Chapter 6 presents how to apply an updating-during-recognition scheme in using the mosaic model. Different methods of subspace updating are presented. We show that by using the updating, pose robust face recognition can be greatly improved based on only one frontal training image.

Finally, Chapter 7 concludes this thesis and points out the contributions. We also provide interesting extensions for the work described in this thesis.

2. Background

The human face has been an interesting research topic for decades. Many promising topics have been explored based on the interaction between the face and the computer, such as face modeling [81], face animation [42], face rendering [6], face detection [77][66][38], and face recognition [10][1][20][79]. As we mentioned before, comprehensive surveys of human and machine face recognition techniques can be found in [10][1][20][79]. Thus, this chapter does not intend to give a detailed survey of all previous work in human face recognition. Rather, we focus our attention on three major problems in face recognition: template based face recognition, pose robust face recognition, and video based face recognition.

2.1 Template based face recognition

Human face recognition has a long history in the vision community. The first major attempt was made in Kanade's Ph.D. thesis in 1973 [31], which tries to recognize faces via the distribution of facial feature points. There are two basic kinds of face recognition algorithms: one is based on feature matching, such as Elastic Graph Matching [40]; the other is based on template matching, such as the eigenface approach [72] and Linear Discriminant Analysis (LDA) [2]. The eigenface approach has the benefits of fast computation, easy implementation and good performance in normal conditions. Since the birth of the eigenface approach, the template based approach has become more dominant than the feature based approach. In the later chapters, we also use the eigenface approach as one of the baseline algorithms. The details of this algorithm are presented in this section.

Suppose there are $M$ training face images for each of the $K$ subjects. Let each face image $I(x, y)$ be a 2-dimensional $N$-by-$N$ array of pixel values. An image may also be represented (after scanning) as a vector of dimension $N^2$, where each image corresponds to a single point in the $N^2$-dimensional image space. Let us denote each face image of the training set as $f_{ij}$, an $N^2 \times 1$ vector, where $i$ and $j$ denote the subject index and the face image index respectively, and $0 \le i \le K-1$, $0 \le j \le M-1$. The average face vector $g$ is defined by

$$g = \frac{1}{MK} \sum_{i=0}^{K-1} \sum_{j=0}^{M-1} f_{ij}.$$

The difference between each training face and the average is denoted by the vector $s_{ij} = f_{ij} - g$. These difference vectors form an $N^2 \times MK$ matrix, $A = [s_{0,0}, s_{0,1}, \ldots, s_{K-1,M-1}]$. We apply PCA to these difference vectors by finding a set of $Q$ orthonormal eigenvectors, $u_n$, corresponding to the largest eigenvalues of the matrix $AA^T$, i.e.,

$$AA^T u_n = \lambda_n u_n, \quad n = 0, 1, \ldots, Q-1, \tag{1}$$

where $\lambda_0, \lambda_1, \ldots, \lambda_{Q-1}$ are nonnegative and in decreasing order. However, the matrix $AA^T$ is $N^2$ by $N^2$, and determining $N^2$ eigenvectors can be computationally intensive. Usually the number of training faces, $MK$, is much smaller than $N^2$. So we first determine the eigenvectors, $u'_n$, of the $MK \times MK$ matrix $A^T A$, i.e.,

$$A^T A u'_n = \lambda_n u'_n. \tag{2}$$

Pre-multiplying (2) by $A$ gives $AA^T (A u'_n) = \lambda_n (A u'_n)$; comparing with (1), we can see that $u_n = A u'_n \lambda_n^{-1/2}$, where the factor $\lambda_n^{-1/2}$ scales $u_n$ to unit length when $u'_n$ has unit length and $\lambda_n > 0$. These eigenvectors form an orthonormal basis set of a new feature space, called the eigenspace.
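The small-matrix computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `train_eigenfaces`, the row-per-image input layout, and the synthetic random "faces" are all hypothetical.

```python
import numpy as np


def train_eigenfaces(faces, num_components):
    """Eigenfaces via the small MK x MK matrix A^T A.

    faces: array of shape (num_images, num_pixels), one vectorized face per row.
    Returns the mean face g, the eigenfaces U (num_pixels x Q), and the
    Q largest eigenvalues, which are assumed to be strictly positive.
    """
    g = faces.mean(axis=0)                    # average face vector
    A = (faces - g).T                         # columns are the difference vectors
    small = A.T @ A                           # MK x MK instead of N^2 x N^2
    eigvals, eigvecs = np.linalg.eigh(small)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    lam = eigvals[order]
    U = A @ eigvecs[:, order] / np.sqrt(lam)  # u_n = A u'_n lambda_n^(-1/2)
    return g, U, lam


# Synthetic stand-in data: 10 random "images" of 8 x 8 = 64 pixels each.
rng = np.random.default_rng(0)
faces = rng.random((10, 64))
g, U, lam = train_eigenfaces(faces, num_components=3)
```

The design point is that only a 10 x 10 eigenproblem is solved here, even though the eigenfaces themselves live in the 64-dimensional pixel space; the division by `np.sqrt(lam)` is the normalization that makes the columns of `U` orthonormal.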

Essentially it is a subspace representation of all the faces. Thus, we can transform each face image, $f_{ij}$, from the image space to the eigenspace as follows:

$$w_n = u_n^T (f_{ij} - g), \quad n = 0, 1, \ldots, Q-1. \tag{3}$$

Each face image can be described as a vector $p_{ij} = [w_0, w_1, \ldots, w_{Q-1}]^T$ in the eigenspace. During the test stage of the recognition system, a test face image $f$ is projected to the eigenspace, as in (3), to obtain the projected vector $p$. Then the nearest neighbor classifier is used to determine the subject whose $p_{ij}$ has the minimal distance to $p$.

The eigenvector determination can be computationally expensive when the number of training images is large. The power method [26] is one approach to efficiently determining the dominant eigenvectors. Instead of determining all the eigenvectors, the power method obtains only the dominant eigenvectors, i.e., the eigenvectors associated with the largest eigenvalues.

Essentially the eigenspace provides a low dimensional linear subspace for describing the facial appearance. All recognition tasks are performed in this subspace instead of the original pixel domain. However, the objective of the dimension reduction is to best represent the original data set in the mean squared sense, which might not be optimal in terms of classification. This observation leads to one direction of improving the role of PCA in face recognition: to treat classification as the criterion for constructing a subspace. Fisherface [2] and its subsequent work [12][77] are one attempt along this direction. They basically combine PCA and LDA to generate a subspace that is optimal in terms of linear classification. Another attempt is to use kernel methods together with PCA [47][35] or LDA [48] to learn a subspace.

The second direction of extending PCA is to apply PCA on feature domains other than pixels. For example, we can apply PCA on the optical flow between two images, which results in an expression robust face recognition system [50]. Chung et al. [15] apply PCA on the Gabor filter responses, and the new algorithm works better for illumination and pose variation.
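For readers unfamiliar with the power method mentioned above, the following is a minimal sketch of plain power iteration for the single dominant eigenvector of a symmetric positive semidefinite matrix, such as $A^T A$. It is a textbook version written for illustration; the function name, tolerance, iteration cap, and the small test matrix are assumptions, and [26] may describe a more elaborate variant.

```python
import numpy as np


def power_method(matrix, num_iters=1000, tol=1e-10, seed=0):
    """Dominant eigenvalue and eigenvector of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(matrix.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = matrix @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:              # v lies in the null space; nothing to amplify
            break
        w /= norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    eigenvalue = v @ matrix @ v      # Rayleigh quotient of the unit vector v
    return eigenvalue, v


# Small symmetric positive definite example matrix (illustrative only).
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_method(C)
```

Each iteration costs one matrix-vector product, which is why the method is attractive when only a few dominant eigenvectors are needed. Further eigenvectors could be obtained by deflation, which is not shown here.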

24 2.2 Pose robust face recogitio As we metioed before, amog the variatios that have bee extesively studied, pose variatio is the hardest. Let us review the previous work i dealig with the pose variatio i face recogitio. There are differet types of approaches for pose robust/ivariat face recogitio. The first type of approach is to lear the dyamics/trajectories from images with cotiuous pose variatio. Ad the such trajectories ca be used for recogizig faces from image sequeces [43][3][52]. The trajectory is represeted by either a curve or a surface. Notice this is also oe typical approach i the literature of video-based face recogitio. Oe drawback with these approaches is that certai applicatio sceario, where the subject shows cosistet motio i both traiig ad test video sequeces, has to be assumed, i order to make the dyamic to be meaigful. This assumptio is ot true i geeral, but might be true for specific tasks, which limits the popularity of this type of approaches. The secod type of approach is to treat the whole face image uder a certai pose as oe sample i a high dimesio space, ad lear the relatio betwee a frot pose image ad ofrot pose images by costructig a mappig fuctio betwee them. Give a test image with a arbitrary pose, a recogitio-by-sythesis approach ca be applied. That is, we ca either trasform this test image ito the frot view [43], or trasform each of the traiig images ito the same pose as the test image [58], based o the leared mappig fuctio. Oe potetial problem with this type of approaches is that it is ot clear whether the relatio amog multiple pose images could be approximated as a simple fuctio, such as a liear trasformatio [43]. The eige light-field method [23] ca also be classified as this type of approach. Sice a face image is pretty complex ad differet parts of the face might trasform i a differet maer uder varyig poses, researchers start to look at faces as a set of parts/patches [54][32]. 
Kanade and Yamada [32] conduct a systematic analysis on how the discriminative

25 power of differet parts o huma faces chages accordig to differet poses, ad such aalysis leads to a probabilistic approach to pose ivariat face recogitio. I this thesis, we propose to use the patch represetatio for texture maps. There are at least three beefits of usig the patch represetatio istead of the origial texture maps. First, the patches represetatio eables use to build a probabilistic model for the similarity betwee correspodig patches. Thus each patch could be treated differetly accordig to its discrimiative power. Secod beefit is that the variatio of the facial patch is simpler ad thus more likely to be modeled with a certai fuctio or a probabilistic model. Third, whe multiple texture maps are combied together, the patches are allowed to move locally i order to have a better correspodig amog multiple texture maps, which compesates the ot-perfect assumptio o the 3D ellipsoid geometry of the huma head. Sice we are dealig with pose variatio, which is a result of the huma head s geometry projected differetly, aturally researchers would rely o the geometric iformatio to aid the pose ivariat face recogitio. If we imagie there is a dimesio idicatig how much geometric iformatio is used for recogitio, we ca place may algorithms alog this dimesio. May algorithms, such as the oes i the first ad secod type of approaches, do ot use ay geometric iformatio. Others do make use of geometric iformatio by assumig a particular head model. For example, a cylider head model is widely used i face trackig [9][7]. However, the cylider model does ot act aturally for head od. Thus we propose a spherical head model to ehacig the modelig [53]. I this thesis, we will use a 3D ellipsoid as the head model based o the cosideratio that the huma head does have differet width, height, ad depth. Blaz ad Vetter s approach [5] is i the extreme ed of this dimesio sice they use perfect 3D geometric iformatio of huma heads. 
Based on a large set of face images, they train two subspace models for facial texture and shape respectively. Given a test image under any pose and lighting, they can fit the image with the two models by tuning the coefficients in the models. Finally, the model coefficients are used for recognition.

26 Researchers also take care of pose variatio via face registratio. Oe commo way i face registratio is to detect the facial features, such as eyes, mouth, etc, ad apply trasforms based o the correspodig facial features i images uder differet poses [3]. Active Appearace Models (AAM) [16][56] has bee used for face trackig ad recogitio, where the face is modeled by a triagulated mesh structure. Oce the mesh could be fitted to a face image, this face image is registered with a caoical model. Of course, [5] is also a sophistical approach of registerig face images. 2.3 Video-based face recogitio To improve face recogitio, recetly the researchers start to look at video-based face recogitio [44][18][82][51], where the test images are video sequeces cotaiig faces. Recet psychophysical results show that a huma makes use of facial motio iformatio for face recogitio [70]. Video-based face recogitio has several advatages over image-based face recogitio. First, the geometric iformatio ca be explored give cotiuous video sequeces showig differet poses of huma faces, which helps to hadle pose variatio. Secod, the motio iformatio of faces ca be utilized to facilitate the recogitio task. For example, the subject-depedet dyamic characteristics ca help face recogitio [52]. Third, give the fact that i video sequeces most of face variatios are preset i a cotiuous fashio, video-based recogitio allows the learig or updatig of the subject s model over time. For example, we propose a updatig-durig-recogitio scheme, where the curret ad past frames i a video sequece ca be used to update the subject s models to improve recogitio results for future frames [51]. Furthermore, most practical face recogitio systems actually take video sequeces uder certai sceario as the iput. Thus, it is very atural to take advatage of the video iformatio from the iput, istead of just selectig certai frames ad performig image-based recogitio. 26

27 There are ot much previous approaches ca be cosidered as video-based recogitio yet. Most of the previous video-based algorithms simply apply image-based recogitio to each frame, ad average the frame based scores to obtai the fial similarity betwee the sequece ad the model. We still cosider this type of approaches as image-based recogitio. Oe type of video-based recogitio is to model the dyamics i video sequeces [43][3][52], where face trackig ad recogitio are two separate steps. Normally face recogitio is performed after the trackig is fiished, which still igores the face registratio issue. Recetly, aother tred i video-based face recogitio is to simultaeously perform face trackig ad recogitio give a test sequece. For example, Zhou et. al. [82] apply the codesatio method for video-based face recogitio. I their case, sice a simple PCA model is used i modelig facial appearace, the trackig performace i dealig with pose variatio will be affected. 27

28 3. Face Recogitio Usig Geometry-Assisted Probabilistic Modelig As we metioed before, the most difficulty variatio for face recogitio is pose variatio. The difficulty is that the itra-subject variatios are as large as, or eve larger tha the iter-subject variatios whe pose variatio is preset. To improve face recogitio uder pose variatio, we preset a geometry-assisted probabilistic approach. We approximate a huma head with a 3D ellipsoid model, where ay face image is a projectio of such a 3D ellipsoid at a certai pose. I this approach, both traiig ad test images are projected back to the surface of the 3D ellipsoid, accordig to their estimated poses, to form texture maps. Thus the recogitio ca be coducted by comparig the texture maps istead of the origial images, as doe i traditioal face recogitio. The geometrical mappig could be treated as oe way of compesatig pose variatio ad reducig the itra-subject variatios. I additio, we represet a texture map as a array of local patches, which eables us to trai a probabilistic model for comparig correspodig patches. I this chapter, we first itroduce how to geerate a texture map from a face image based o a kow mappig parameter. The we preset a method of learig a probabilistic model for comparig correspodig patches from a face database with pose variatio, ad how to apply it for pose robust face recogitio. I experimetal results, we show that the probabilistic model ca improve the performace of pose robust face recogitio. 28

3.1 Geometrical mapping

If we compare two face images of the same subject captured at two different view angles, the pixel-by-pixel difference is relatively big because these two images are not registered/aligned with respect to each other. This is also the reason why the traditional eigenface approach does not work well for face images with pose variation. Image registration is a way to fix this problem, i.e., the comparison should only be conducted after two images are registered. Considering the fact that a human head has a non-planar geometry, one way to register face images is to project them back to the surface of a 3D ellipsoid from each of the specific poses. This procedure of projection is called geometrical mapping.

Geometrical mapping is a key component in our proposed algorithm. In this section, we introduce how to generate a texture map $s$ from a face image $f$, given a known mapping parameter $x$. In the following chapters, we will present how to estimate the mapping parameter $x$ by various methods.

Three assumptions are made. First, a human head is a 3D ellipsoid with radii $r_x$, $r_y$, and $r_z$. Second, a face image is captured with a weak perspective camera model [20] and a camera focal length equal to one. Third, all images are captured under the ambient lighting environment.

Under these assumptions, we use a mapping parameter $x$ to describe the relation between a face image and its corresponding texture map. The mapping parameter $x$ is a 6-dimensional vector $x = [c_v, c_h, d, R_\alpha, R_\beta, R_\chi]^T$, where $c_v$ and $c_h$ indicate the center of the face area in the face image, $d$ indicates the average distance between the face and the camera, and $R_\alpha$, $R_\beta$ and $R_\chi$ indicate the rotation of the human head with respect to the XYZ axes. As we can see, the mapping parameter $x$ includes all the information for locating a face, as well as generating the texture map from the face image.

Let a human head be centered at the origin of an XYZ coordinate system, with the front face looking along the positive Z axis. Thus different views of human faces can be obtained by fixing the camera and rotating the human head by certain degrees in various directions. To generate a texture map $s$ from $f$, essentially for each pixel, $s(\alpha, \beta)$, we need to find its corresponding coordinate, $f(v, u)$, by knowing the mapping parameter $x$, which is followed by a bilinear interpolation [36] to fill in the intensity of pixel $s(\alpha, \beta)$. The parameters $v$ and $u$ are the axes of the original image; $\alpha$ and $\beta$ are the axes of the texture map.

As shown in Figure 3 and Figure 4, there are basically two steps for this mapping. First, a pixel $s(\alpha, \beta)$ in the texture map corresponds to one point $(p_x, p_y, p_z)$ on the surface of a sphere whose radius is one:

$$p_x = \sin(\alpha)\sin(\beta), \quad p_y = \cos(\alpha), \quad p_z = \sin(\alpha)\cos(\beta) \tag{4}$$

As shown in the right illustration of Figure 3, the sphere is then converted into an ellipsoid by stretching each radius according to $r_x$, $r_y$, and $r_z$:

$$p_x \leftarrow r_x\, p_x, \quad p_y \leftarrow r_y\, p_y, \quad p_z \leftarrow r_z\, p_z$$

Second, we can rotate the head ellipsoid by $R_\alpha$, $R_\beta$ and $R_\chi$ with respect to the XYZ axes. As shown in Figure 4, the point at coordinate $(p_x, p_y, p_z)$ moves to a new coordinate $(p'_x, p'_y, p'_z)$ by the following equation:

$$\begin{bmatrix} p'_x \\ p'_y \\ p'_z \end{bmatrix} =
\begin{bmatrix} \cos R_\chi & -\sin R_\chi & 0 \\ \sin R_\chi & \cos R_\chi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos R_\alpha & -\sin R_\alpha \\ 0 & \sin R_\alpha & \cos R_\alpha \end{bmatrix}
\begin{bmatrix} \cos R_\beta & 0 & \sin R_\beta \\ 0 & 1 & 0 \\ -\sin R_\beta & 0 & \cos R_\beta \end{bmatrix}
\begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix}$$

Then we project the coordinate $(p'_x, p'_y, p'_z)$ onto the image plane by using the weak perspective camera model, and translate the resulting coordinate by $c_v$ and $c_h$ in the vertical and horizontal directions respectively:

$$v = \frac{p'_y}{d} + c_v, \quad u = \frac{p'_x}{d} + c_h \tag{5}$$

Finally, we obtain the new coordinate $(v, u)$ in the image plane. Whether $(p'_x, p'_y, p'_z)$ faces the positive Z axis tells us whether $(v, u)$ is a valid coordinate in the image plane. If it is, the bilinear interpolation result at $(v, u)$ is filled in as the intensity of the pixel $s(\alpha, \beta)$. Otherwise $s(\alpha, \beta)$ is considered a missing pixel and its intensity is set to zero. To compensate for lighting variations, we also normalize the mean intensity of all non-missing pixels to 128.

Figure 3 Geometric mapping: the correspondence between one pixel on the texture map and one point on the surface of the ellipsoid.
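The chain from a texture coordinate $(\alpha, \beta)$ to an image coordinate $(v, u)$ can be sketched as follows. This is only an illustration under stated assumptions: the rotation order (Y, then X, then Z) and the sign conventions follow the standard right-handed rotation matrices, the visibility test is taken to be $p'_z > 0$, and the function name and parameter packing are hypothetical.

```python
import math


def texture_to_image(alpha, beta, x, radii):
    """Map a texture coordinate (alpha, beta) to an image coordinate (v, u).

    x     = (c_v, c_h, d, R_alpha, R_beta, R_chi), angles in radians
    radii = (r_x, r_y, r_z)
    Returns (v, u, visible).
    """
    c_v, c_h, d, ra, rb, rc = x
    r_x, r_y, r_z = radii

    # Equation (4), followed by stretching the unit sphere into an ellipsoid.
    px = r_x * math.sin(alpha) * math.sin(beta)
    py = r_y * math.cos(alpha)
    pz = r_z * math.sin(alpha) * math.cos(beta)

    # Rotate about Y by R_beta, then about X by R_alpha, then about Z by R_chi
    # (assumed order and sign convention).
    px, pz = (math.cos(rb) * px + math.sin(rb) * pz,
              -math.sin(rb) * px + math.cos(rb) * pz)
    py, pz = (math.cos(ra) * py - math.sin(ra) * pz,
              math.sin(ra) * py + math.cos(ra) * pz)
    px, py = (math.cos(rc) * px - math.sin(rc) * py,
              math.sin(rc) * px + math.cos(rc) * py)

    # Equation (5): weak perspective projection plus translation.
    v = py / d + c_v
    u = px / d + c_h
    return v, u, pz > 0.0


# Front-facing head, texture pixel at the center of the face.
front = (50.0, 60.0, 0.02, 0.0, 0.0, 0.0)
print(texture_to_image(math.pi / 2, 0.0, front, (1.0, 1.3, 1.1)))
```

A full texture map would call this for every $(\alpha, \beta)$, sample $f$ at $(v, u)$ with bilinear interpolation when `visible` is true, and mark the pixel as missing otherwise.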

Figure 4 Geometric mapping: rotate the ellipsoid and obtain the corresponding pixel on the image plane.

One issue in the above mapping is how to determine the radii of the human head ellipsoid, $r_x$, $r_y$, and $r_z$, which are essentially the width, height and depth of the human head. Since we already include $d$ in the mapping parameter, any one of the three radii, for example the width $r_x$, can be set to one. Thus we only need to determine the ratio between the width and the depth, and the ratio between the width and the height. In our algorithm, the former is set to a fixed constant 0.9, considering that the head's depth is slightly larger than the head's width, while the latter is usually obtained from external sources, such as a face detector or manual labeling of a front face image. Once we obtain these two ratios, they are assumed to be constant for the same subject. Of course, we can also treat these two ratios as two additional elements in the mapping parameter $x$, and estimate them using the same framework of estimating $x$, which will be introduced in later chapters.

Since the generation of the texture map is an essential step in our mosaicing algorithm, the efficiency of this step will affect the speed of face tracking/recognition. This step can be computationally intensive if every pixel in $s$ needs to find its corresponding position in $f$. To solve this problem, we approximate the mapping using a triangular mesh, as shown in Figure 5.

That is, the texture map $s$ is represented as a set of triangles; for the vertices of these triangles, we derive their corresponding coordinates in $f$ using the above mapping equations. Then the mapping between two triangles can be approximated by an affine transformation, whose six parameters are estimated via three corresponding vertices. For the pixels inside each triangle, the scan-line algorithm [73] can be used to quickly find the corresponding pixels. For example, in Figure 6, for each triangle in the destination texture map $s$, the corresponding pixels of a vertical column lie on a line in the triangle of the source image $f$, whose slope, $(d_x, d_y)$, is determined via the affine transformation.

The goal of this approximation is to speed up the geometrical mapping while not noticeably affecting the recognition performance. The choice of triangle size is a trade-off between the mapping speed and the mapping precision. If the triangle is larger, the mapping is faster but the precision is lower. If the triangle is too small, we do not gain much in speeding up the geometrical mapping. In our implementation, the triangle size is 4 by 4 pixels.

Figure 5 Triangle representation for speeding up the texture mapping (source image $f$ with axes $v$, $u$; destination texture map $s$ with axes $\alpha$, $\beta$).
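Estimating the six affine parameters from three vertex correspondences amounts to solving two small linear systems. The sketch below does this with Cramer's rule; the function names and the coordinate layout (pairs of numbers per vertex) are assumptions made for the illustration and are not taken from the thesis.

```python
def affine_from_triangles(dst, src):
    """Estimate the affine map that sends each destination vertex to its source vertex.

    dst, src: three (a, b) coordinate pairs each.
    Returns two rows (m0, m1, m2) such that src_k = m0 * a + m1 * b + m2.
    """
    (a0, b0), (a1, b1), (a2, b2) = dst
    det = a0 * (b1 - b2) - b0 * (a1 - a2) + (a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("degenerate destination triangle")
    rows = []
    for k in range(2):  # one linear system per output coordinate
        y0, y1, y2 = src[0][k], src[1][k], src[2][k]
        m0 = (y0 * (b1 - b2) - b0 * (y1 - y2) + (y1 * b2 - y2 * b1)) / det
        m1 = (a0 * (y1 - y2) - y0 * (a1 - a2) + (a1 * y2 - a2 * y1)) / det
        m2 = (a0 * (b1 * y2 - b2 * y1) - b0 * (a1 * y2 - a2 * y1)
              + y0 * (a1 * b2 - a2 * b1)) / det
        rows.append((m0, m1, m2))
    return rows


def apply_affine(rows, point):
    """Map one destination pixel into the source image."""
    a, b = point
    return tuple(m0 * a + m1 * b + m2 for m0, m1, m2 in rows)


# One 4-by-4 destination triangle and its (hypothetical) source vertices.
dst_tri = [(0, 0), (4, 0), (0, 4)]
src_tri = [(10, 20), (18, 20), (10, 32)]
A = affine_from_triangles(dst_tri, src_tri)
print(apply_affine(A, (1, 1)))
```

Because the map is affine, stepping one pixel along a destination column adds a constant offset in the source image, which is the constant slope that the scan-line traversal relies on.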

Figure 6 Scan-line algorithm: finding the corresponding pixels of one line in the destination triangle is equivalent to scanning one line in the image plane, whose slope is determined by the affine transformation parameters between these two triangles.

3.2 Geometry-assisted face recognition

In many face recognition systems, there is only one face image, normally the front view face image, during the training stage. However, in the test stage, there might be test images that correspond to different poses of human faces. This is a hard problem because the same subject looks very different under various poses. In this section, we present our geometry-assisted approach to deal with this case.

As shown in Figure 7, given a face database with $L$ subjects, there is only one front view image, $f_l$ ($l = 1, 2, \ldots, L$), for each subject that is available for training. During the training stage, we estimate the optimal mapping parameter $x_l$ for each training image $f_l$ based on a universal mosaic model, which will be described in the next chapter. Essentially this optimization process tries to minimize the difference between the universal mosaic model and the texture map controlled by the mapping parameter, which provides the information about the position of the face, the distance of the face, and the pose. Notice that some of the parameters might be known from external sources. For example, if we know all training images are front view, the pose parameters, $R_\alpha$, $R_\beta$ and $R_\chi$, are known to be zero. Once the optimization is done, the corresponding texture map $s_l$ is generated from each training image $f_l$.

It is obvious that in the texture map $s_l$, only part of the pixels carry valid appearance information, while the rest are missing pixels, since each face image only corresponds to one portion of the surface of the 3D head ellipsoid. To describe this missing pixel information, we also generate a mask map, $a_l$, which has the same dimension as the texture map $s_l$. For all missing pixels in $s_l$, the corresponding pixel in $a_l$ is zero, and the others are one.

During the test stage, given one test image $f_t$, first we estimate the optimal mapping parameter based on the universal mosaic model. Second, the resulting texture map $s_t$ and mask map $a_t$ are compared with each of the training texture maps as follows:

$$d_l = \frac{1}{\| a_t \circ a_l \|} \left\| (s_t - s_l) \circ a_t \circ a_l \right\|^2 \tag{6}$$

where $\circ$ refers to element-wise multiplication. Basically $d_l$ is the normalized mean-square-error over the overlap area of the test texture map $s_t$ and the training texture map $s_l$, and $\| a_t \circ a_l \|$ indicates the size of the overlap area between the two texture maps. There is a degenerate case when the two texture maps have a very small overlapping area, which leads to a small $d_l$. Because the mapping parameter changes slowly in our estimation algorithm, there is a very low chance that we will fall into this degenerate case. Eventually, the test image is recognized as the subject with the minimal $d_l$.
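The masked distance in (6) and the final minimum-distance decision can be sketched as below. Texture maps and masks are flattened to plain lists here, and returning infinity for an empty overlap is a choice made for this sketch rather than something stated in the text; the names are hypothetical.

```python
def masked_mse(s_t, a_t, s_l, a_l):
    """Mean squared error over pixels valid in both texture maps, as in (6)."""
    overlap = 0
    sq_err = 0.0
    for p_t, m_t, p_l, m_l in zip(s_t, a_t, s_l, a_l):
        if m_t and m_l:  # pixel is valid in both maps
            overlap += 1
            sq_err += (p_t - p_l) ** 2
    if overlap == 0:
        return float("inf")  # assumption: no overlap means no match
    return sq_err / overlap


def recognize(s_t, a_t, gallery):
    """gallery maps a subject label to its (texture map, mask map) pair."""
    return min(gallery, key=lambda label: masked_mse(s_t, a_t, *gallery[label]))


# Toy 4-pixel texture maps (hypothetical intensities).
s_test, a_test = [128, 130, 0, 120], [1, 1, 0, 1]
gallery = {
    "subject_a": ([126, 130, 140, 0], [1, 1, 1, 0]),
    "subject_b": ([100, 100, 100, 100], [1, 1, 1, 1]),
}
print(masked_mse(s_test, a_test, *gallery["subject_a"]))
print(recognize(s_test, a_test, gallery))
```

In practice a minimum overlap size could also be enforced to guard against the degenerate small-overlap case mentioned above.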

Figure 7 Geometry-assisted face recognition: all training and test images are converted into texture maps, and the distance measure is calculated based on the overlap area between two texture maps.

3.3 Probabilistic modeling for patches

Researchers have considered that different parts of a human face contribute differently to face recognition. For example, Pentland et al. [58] propose to use modular eigenspaces to model the appearance of facial features, such as eyes, mouth, etc. Kanade and Yamada [32] perform discriminative analysis for all sub-regions in a face area and obtain a pose robust face recognition algorithm.

37 We exted the idea of sub-regio aalysis ad applied it to the geometry-assisted approach. As show i Figure 8, for each texture map s l, we represet it as a array of local patches s l i, j. There are a umber of beefits of usig the patch represetatio istead of the origial whole texture map. First, whe combiig differet texture maps from multiple poses to geerate a map that covers larger pose views, patches ca be moved locally to fid better matchig with other poses. Hece the movig of local patches compesates whe the assumptio of the ellipsoid huma head is ot perfect. We will further utilize this beefit ad propose ew algorithms i the ext chapter. Secod, istead of treatig each pixel equally by usig (6), we ca modify the similarity value of each patch accordig to the pose chages. I the meatime, a probabilistic model ca be traied to model such chages ad improve face recogitio uder pose variatio. Notice that after a texture map is decomposed ito patches, i the boudary area of the face portio, there are some patches icludig partial missig data. For simplicity, we treat all these patches as missig data. Cosiderig the fact that the patch size is ot too big ad also the boudary area is heavily up-sampled from the origial image domai, the simplificatio is egligible sice we oly discard a very small amout of boudary pixels. I our implemetatio, the patch size is 4 by 4 pixels ad the texture map s size is 90 by 180 pixels. Thus there are 22 ad 45 patches i the vertical ad horizotal directios respectively. The selectio of the patch size is a trade-off. If the patch size is too big, we lose the beefit of modelig local appearace ad we could ot model eough patch variatio with respect to the varyig pose. O the other had, if the patch size is too small, it is harder to fid correspodig amog patches from multiple texture maps. From the experimets, we fid 4 by 4 is a good choice for the patch size. 
Also we think it is not necessary to overlap patches since we will allow patches to move around locally, which will be introduced in the next chapter.
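The patch decomposition can be sketched as follows, assuming non-overlapping 4 by 4 patches on a 90 by 180 texture map, with any patch that contains a missing pixel treated as missing. Dropping the leftover rows that do not fill a whole patch is an assumption of this sketch (it matches the stated 22 by 45 grid); the function name and data layout are hypothetical.

```python
def decompose_into_patches(texture, mask, patch=4):
    """Split a texture map into non-overlapping patch-by-patch blocks.

    texture, mask: lists of rows with identical shape; mask is 1 for valid pixels.
    Returns a dict mapping (i, j) to the patch pixels, or to None when the
    patch contains at least one missing pixel.
    """
    rows, cols = len(texture), len(texture[0])
    patches = {}
    for i in range(rows // patch):
        for j in range(cols // patch):
            r0, c0 = i * patch, j * patch
            valid = all(mask[r][c] == 1
                        for r in range(r0, r0 + patch)
                        for c in range(c0, c0 + patch))
            if valid:
                patches[(i, j)] = [texture[r][c0:c0 + patch]
                                   for r in range(r0, r0 + patch)]
            else:
                patches[(i, j)] = None  # treated as missing data
    return patches


# A flat 90-by-180 texture map with a single missing pixel in the corner.
texture = [[128] * 180 for _ in range(90)]
mask = [[1] * 180 for _ in range(90)]
mask[0][0] = 0
patches = decompose_into_patches(texture, mask)
print(len(patches), patches[(0, 0)])
```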


Figure 8 Patch representation for the texture map: a texture map is evenly decomposed into an array of local patches.

Let us introduce how to train a probabilistic model for the similarity value of patches from a face database with pose variation. In this thesis we train such a model on the CMU PIE database [24]. The PIE database consists of face images of 68 subjects under different combinations of poses and illuminations. We use part of this database in this thesis, namely 9 pose images from each of the 68 subjects. These are the images with multiple poses under the neutral illumination. Sample images from one subject can be seen in Figure 9, where the numbers c27, c34, c14, c11, c29, c22, c02, c37, c05 are the pose labels for each image. We choose c27 as the training pose and the other eight poses as the test poses.

We take 9 pose images of 34 subjects for training the probabilistic model. We denote each of the images as $f(l, \phi_m)$, where $\phi_m$ is one of the eight pose labels. For the training process, the mapping parameters of all images are estimated based on the universal mosaic model. Thus we can obtain the texture maps of all images, and have the patch representation as $s_{i,j}(l, \phi_m)$, where $i$ and $j$ are the indices of patches vertically and horizontally.

Since we treat the front view, c27, as the training images, we need to study how the similarity values of corresponding patches between c27 and all other eight poses change. This is done by fixing one patch and one particular pose, and calculating the similarity value (mean-square-error) of that patch between all subjects in the pose c27 and all subjects in that particular pose. For example, Figure 10 is the result of such a calculation for one patch close to the right eye and the pose c29. In this 2D map, the vertical axis represents all the training images, 34

subjects under the pose c27, while the horizontal axis represents all test images from 34 subjects under the pose c29. Each entry indicates the similarity value of the same patch between a pair of subjects. For each combination of the other patches and the other test poses, we generate one such 2D map.

Ideally we should expect that the diagonal elements of this 2D map are darker than the off-diagonal elements because the former are an indication of the intra-subject variations, while the latter are an indication of the inter-subject variations. In order to verify this expectation, we plot the histograms of the diagonal elements and the off-diagonal elements separately. Also, for explicitly modeling these two types of variations, we approximate them as two Gaussian distributions. That is, we estimate the mean and standard deviation of the intra-subject variations from the diagonal elements, and the mean and standard deviation of the inter-subject variations from the off-diagonal elements. The resulting two distributions can be denoted as follows:

$$P(d_{i,j} \mid \text{same}, \phi_m) = \frac{1}{\sqrt{2\pi}\,\sigma^{same}_{i,j}} \exp\left\{ -\frac{1}{2} \left( \frac{d_{i,j} - \mu^{same}_{i,j}}{\sigma^{same}_{i,j}} \right)^2 \right\}$$

$$P(d_{i,j} \mid \text{diff}, \phi_m) = \frac{1}{\sqrt{2\pi}\,\sigma^{diff}_{i,j}} \exp\left\{ -\frac{1}{2} \left( \frac{d_{i,j} - \mu^{diff}_{i,j}}{\sigma^{diff}_{i,j}} \right)^2 \right\} \tag{7}$$

where $\mu^{same}_{i,j}$, $\sigma^{same}_{i,j}$, $\mu^{diff}_{i,j}$, $\sigma^{diff}_{i,j}$ are the means and standard deviations of the intra-subject and inter-subject variations for the patch $(i, j)$ under the test pose $\phi_m$. Let us denote the probabilistic model as $P_d = \{\{\mu^{same}_{i,j}, \mu^{diff}_{i,j}, \sigma^{same}_{i,j}, \sigma^{diff}_{i,j}\}_{\phi_m}\}$. Notice that all four parameters depend on the test pose $\phi_m$.

For example, the first plot on the left of Figure 11 is the Gaussian approximation of the two distributions in Figure 10. The solid and broken curves are the histograms of the two distributions, and the dotted curves are the two approximating Gaussian distributions. The four figures in Figure 11 are from the two distributions of the same patch with four different test poses: slightly right (c29), more right (c11), further right (c14), profile (c34). We can see that as the pose changes from the front view to the profile view, the discriminative power decreases, which is a useful observation and should be taken into account during the recognition.

To illustrate the relation among these parameters for all test poses, we plot them in Figure 12. In total, there are five columns and eight rows, where each row corresponds to the statistical information of one test pose, namely c34, c14, c11, c29, c05, c37, c02, c22 from top to bottom. The first four columns are the plots of $\mu^{same}_{i,j}$, $\mu^{diff}_{i,j}$, $\sigma^{same}_{i,j}$, $\sigma^{diff}_{i,j}$ for all eight test poses. The intensity of each pixel indicates the value of the parameter: the brighter the intensity, the larger the value. In order to compare the difference between the two distributions, we normalize the intensity of the first and second columns jointly, as well as the intensity of the third and fourth columns. Naturally, we can observe that the second column, $\mu^{diff}_{i,j}$, is brighter than the first column, $\mu^{same}_{i,j}$, and the fourth column, $\sigma^{diff}_{i,j}$, is brighter than the third column, $\sigma^{same}_{i,j}$, which means the inter-subject variations have a larger mean and standard deviation than the intra-subject variations. The last column is the Fisher ratio [17] between the two Gaussian distributions, defined as follows:

$$f_{i,j} = \frac{\left( \mu^{diff}_{i,j} - \mu^{same}_{i,j} \right)^2}{\left( \sigma^{same}_{i,j} \right)^2 + \left( \sigma^{diff}_{i,j} \right)^2}$$

Since the Fisher ratio is a good indication of the discriminative power, we can study which patches in the texture map provide more discriminative power than the others. From the last column of Figure 12, we observe that the nose and forehead seem to have more discriminative power. This observation might not be true in general. However, it seems to be a valid conclusion for this particular dataset.
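Fitting the two Gaussians for one patch and one test pose, and computing the Fisher ratio, can be sketched as below. The input is the 2D similarity map described above (training subjects by test subjects). Using the population standard deviation is an assumption, since the text does not say which estimator is used, and the function and key names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_patch_model(sim):
    """Fit the 'same' and 'diff' Gaussians for one patch under one test pose.

    sim[a][b] is the similarity value (mean-square-error) of the patch between
    training subject a (pose c27) and test subject b (the test pose).
    """
    n = len(sim)
    same = [sim[a][a] for a in range(n)]                      # diagonal
    diff = [sim[a][b] for a in range(n) for b in range(n) if a != b]
    return {
        "mu_same": fmean(same), "sigma_same": pstdev(same),
        "mu_diff": fmean(diff), "sigma_diff": pstdev(diff),
    }


def fisher_ratio(model):
    """Squared mean separation divided by the sum of the two variances."""
    num = (model["mu_diff"] - model["mu_same"]) ** 2
    den = model["sigma_same"] ** 2 + model["sigma_diff"] ** 2
    return num / den


# Toy 3-subject similarity map (hypothetical values).
sim = [[1.0, 5.0, 7.0],
       [6.0, 2.0, 8.0],
       [9.0, 4.0, 3.0]]
model = fit_patch_model(sim)
print(model["mu_same"], model["mu_diff"], fisher_ratio(model))
```

Repeating this for every patch $(i, j)$ and every test pose $\phi_m$ yields the full parameter set of the probabilistic model.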


Figure 9 Sample images of one subject from the PIE database: the image in the first row is the training image, while all the others are test images.

Figure 10 2D map of the similarity values of one patch (around the right eye) between pose c29 and pose c27.


Figure 11 Gaussian approximation: each figure has two histograms (solid and broken curves) and two Gaussian approximations (dotted curves); the four figures are from the two distributions of the same patch (around the right eye) with four different poses, c29, c11, c14, c34.

Figure 12 Probabilistic modeling for patches: the first four columns are plots of $\mu^{same}_{i,j}$, $\mu^{diff}_{i,j}$, $\sigma^{same}_{i,j}$, $\sigma^{diff}_{i,j}$ for all eight test poses; the last column is the Fisher ratio of the two distributions for all eight poses; each row corresponds to the statistical information of one test pose, namely c34, c14, c11, c29, c05, c37, c02, c22 from top to bottom.

3.4 Probabilistic geometry-assisted face recognition

After introducing how to train a probabilistic model, let us focus on how to utilize it for improving pose robust face recognition. Given a face database with $L$ subjects, only one front view image, $f_l$, of each subject is available for training. During the training stage, the mosaic algorithm estimates the optimal mapping parameter $x_l$ for each training image $f_l$ based on the universal mosaic model. The resulting texture map is represented as an array of local patches, $s^l_{i,j}$.

Given a test image, we generate its texture map $s^t_{i,j}$ based on the universal mosaic model. For the test texture map $s^t_{i,j}$ and one of the training texture maps $s^l_{i,j}$, we compute the similarity values of all corresponding patches, $\{d_{i,j}\}$. Since we have developed the probabilistic models of the similarity values of each local patch, we can properly combine these similarity values, one computed for each patch, to decide whether the two texture maps/faces are from the same subject or not. Given the similarity values and the pose of the test image, the posterior probability that the test image and the training image belong to the same subject is:

$$P(\text{same} \mid d_{i,j}, \phi_t) = \frac{p(d_{i,j} \mid \text{same}, \phi_t)\, P(\text{same})}{p(d_{i,j} \mid \text{same}, \phi_t)\, P(\text{same}) + p(d_{i,j} \mid \text{diff}, \phi_t)\, P(\text{diff})} \tag{8}$$

where $\phi_t$ is the pose of the test image, which can be obtained during the estimation of the mapping parameter, and $P(\text{same})$ and $P(\text{diff})$ are the a priori probabilities of being the same subject or not given any test image. For a database with $L$ subjects, normally we can set $P(\text{same}) = \frac{1}{L}$ and $P(\text{diff}) = \frac{L-1}{L}$.

Notice that in order to calculate $p(d_{i,j} \mid \text{same}, \phi_t)$ using (7), $\phi_t$ needs to be equal to one of the test poses $\phi_m$. This issue can be dealt with in two different ways. First, if the pose of the test image $\phi_t$ is similar to one of the eight test poses $\phi_m$, we can approximate $\phi_t$ using the most similar test pose. Second, if $\phi_t$ is not similar to any one of the test poses $\phi_m$, we can compute the marginal distributions of (8) over $\phi_m$:

$$p(d_{i,j} \mid \text{same}) = \sum_m P(\phi_m)\, p(d_{i,j} \mid \text{same}, \phi_m)$$

$$p(d_{i,j} \mid \text{diff}) = \sum_m P(\phi_m)\, p(d_{i,j} \mid \text{diff}, \phi_m)$$

$$P(\text{same} \mid d_{i,j}) = \frac{p(d_{i,j} \mid \text{same})\, P(\text{same})}{p(d_{i,j} \mid \text{same})\, P(\text{same}) + p(d_{i,j} \mid \text{diff})\, P(\text{diff})}$$

Here we assign a uniform distribution to $P(\phi_m)$. It could be non-uniform if we consider the probability of each pose being present in the test set.

Finally, the sum rule is applied. That is, the average of the probability measures $P(\text{same} \mid d_{i,j})$ over all patches is the similarity measure between the test image and one of the training subjects. Basically different combination rules, such as the sum rule, the product rule, the max rule, etc., can be applied here. Kittler et al. conclude that in general the sum rule outperforms other combination rules because the sum rule is more resilient to estimation errors [33]. The test image is recognized as the subject that gives the highest similarity measure.

3.5 Experimental results

We evaluate our algorithm by comparing its performance on the CMU PIE database with a standard eigenface method [72]. We use half of the subjects (34 subjects) in the PIE database for training the probabilistic model as presented above. The 9 pose images per subject from the remaining 34 subjects are used for the recognition experiments. The front view image (c27) is used for training, and the other 8 images are used for testing.

As shown in Figure 13, the horizontal axis represents the labels of the 8 pose images, 34, 14, 11, 29, 05, 37, 02, 22, from the right profile to the left profile. The vertical axis shows the recognition rate of four different algorithms for each specific pose. The first is the traditional eigenface approach [72], where the nearest neighbor classifier is applied. We have manually cropped the human face for both the training and test images, and normalized them to the size of 64 by 64 pixels. Since there are 34 training images in total, it is possible to use an eigenspace whose number of eigenvectors varies from 1 to 33. We have tested all these possibilities and plotted the one with the best recognition performance, whose number of eigenvectors is 21. The second algorithm is our geometry-assisted method without probabilistic modeling, which is presented in Section 3.2. The third algorithm is the geometry-assisted method with probabilistic modeling.

A number of observations can be made from this result. First, as the pose of the test image moves toward the profile view, the recognition rate gets lower. Second, both of our algorithms perform much better than the baseline algorithm. Third, the geometry-assisted method with probabilistic modeling works better than the one without probabilistic modeling. We can see that with one training image, our algorithm presents satisfying recognition performance: it recognizes all face views with more than a 90% correct rate except the two most extreme profile views. Even for the two profile views, recognition rates of around 70% and 60% are obtained.

As the fourth algorithm, we also plot the results of the multi-subregion method reported in Figure 8(a) of [32]. We can see that the performance of our algorithm is comparable with the multi-subregion method for test images closer to the front view. For test images closer to profile views, our algorithm performs noticeably better. For example, in their report, the recognition rates of the two profile views are both lower than 40%. There are a few reasons why our method works better for profile views. One is that we utilize more appearance information instead of only using the area bounded by facial features, such as eyes and the mouth, as done in [32]. Also, the geometrical mapping greatly compensates for the pose variation and reduces the intra-subject variations.
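For reference, the patch-wise posterior in (8) and the sum rule of Section 3.4 can be sketched as follows. The dictionary layout of the per-patch model, the function names, and the handling of a vanishing denominator or an empty patch set are assumptions made for this sketch; the Gaussian likelihoods follow (7).

```python
import math


def gaussian_pdf(d, mu, sigma):
    """Gaussian density as in (7)."""
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def posterior_same(d, model, prior_same):
    """Posterior that one patch pair comes from the same subject, as in (8)."""
    like_same = gaussian_pdf(d, model["mu_same"], model["sigma_same"])
    like_diff = gaussian_pdf(d, model["mu_diff"], model["sigma_diff"])
    num = like_same * prior_same
    den = num + like_diff * (1.0 - prior_same)
    return num / den if den > 0.0 else 0.0  # guard against underflow


def sum_rule_score(distances, models, prior_same):
    """Average the posteriors over all patches present in both dicts.

    distances, models: dicts keyed by patch index (i, j); missing patches
    are simply absent from `distances`.
    """
    shared = [k for k in distances if k in models]
    if not shared:
        return 0.0
    total = sum(posterior_same(distances[k], models[k], prior_same) for k in shared)
    return total / len(shared)


# Toy model for two patches under one test pose (hypothetical values).
patch_model = {"mu_same": 10.0, "sigma_same": 5.0, "mu_diff": 30.0, "sigma_diff": 5.0}
models = {(0, 0): patch_model, (0, 1): patch_model}
prior = 1.0 / 34  # P(same) = 1/L with L = 34 subjects
print(sum_rule_score({(0, 0): 8.0, (0, 1): 12.0}, models, prior))
```

The test image would then be assigned to the training subject with the highest score; swapping `sum_rule_score` for a product or max over patches gives the other combination rules mentioned in Section 3.4.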

Figure 13 Recognition performance of four algorithms on the CMU PIE database based on one front-view training image.

3.6 Conclusions

In this chapter we have introduced a probabilistic geometry-assisted approach and applied it to pose robust face recognition. All training and test images are projected onto the surface of a 3D ellipsoid by estimating the optimal pose and position, and are represented as texture maps. The distance measure is calculated on the overlap area between any two texture maps. Also, representing a texture map as an array of local patches enables us to develop a probabilistic model for the similarity value of patches from a face database with pose variation. Eventually we are able to utilize the Bayesian framework to evaluate the similarity value of corresponding patches. Compared with the baseline algorithm, we observe a significant improvement when performing experiments on the CMU PIE database.

The algorithms proposed above work well for the case where only one training image is available for each subject. However, if there is more than one training image, can we recognize faces better? The key issue is how to combine multiple training images and generate a unified model that covers all pose variation in the training images. We will focus on this issue and propose new algorithms in the next chapter.

4. Face Mosaicing for Recognition

In the previous chapter, we proposed that pose robust face recognition should be performed in the feature space of the texture map, instead of the original image space. Because only one training image per subject was used there, there is only one texture map for each subject after training. In order to build a statistical mosaic model for each subject, we need multiple training images. This chapter presents our proposed algorithm for building such a statistical model from multiple images.

To be more specific, given $f_k$, a set of images containing faces with different poses, we need to build a geometric deviation model $\Theta = \{g, u\}$ and a statistical appearance model $\Pi = \{m_{i,j}, V_{i,j}\}$, which is an array of patches, each of which is modeled by an eigenspace. The statistical mosaic model is composed of both these models together with the probabilistic model $P_d$, whose training is presented in the previous chapter.

In our mosaic method, combining multiple images is essentially combining multiple texture maps, since all images are converted to texture maps. When combining multiple texture maps, it is natural to observe that the same facial feature, such as the mouth corner, found in multiple maps might not correspond to the same coordinate on the single texture map. A blurring effect, which is normally not a good property for modeling, will therefore be observed when we combine many texture maps. To reduce such blurring, one key idea in our proposal is to allow a local patch to move toward a better correspondence across multiple maps, use the flow representation to model the amount of movement, and train on the flow representation to obtain the geometric deviation model via PCA.

Since the flow representation plays a key role in this process, we first present how we use it for face recognition. Next, we introduce our proposed algorithm for training both a geometric deviation model and an appearance model from multiple training images. Finally, we show the experimental results obtained with this new method.

4.1 Flow representation for face recognition

Flow representation (optical flow) [34] is generally used for motion analysis. Using two or more consecutive frames of an image sequence, a 2-dimensional vector field, called the optical flow, is computed to estimate the most likely displacement of image pixels from one frame to another. Some researchers use optical flow in the analysis of human expressions for the purpose of expression recognition [46][75][69]. Kruizinga and Petkov [34] also propose to utilize optical flow in person identification. However, they only consider the optical flow residue as the criterion for classification, while we propose to make use of the eigenflow residue, which appears to exhibit better classification ability than the former.

Optical flow is essentially an approximation of the velocity field. It approximately characterizes the motion of each pixel between two images. If two face images that show different expressions of the same subject are fed into the optical flow algorithm, the resulting motion field will emphasize the regions of facial features, such as the eyes and the mouth. This is illustrated in Figure 14. The left half of the figure shows two face images from the same subject, but with different expressions. The resulting optical flow is shown below these images. The right half shows the same layout, except that the two input images are from two different subjects. The optical flow looks clearly more irregular in this case. This clue can help discriminate between these two cases, which is the task of face recognition.

The same idea can be applied to images with registration errors. Because the traditional PCA approach is unacceptably sensitive to registration errors, even small shifts in the input images can degrade the system performance significantly. However, face images are usually difficult to register precisely, especially in a live authentication system. Therefore, we want to use optical flow to build a system that is tolerant to different kinds of registration errors. In Figure 15, the second image in the left column is an up-shifted version of the first image. The optical flow shown below them captures most of the motion around facial features. The right column shows images of different subjects, leading to an optical flow that appears to be random.

Since the optical flow provides a useful pattern for classifying personal identity, we propose to use PCA to model this pattern. Suppose that in the training data set there are a few images with different expressions for each subject, such as the five images shown in Figure 16. Using these images, twenty optical flow images $o_k$ ($1 \le k \le K$), corresponding to twenty image pairs, can be obtained through optical flow estimation. PCA can be computed as follows:

$$g = \frac{1}{K} \sum_{k=1}^{K} o_k$$

$$C = \frac{1}{K} \sum_{k=1}^{K} (o_k - g)(o_k - g)^T$$

By performing eigen-analysis of the covariance matrix $C$, we can obtain a number of eigenvectors $u = \{u_1, u_2, \ldots\}$. The three principal eigenflows of the twenty optical flow images are shown in Figure 17. Large motion can clearly be observed in the regions of facial features, such as the mouth corners, eyebrows and nasolabial furrows. So all the expression variations occurring in a single subject can be represented by the space spanned by these eigenflows. The optical flow $o$ between any two images of this subject should have a small residue, defined as:

$$e = \left\| (o - g) - u u^T (o - g) \right\|^2 \qquad (9)$$

This is basically the error term that cannot be modeled by the subspace. We call it the eigenflow residue. In contrast, the optical flow between this subject and other subjects cannot be represented well by this space, which results in a large residue. Thus, the eigenflow residue can be a useful feature for recognition. Similarly, eigenflows can be used to model the optical flow caused by image registration errors.

We have applied the eigenflow approach to face recognition and authentication, and obtained satisfactory results. Please refer to [50] for detailed information about this approach. In the next section, we will introduce how to use the same idea for modeling the geometric deviation and how it serves face recognition.

Figure 14 Applying optical flow to images with different expressions: on the left-hand side are two images from the same subject, and on the right-hand side are two images from different subjects. Different randomness patterns can be observed in the two resulting optical flows.
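The eigenflow training and the residue of Equation (9) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the optical flow fields are taken as already computed and flattened into vectors, the function names `train_eigenflows` and `eigenflow_residue` are hypothetical, and the synthetic data merely stands in for real flows.

```python
import numpy as np

def train_eigenflows(flows, num_components):
    """flows: (K, D) array, one flattened optical flow field per row."""
    flows = np.asarray(flows, dtype=float)
    g = flows.mean(axis=0)                      # mean flow g
    centered = flows - g
    # Rows of vt are eigenvectors of C = (1/K) * centered.T @ centered,
    # ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u = vt[:num_components].T                   # (D, num_components)
    return g, u

def eigenflow_residue(o, g, u):
    """Squared norm of the part of (o - g) outside span(u), as in Eq. (9)."""
    d = np.asarray(o, dtype=float) - g
    return float(np.sum((d - u @ (u.T @ d)) ** 2))

# Synthetic stand-in: training flows that live in a 2D subspace plus an offset.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 50))
train = rng.standard_normal((20, 2)) @ basis + 1.0
g, u = train_eigenflows(train, num_components=2)

same = np.array([0.5, -1.0]) @ basis + 1.0      # flow from the same subspace
other = rng.standard_normal(50)                 # unrelated flow
print(eigenflow_residue(same, g, u), eigenflow_residue(other, g, u))
```

A flow that the subject's eigenflows can explain gives a residue near zero, while an unrelated flow leaves a large residue, which is the property the recognition step relies on.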


Figure 15 Applying optical flow to images with registration errors: on the left-hand side are two images from the same subject, and on the right-hand side are two images from different subjects. Different randomness patterns can be observed in the two resulting optical flows.

Figure 16 Five expression images used for training an individual eigenflow for this subject.

Figure 17 The first three eigenflows trained from expression images of one subject: some prominent movements of facial features, such as mouth corners, eyebrows and nasolabial furrows, can be seen in them.


4.2 Modeling the geometric deviation

One potential problem of combining multiple texture maps is that the resulting averaged map might be blurred, because facial features from multiple maps do not correspond to the same coordinate in the texture map. To reduce such blurring, we need to align the facial features better by relying on some landmark points. For the model training process, it is reasonable to obtain such landmark points by manual labeling.

Given $K$ training images $f_k$ that include different poses of human faces, we label the positions of facial feature points in order to facilitate the modeling process. As shown in Figure 18, 25 facial feature points are labeled. For each training image, only a subset of the 25 points is marked, according to their visibility. We call these points key points.

Figure 18 Labeled facial features: up to 25 feature points are labeled on each training image.

As usual, we first generate the texture map $s_k$ from each training image. Since we only label the facial key points on the training images, we need to find their corresponding coordinates $b_k^i$ ($i = 1, 2, \ldots, 25$) in the texture map $s_k$, as shown in the first two rows of Figure 19. Essentially this is an inverse operation of the geometrical mapping described in the previous chapter.

After determining the key points on all texture maps of the training images, we need to find the coordinate on the mosaic model toward which all corresponding key points deviate. Ideally, if the human head's geometry were a perfect 3D ellipsoid, the same key point $b_k^i$ ($1 \le k \le K$) from multiple training texture maps would correspond to exactly the same coordinate, i.e., $b_1^i = b_2^i = \cdots = b_K^i$. For example, if we look at the three texture maps in the second row of Figure 19, the coordinate of the left corner of the nose should be the same. However, because the human head is not a perfect ellipsoid, these key points deviate from each other. The amount of deviation indicates how much the actual head geometry differs from the 3D ellipsoid. We model such deviation by applying PCA to the flow representation.

First, we compute the averaged position $\bar{b}^i$ of all key points $b_k^i$ ($1 \le k \le K$) that correspond to the same facial feature and are visible on the texture map. We treat this average as the target position in the final mosaic model toward which all corresponding key points should deviate. As shown in the third row of Figure 19, each white point is the averaged position computed from all training texture maps. Since our mosaic model is composed of an array of patches, each of the 25 averaged key points falls into one particular patch, which is called a key patch. Notice that instead of plain averaging, we can also use weighting in generating $\bar{b}^i$. For example, the texture maps that are more reliable (mostly front-view images) would have larger weights.

Second, for each texture map, we take the difference between the position of the key point $b_k^i$ and that of the averaged key point $\bar{b}^i$ as the key patch's deviation flow (DF), which describes which patch from each texture map should move toward a given key patch in the mosaic model. However, there are also non-key patches in the mosaic model. In order to model their deviation flows, as shown in Figure 20, we represent the mosaic model as a set of triangles whose vertices are the key patches.
Thus each non-key patch falls into at least one triangle. In the last step, the deviation flow of a non-key patch in each training texture map is interpolated from the key patches' deviation flows of one triangle. The reason we assign a non-key patch to multiple triangles is that, if some key patches' deviation flows of one triangle are not available due to invisibility, we can rely on other triangles to perform the interpolation.

One might ask why we compute deviation flows through the triangulation of key patches, rather than applying optical flow. The reason is that traditional optical flow computation starts with two images: a test image and a reference image. However, when we compute deviation flows, we do not have the reference texture map yet, which will be calculated after the deviation map is obtained. Thus we cannot compute deviation flows using optical flow.

For each training texture map, its geometric deviation is a 2D vector map $v^k_{i,j}$. Its dimension is the same as the number of patches in the vertical and horizontal directions, and each element is a vector indicating how far this patch is from the averaged patch in the mosaic model. Notice that for any training texture map, some elements of $v^k_{i,j}$ are considered missing. We use $a^k_{i,j}$ to denote the mask map of $v^k_{i,j}$: if $v^k_{i,j}$ is a missing element, $a^k_{i,j}$ is zero; otherwise it is one. In order to model the deviation, we train on the geometric deviations $v^k_{i,j}$ from all training texture maps using PCA with missing data [71] as follows:

$$g_{i,j} = \frac{1}{\sum_{k=1}^{K} a^k_{i,j}} \sum_{k=1}^{K} a^k_{i,j} \, v^k_{i,j}$$

$$v^k_{i,j} = g_{i,j} \quad \text{if } a^k_{i,j} = 0, \text{ for } 1 \le k \le K$$

$$C = \frac{1}{K} \sum_{k=1}^{K} (v^k - g)(v^k - g)^T$$

By performing eigen-analysis of the covariance matrix $C$, we obtain a number of eigenvectors $u = \{u_1, u_2, \ldots\}$. Figure 21 shows the resulting deviation model $\Theta = \{g, u_1, u_2\}$ based on the training images from one subject. Essentially, linear combinations of these basis vectors describe all possible geometric deviations at any view angle for this particular subject's face.

Figure 19 Mapping and averaging the positions of key points: the positions of all key points in the training texture maps (2nd row) that correspond to the same facial feature, such as the left eye corner, are averaged and result in the position in the final model (bottom row).

Figure 20 Computation of a patch's deviation flow: each non-key patch falls into at least one triangle; the deviation of a non-key patch is interpolated from the key patch deviations of one triangle.
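The interpolation of a non-key patch's deviation flow from the key patches of a triangle can be illustrated as follows. The text does not state which interpolation formula is used, so barycentric weighting is an assumption here, and the names `barycentric_weights` and `interpolate_flow` are hypothetical. The fallback to another triangle when a key point is invisible follows the description above.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb

def interpolate_flow(p, triangles):
    """Deviation flow at non-key patch p, or None if no triangle is usable.

    triangles: list of ((a, b, c), (fa, fb, fc)). Each f is the (dx, dy)
    deviation flow of a key patch, or None when that key point is invisible
    in this texture map. The first triangle with all three flows is used.
    """
    for (a, b, c), flows in triangles:
        if any(f is None for f in flows):
            continue                      # try the next triangle instead
        w = barycentric_weights(p, a, b, c)
        dx = sum(wi * f[0] for wi, f in zip(w, flows))
        dy = sum(wi * f[1] for wi, f in zip(w, flows))
        return (dx, dy)
    return None                           # treated as a missing element

tri = ((0, 0), (4, 0), (0, 4))
print(interpolate_flow((1, 1), [(tri, ((2, 0), (0, 4), (4, 4)))]))
```

A patch for which every enclosing triangle has an invisible key point yields `None`, which corresponds to a zero entry in the mask map $a^k_{i,j}$.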

Figure 21 Trained geometric deviation model (top: mean, left: 1st eigenvector, right: 2nd eigenvector).

4.3 Modeling the appearance

After modeling the geometric deviation, we also need to build an appearance model, which describes the facial appearance from all poses. Figure 22 illustrates the process of building such an appearance model. On the left-hand side, there are two pairs of a training texture map $s_k$ and its corresponding geometric deviation $v^k_{i,j}$. The resulting appearance model $\Pi = \{m_{i,j}, V_{i,j}\}$, with one mean and two eigenvectors, is shown on the right-hand side. This appearance model is composed of an array of eigenspaces, each devoted to modeling the appearance of the local patch indexed by $(i, j)$.

In order to train one eigenspace for one particular patch, the key issue is to collect one corresponding patch from each training texture map $s_k$, where the correspondence is specified by the geometric deviation $v^k_{i,j}$. For example, to train an eigenspace $\Pi_{i,j}$ for a patch centered at (40,83), we first obtain the correspondence information from $v^1_{i,j}$, which specifies how much the corresponding patch in the texture map $s_1$ deviates from the target location (40,83). Hence the sum of $v^1_{i,j}$ and (40,83) determines the center of the corresponding patch, $s^1_{i,j}$, in the texture map $s_1$. In the same way, we can find the other corresponding patches $s^k_{i,j}$ ($k = 2, 3, \ldots, K$) from all other texture maps. Notice that some of the $s^k_{i,j}$ might be considered missing patches.

Once we have collected the corresponding patches $s^k_{i,j}$ from all training texture maps, we are ready to take these patches $s^k_{i,j}$ ($1 \le k \le K$) as samples and train a statistical model $\Pi_{i,j}$ via PCA. Figure 22 shows a 2-dimensional eigenspace obtained from the training patches. Finally, the appearance model is composed of an array of PCA models, where each PCA model describes the appearance of one patch. We call this the patch-PCA mosaic.

Modeling via PCA is popular when the number of training samples is large, such as when training a universal mosaic model based on many subjects, or an individual mosaic model with many training images. However, when the number of training samples is small, such as when training an individual mosaic model with only a few training images, it might not be suitable to train a PCA model for each patch. Instead we keep all the corresponding patches and use them directly as part of the model. One computationally efficient way of doing this is to train a universal PCA model based on all corresponding patches $s^k_{i,j}$ ($1 \le k \le K$, $1 \le i \le I$, $1 \le j \le J$) of all training texture maps, and to keep the coefficients of these patches in the universal PCA model as well. This is called the global-PCA mosaic. Notice that the patch-PCA mosaic and the global-PCA mosaic differ only in how the corresponding patches across training texture maps are utilized to obtain a model, depending on the availability of training data in different application scenarios.
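The patch collection and per-patch PCA training described above can be sketched as follows, assuming texture maps are 2D NumPy arrays and deviations are integer (row, column) offsets. The function names, the `half` patch-size parameter and the border handling are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def collect_patches(texture_maps, deviations, center, half):
    """Gather one patch per texture map around center + deviation.

    texture_maps: list of 2D arrays s_k. deviations: list of (drow, dcol)
    offsets v^k for this patch location, or None when the patch is missing.
    """
    patches = []
    for s, v in zip(texture_maps, deviations):
        if v is None:
            continue                                  # missing patch
        r, c = center[0] + v[0], center[1] + v[1]
        if r - half < 0 or c - half < 0:
            continue                                  # patch leaves the map
        patch = s[r - half:r + half, c - half:c + half]
        if patch.shape == (2 * half, 2 * half):
            patches.append(patch.ravel())
    return np.array(patches, dtype=float)

def train_patch_pca(patches, num_components):
    """Mean m_{i,j} and eigenvectors V_{i,j} for one patch location."""
    m = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - m, full_matrices=False)
    return m, vt[:num_components].T

# Toy data: three 10x10 maps; the third one has no patch at this location.
maps = [np.arange(100.0).reshape(10, 10) + k for k in range(3)]
patches = collect_patches(maps, [(0, 0), (1, -1), None], center=(5, 5), half=2)
m, V = train_patch_pca(patches, num_components=1)
print(patches.shape, m.shape, V.shape)
```

Repeating this for every patch index $(i, j)$ would give the array of PCA models that the text calls the patch-PCA mosaic. Pooling the patches of all locations into a single call of `train_patch_pca` and storing each patch's coefficients would correspond to the global-PCA variant.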

Figure 22 Training process of the appearance model for one patch: the deviation indicates where to find the corresponding patch in each of the training texture maps; all corresponding patches are treated as samples for training a statistical model.

Figure 23 The means of two universal mosaic models (left: without the modeling of geometric deviation, right: with the modeling of geometric deviation).

Eventually the statistical mosaic model includes the appearance model $\Pi$, the geometric deviation model $\Theta$ and the probabilistic model $P_d$ trained as in the previous chapter. We consider the geometric deviation model to play a key role in forming the mosaic model. For example, Figure 23 shows the mean appearance of two mosaic models trained on the same set of images from 10 subjects. The one on the left does not include the modeling of geometric deviation, while the one on the right does. It is obvious that the model on the right is much less blurred and captures more useful information about the facial appearance.

Looking at Figure 20, we notice that the modeling area of the mosaic model is bounded by the positions of the outermost key patches. In order to let the mosaic model cover larger pose variation, we can also extrapolate while computing the deviation flow of non-key patches, so that more appearance information can be included in the final model. One example of using extrapolation is the right illustration of Figure 23, which covers a much larger area of facial appearance compared with the upper-right illustration of Figure ??.

4.4 Face recognition using the statistical mosaic model

In the previous section, we presented an approach to train a statistical mosaic model. Now let us see how this model can be used in pose robust face recognition. Given $L$ subjects with $K$ training images for each subject, we use our approach to train an individual statistical mosaic model for each subject. For simplicity, let us assume we have enough training samples and obtain the patch-PCA mosaic for each subject. We will discuss the case of the global-PCA mosaic at the end of this section.

As shown in Figure 24, given one test image, we generate its texture map by using the universal mosaic model. Then we measure the distance between the test texture map and each of the individual mosaic models. Thus the key issue here is to compute the map-to-model distance. Notice that the appearance model is composed of an array of patch models, each of which is called a reference patch. Basically the map-to-model distance is the sum of map-to-patch distances. That is, for each reference patch, we need to find its corresponding patch in the test texture map, and compute its distance to the reference patch model.

Figure 24 illustrates the calculation of the map-to-patch distance. Since we deviate corresponding patches during the training stage, we should do the same while looking for the corresponding patch in the test stage, instead of picking the patch from the test texture map that has the same coordinate as the reference patch. One simple approach is to search for the best corresponding patch for the reference patch inside a search window whose center is the coordinate of the reference patch. However, this approach does not impose any constraint on the deviation of neighboring reference patches. To solve this issue, we make use of the deviation model that was trained before.

On the right-hand side of Figure 24, there are three models, the deviation model $\Theta = \{g, u\}$, the appearance model $\Pi = \{m_{i,j}, V_{i,j}\}$, and the probabilistic model $P_d$, as the components of the statistical mosaic model. The deviation model describes all possible geometric deviations at any view angle for one subject's face. Because the geometries of different human heads are not the same, such a deviation model contains useful information about an individual subject's geometry. If we randomly sample one coefficient vector $c = [c_1, c_2, \ldots]$ in this subspace model, the linear combination (or subspace reconstruction) for this coefficient vector describes the geometric deviation $v^t$ for all reference patches:

$$v^t = g + \sum_k c_k u_k$$

The benefit of this approach is that it forces the geometric deviation of neighboring patches to follow a certain constraint, which is described by the mean and eigenvectors of the deviation model. Based on this idea, the key is to find a coefficient vector in the deviation subspace that provides the optimal matching between the test texture map and the model.
In our implementation, we adopt a simple search scheme to find such a coefficient vector by determining each dimension one by one. That is, in a K-dimensional deviation subspace, we uniformly sample multiple coefficients along the first dimension while the coefficients for the other dimensions are zero, and pick the one that results in the maximal similarity between the test texture map and the model. The sampling range is bounded by the coefficients of the training deviation maps. Then we perform the same search along the second dimension while fixing the optimal value for the first dimension and zero for all other dimensions. The search finishes at the K-th dimension. Essentially this is a problem of motion estimation with a subspace constraint. In the future, we might also use the POCS idea to find a better solution [13].

For each sampled coefficient vector in the above search scheme, the reconstructed 2D deviation map (at the bottom-left of Figure 24) indicates where to find the corresponding patch in the test texture map. Then the residue distance (9) between the corresponding patch and the reference patch model is computed, which is further fed into the probabilistic model. The probabilistic measurement indicates how likely it is that this corresponding patch belongs to the same subject as the reference patch. By doing the same operation for all other reference patches and averaging all patch-based probabilistic measurements, we obtain the similarity between the test texture map and the model for the currently sampled coefficient vector. Finally, the test image is recognized as the subject who provides the largest similarity.

Depending on how the individual mosaic model is trained (the patch-PCA mosaic or the global-PCA mosaic), there are different ways of calculating the distance between the corresponding patch and the reference patch model. As presented before, for the patch-PCA mosaic, the residue with respect to the reference patch model is used as the distance measure. For the global-PCA mosaic, since one reference patch model is represented by a number of coefficient vectors, the distance measure is defined as the distance from the corresponding patch to its nearest neighbor among all these coefficient vectors.
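The dimension-by-dimension search over deviation coefficients can be sketched in plain Python. The similarity function is passed in as a callable because it depends on the appearance and probabilistic models. The names `reconstruct` and `coordinate_search`, the `steps` parameter, and the toy similarity used in the example are all illustrative assumptions.

```python
def reconstruct(g, u, c):
    """v^t = g + sum_k c_k u_k, with g and each u_k given as flat lists."""
    return [gi + sum(ck * uk[i] for ck, uk in zip(c, u))
            for i, gi in enumerate(g)]

def coordinate_search(g, u, ranges, similarity, steps=11):
    """Choose one coefficient per dimension, in order.

    Earlier dimensions stay fixed at their chosen values and later ones stay
    at zero while the current dimension is sampled uniformly.
    ranges: (low, high) per dimension, e.g. the span of training coefficients.
    similarity: callable mapping a reconstructed deviation map to a score.
    """
    c = [0.0] * len(u)
    best = float("-inf")
    for k, (low, high) in enumerate(ranges):
        best, best_ck = float("-inf"), 0.0
        for s in range(steps):
            trial = list(c)
            trial[k] = low + (high - low) * s / (steps - 1)
            score = similarity(reconstruct(g, u, trial))
            if score > best:
                best, best_ck = score, trial[k]
        c[k] = best_ck
    return c, best

# Toy example: similarity is the negative squared distance to a target map.
g = [0.0, 0.0]
u = [[1.0, 0.0], [0.0, 1.0]]
target = [0.4, -0.2]
sim = lambda v: -sum((vi - ti) ** 2 for vi, ti in zip(v, target))
coeffs, score = coordinate_search(g, u, [(-1.0, 1.0), (-1.0, 1.0)], sim)
print(coeffs, score)
```

The cost grows linearly with the number of dimensions, at the price of possibly missing the joint optimum, which is consistent with the remark above that a better solution might be found later.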


Figure 24 Computing the map-to-patch distance: the deviation map builds up the patch correspondence between the model and the test texture map; the distance measures from corresponding patches are fed into the Bayesian framework to generate a probabilistic distance measurement.

4.5 Determining the mapping parameters for training images

Given a set of training images for one subject, the first step in our mosaic algorithm is to generate the texture map for each training image. There are three ways of doing this. First, we can treat a pre-trained universal mosaic model as the reference and calculate the mapping parameters of all images with respect to this universal model by using the condensation method. Second, if one of the training images is the front view, we can generate its texture map, which will be treated as the initial mosaic model, by labeling its boundary and assuming all rotation angles are zero. Then the mapping parameters of the other training images can be found by minimizing their distances to the initial model. The third method is the same as the second one, except that the rotation angles of the front-view image are obtained from the 3D positions of facial features instead of being assumed to be zero. This addresses one potential problem with the second method, namely that the front-view face might not correspond to zero rotation angles. This problem also exists for the first method when generating the initial texture map for the universal model. We present the basic steps of the third method in this section.

Obtaining the 3D positions of facial feature points is straightforward with the stereo triangulation technique [20] from the vision community. We require that multiple views of the initial front-view image are available and that all cameras are calibrated. In the following case, we have 3 views of the human face captured simultaneously, where the center view is the initial training image. First, as shown in Figure 25, we mark the common feature points among the three views. We also mark the vertical boundary of the face on the 2D image. Second, by using stereo triangulation, the 3D positions of these feature points can be reconstructed. Third, based on the 3D positions of the feature points with respect to the center view, we fit a 3D ellipsoid by minimizing the distance between the points and the ellipsoid surface, under the constraint that the 2D projection of the 3D ellipsoid in all three views should fit the marked boundary. This constraint is important because if the 3D ellipsoid is larger than the actual head size, part of the background will be included in the texture mapping. Basically the reconstructed 3D vertical boundary points give the vertical and horizontal center of the ellipsoid. Only the rotation angles and the center along the Z axis need to be determined during the fitting process.
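The triangulation step can be illustrated with a standard linear (DLT) formulation in NumPy. This is a generic sketch, not necessarily the exact procedure of [20]: the 3x4 projection matrices are assumed to come from the camera calibration, and the function name `triangulate` and the three toy cameras are hypothetical.

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera matrices from calibration.
    points_2d: the marked (x, y) image position of the feature in each view.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        rows.append(x * P[2] - P[0])      # x * (p3 . X) - (p1 . X) = 0
        rows.append(y * P[2] - P[1])      # y * (p3 . X) - (p2 . X) = 0
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]                            # least-squares null vector
    return X[:3] / X[3]

def project(P, X):
    """Pinhole projection of 3D point X with camera matrix P."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]

# Three toy cameras: a center view and two views shifted sideways.
I = np.eye(3)
cams = [np.hstack([I, np.array([[t], [0.0], [0.0]])]) for t in (0.0, 1.0, -1.0)]
X_true = np.array([0.2, -0.1, 5.0])
pixels = [project(P, X_true) for P in cams]
print(triangulate(cams, pixels))
```

With noise-free marks the point is recovered up to numerical precision. With hand-labeled points the same code returns a least-squares estimate, which would then be the input to the ellipsoid fitting.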
The fitted ellipsoid gives the optimal rotation angles under which the front-view training image can be approximated by an ellipsoid model. For example, in Figure 26, although the face in the center view looks like a front view, the fitting result indicates that there is a slight tilt around the horizontal axis. We use these rotation angles in generating the initial mosaic based on this image. Later we will show experimental results comparing the second and third methods for determining the mapping parameters of the training images.

What we have shown so far is one benefit of having calibrated multiple-view images. Another benefit is that we may even build a 3D wire-frame geometric model using the 3D coordinates of facial features. The face image can then be projected onto the wire frame, instead of the 3D ellipsoid model.

Figure 25 Marked feature points for three views.

Figure 26 3D feature points and fitted ellipsoid.

4.6 Experimental results

Similar to the previous chapter, we evaluate our algorithm by comparing its performance on the CMU PIE database with a baseline method. We use half of the subjects (34 subjects) in the PIE database for training the probabilistic model. The 9 pose images per subject from the remaining 34 subjects are used for the recognition experiments. Three poses (c27, c14, c02) are used for training, and the remaining 6 poses (c34, c11, c29, c05, c37, c22) are used for testing.

As shown in Figure 28, the horizontal axis represents the labels of the 6 test poses. The vertical axis shows the recognition rate of three different algorithms for each specific pose. The first is the result of the eigen light-field algorithm from Figure 10(a) in [23]. It is hard to find a previous method tested under the same protocol on the same database as ours, so we plot this result even though it uses only one front view per subject as training data. The second algorithm is our face mosaic method without the modeling of geometric deviation, which essentially sets the mean and all eigenvectors of $\Theta = \{g, u\}$ to zero. The third algorithm is the face mosaic method with the modeling of geometric deviation. Since the number of training images is small, we train the global-PCA mosaic for each subject. Three eigenvectors are used in building the global-PCA subspace. Thus each reference patch from the training stage is represented as a 3-dimensional vector. For the face mosaic method, the patch size is 4 by 4 pixels and the size of the texture map is 90 by 180 pixels. For illustration purposes, we plot the means of three models in Figure 27. All mean images contain enough pose variation and are not blurred much.

Comparing these three algorithms, both of our algorithms work better than the baseline algorithm.
Also, if we compare this result with the experimental results in the previous chapter (Figure 13), we can see that the algorithm presented in this chapter works better, since it has more training images and the individual mosaic model successfully combines the pose information from multiple images, while the algorithm in the previous chapter only takes one image for training. Clearly the mosaic approach provides a better way of registering multi-view images for enhanced modeling, unlike the naïve training procedure of the traditional eigenface approach. Among our algorithms, the one with deviation modeling performs better than the one without. There are at least two benefits of deviation modeling. One is that a geometric model can be used in the test stage. The other is that, as a result of deviation modeling, the patch-based model also captures the personal characteristics of the multi-view facial appearance in a non-blurring manner.

Figure 27 Mean images of three individual mosaic models.

Figure 28 Recognition performance of three algorithms on the CMU PIE database based on three training images.

In Section 4.5, we mentioned that the 3D positions of facial feature points can be used to determine the mapping parameters for the training images. We would like to see how this helps mosaic-based face recognition. We perform experiments on the FIA database, which we will introduce in detail in the next chapter. There are 20 subjects in the database, with 9 training images per subject, one of which contains the front-view face. There are 50 test images per subject. We ran two algorithms on this database. Both of them use the individual mosaic method with deviation modeling. They differ only in that one uses zero angles in the mapping parameters for the initial front-view image, while the other uses the 3D positions of facial feature points to determine the rotation angles. From the experimental results in the following table, we can see that the one using the 3D positions works slightly better than the one assuming zero angles. This is reasonable, since the 3D positions provide a better approximation to the true geometry. Also, because only the initial mosaic is enhanced via 3D point fitting, the improvement is not dramatic.

Table 1 Comparison of methods for initializing the mosaic model

Mosaic method with zero initial angle: 7.32%
Mosaic method via 3D initialization: 6.71%

4.7 Conclusions

This chapter presents an approach to train a statistical mosaic model by combining multiple training images with pose variation. We also propose to utilize the geometric deviation model for finding the corresponding patch during the test stage. We show improved performance for pose robust face recognition by using this new method.

Our face mosaic model is a quite sophisticated statistical model for the following reasons. First, pose variation, the hardest variation, is handled naturally by mapping images from different view angles to form a mosaic mean image, which can be treated as a compact representation of faces under different view angles. Second, all the other variations that cannot be modeled by the mean image, for example illumination and expression if they are present in the training images, are taken care of by a number of eigenvectors. Therefore, instead of modeling only one type of variation as conventional methods do, our method tries to model all possible appearance variations in only one model. Third, the ellipsoid model is a simple geometrical assumption and suffers from over-simplification, since the human head is not truly an ellipsoid. This is taken care of by training a geometric deviation model, which results in better correspondence across multiple texture maps.

Having shown the application to pose robust face recognition, we would like to apply our mosaic model to video-based face recognition as well, which involves face tracking and recognition from video sequences. We present it in the next chapter.

5. Video-based Face Recognition

In traditional image-based face recognition, the face area is usually cropped before being fed to a recognition system. However, in video-based face recognition, given a video sequence containing human faces, we have to track the face over the sequence before any recognition task can proceed. This normally involves two different tasks: face tracking and face recognition. One computationally efficient way is to combine these two tasks. That is, by using the same model for both tracking and recognition, the two tasks can be performed simultaneously. This shared model has to serve the purposes of both tracking, which requires a simple model to achieve real-time efficiency, and recognition, which requires a specific model containing enough variation about the identity. As presented in the previous chapters, our face mosaic model is a simple statistical model combining both appearance and geometric information, so it is a good candidate for serving face tracking and recognition simultaneously. In this chapter, we focus on how to use the mosaic model for video-based face tracking and recognition.

5.1 Face tracking using the mosaic model

Given one video frame, the most important task in tracking, recognition and online model training is to generate a texture map and compare it with the mosaic model, which yields the similarity between this frame and the model. Since the mapping parameter $x$ contains all

the information for generating the texture map, the goal of face tracking is to estimate the optimal $x$, which results in the minimal distance (or maximal similarity) between the texture map and the mosaic model. In short, face tracking is equivalent to estimating $x$. There are two methods for estimating the mapping parameter $x$: the condensation method [29] and the Levenberg-Marquardt algorithm [63], which is similar to the gradient descent method.

5.1.1 Face tracking via the condensation method

As stated before, the goal of face tracking is to estimate the mapping parameter $x_n$ based on the current frame $f_n$ and the mosaic model. The basic idea of the condensation method is that, instead of directly estimating $x_n$ given each frame $f_n$, it estimates the conditional probability density function (PDF) $p(x_n \mid f_n)$. The name condensation refers to conditional density propagation, which means propagating the conditional PDF $p(x_{n-1} \mid f_{n-1})$ by using the knowledge from $f_n$ to obtain an estimate of $p(x_n \mid f_n)$.

Because in general the conditional PDF $p(x_n \mid f_n)$ might not be a Gaussian distribution, importance sampling is used to approximate the arbitrary non-Gaussian distribution, where a set of $K$ samples together with their weights, $\{x_n^{(k)}, w_n^{(k)}\}$ $(1 \le k \le K)$, is used. As shown in Figure 29, given a set of samples $\{x_n^{(k)}, w_n^{(k)}\}$, a conditional PDF $\hat{p}(x_n \mid f_n)$ can be synthesized to approximate the original conditional PDF $p(x_n \mid f_n)$ as follows.

$$\hat{p}(x_n \mid f_n) = \sum_{k=1}^{K} w_n^{(k)} \, \delta\big(x_n - x_n^{(k)}\big) \qquad (10)$$

Thus propagation of a conditional PDF becomes the updating of the sample set, i.e., given $\{x_{n-1}^{(k)}, w_{n-1}^{(k)}\}$ and $f_n$, generate $\{x_n^{(k)}, w_n^{(k)}\}$. As shown in Figure 30, the propagation can be accomplished in two steps.

The first step, prediction, essentially answers the question: if I have not seen the current observation $f_n$, what is the most likely place for each sample, to the best of my knowledge about the system? This is answered by applying the knowledge of $p(x_n \mid x_{n-1})$ to predict a new sample set $\{x_n^{(k)}\}$ from $\{x_{n-1}^{(k)}\}$. The knowledge of $p(x_n \mid x_{n-1})$ can come from domain knowledge or be learned from training data. For example, if we know the human head is moving around in all possible directions, we can use the following equation as one way of applying the domain knowledge.

$$x_n = x_{n-1} + b_{n-1}$$

where $b_{n-1}$ is white noise with a certain variance.

One potential problem with propagating multiple samples is that some of the samples might have very small weights, and propagating them would not contribute much to the modeling of the conditional PDF. For this reason, a re-sampling step has been proposed before the prediction step, where the Monte Carlo method is used to generate a new set of samples $\{x_{n-1}'^{(k)}\}$ from $\{x_{n-1}^{(k)}, w_{n-1}^{(k)}\}$. Figure 31 illustrates the procedure of the Monte Carlo method. Based on $\{x_{n-1}^{(k)}, w_{n-1}^{(k)}\}$ and (10), we can obtain an estimated conditional PDF $\hat{p}(x_{n-1} \mid f_{n-1})$, from which a cumulative distribution function (CDF) is generated. Finally, by looking at which bin a random number falls into, we can generate a set of samples $\{x_{n-1}'^{(k)}\}$ that fit this conditional PDF. For example, if a random number is in the range of the red bin, $x_{n-1}^{(6)}$ will be a new sample in $\{x_{n-1}'^{(k)}\}$.

In the second step, weight assignment, each sample $x_n^{(k)}$ is assigned a new weight $w_n^{(k)} = p(f_n \mid x_n^{(k)})$, which measures how likely the current frame $f_n$ is to be observed based on this particular sample $x_n^{(k)}$. This likelihood is calculated by generating a texture map $s$ from $x_n^{(k)}$ and $f_n$, as presented in Section 3.1, and calculating the similarity measure between

$s$ and the mosaic model, as presented in Section 4.4. Finally, the new weights are normalized such that the weights of all samples sum to one.

After the propagation, the weighted mean of the new sample set $\{x_n^{(k)}, w_n^{(k)}\}$ becomes the current estimate of $x_n$. When the next frame $f_{n+1}$ arrives, we start the same estimation procedure based on the current sample set $\{x_n^{(k)}, w_n^{(k)}\}$, which essentially carries all the statistical information of $x_n$ and is propagated to future frames. We implemented this algorithm for face tracking and observed reasonably good tracking results.

For face tracking in a video sequence, it is normally assumed that the tracking result of the first frame is available before the tracking starts. This result might come from face detection or manual labeling. Note that in the condensation method, we also need to initialize a set of samples and their corresponding weights $\{x_1^{(k)}, w_1^{(k)}\}$, which are obtained from the tracking result of the first frame, $x_1$. Basically, we generate random samples $\{x_1^{(k)}\}$ around $x_1$, and then assign weights according to the similarity measure between the texture maps from $\{x_1^{(k)}\}$ and the mosaic model.

Note that in previous chapters, for the training and test images in a recognition system, we generate their texture maps based on the universal texture map by using the condensation method. This procedure is actually the same as tracking one frame without a good sample set from the previous frames. Obviously, in this case we need to use more samples in order to obtain a good estimate of the mapping parameter of a face image.

In our thesis proposal, we used the Hidden Markov Model (HMM) to model the mapping parameters. This is no longer necessary, since the condensation method is an extension of the HMM: a Gaussian assumption is made in the HMM, while the condensation method does not make such an assumption.
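The re-sampling, prediction and weight assignment steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the mapping parameter is reduced to a single scalar, and the `likelihood` function is a hypothetical stand-in for the similarity between a texture map and the mosaic model; the sample count and noise level are arbitrary.

```python
import bisect
import math
import random


def resample(samples, weights, rng):
    """Re-sampling: draw new samples by locating random numbers in the CDF."""
    cdf = []
    acc = 0.0
    for w in weights:
        acc += w
        cdf.append(acc)
    picked = []
    for _ in range(len(samples)):
        u = rng.random() * acc
        idx = min(bisect.bisect_left(cdf, u), len(samples) - 1)
        picked.append(samples[idx])
    return picked


def condensation_step(samples, weights, likelihood, noise_std, rng):
    """One propagation: re-sample, predict, assign weights, normalize."""
    resampled = resample(samples, weights, rng)
    # Prediction: x_n = x_{n-1} + b, with b drawn as white noise.
    predicted = [x + rng.gauss(0.0, noise_std) for x in resampled]
    # Weight assignment: w = p(f_n | x_n), here a hypothetical likelihood.
    raw = [likelihood(x) for x in predicted]
    total = sum(raw)
    if total > 0.0:
        new_weights = [w / total for w in raw]
    else:
        new_weights = [1.0 / len(raw)] * len(raw)
    # The weighted mean of the sample set is the current estimate.
    estimate = sum(w * x for w, x in zip(new_weights, predicted))
    return predicted, new_weights, estimate


rng = random.Random(0)
K = 200
x1 = 0.0  # assumed tracking result of the first frame
samples = [x1 + rng.gauss(0.0, 0.3) for _ in range(K)]
weights = [1.0 / K] * K

target = 0.0  # hypothetical true parameter, drifting by 0.1 per frame
for _ in range(20):
    target += 0.1
    likelihood = lambda x, c=target: math.exp(-(x - c) ** 2 / (2 * 0.5 ** 2))
    samples, weights, estimate = condensation_step(
        samples, weights, likelihood, 0.3, rng)
```

In this toy run the estimate follows the drifting target with a small lag, because each frame starts from the sample set of the previous one.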

Figure 29 Importance sampling: a non-Gaussian PDF is represented using a set of samples and corresponding weights (indicated by the size).

Figure 30 Two steps for PDF propagation: the prediction step estimates the new position of each sample, and the weight assignment step assigns a weight to each sample based on the observation density function.

Figure 31 Monte Carlo method: a PDF is approximated by generating a set of samples with uniform weights.

5.1.2 Face tracking via the Levenberg-Marquardt algorithm

Having introduced the condensation method, let us look into another tracking algorithm, the Levenberg-Marquardt algorithm. This method is especially useful when the mosaic model is trained without the deviation model and the probabilistic model. In this case, since there is no notion of patch representation, the mosaic model can be simply represented as one eigenspace $\Pi = \{m, V\}$. When the patch representation is used, we need to add the coefficients of the geometric deviation model into the minimization process as well.

Essentially, face tracking is a minimization procedure, which is illustrated in Figure 32. The objective is to iteratively minimize the difference between the texture map $s(\alpha, \beta)$ and the statistical model $\Pi = \{m, V\}$, which consists of the mean $m$ and $q$ eigenvectors $V = \{v_1, v_2, \ldots, v_q\}$. Since the mapping parameter $x$ controls the texture map $s$, this

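minimization is over the parameter $x$. The minimization stops if the following distance is small enough; otherwise the mapping parameter $x$ keeps being updated.

$$J = \min_{x} \|e\|^2, \qquad e = w \circ (s - m - Vc) \qquad (11)$$

$$c = \big(V^T \mathrm{diag}(w)\, V\big)^{-1} V^T \mathrm{diag}(w)\,(s - m)$$

where the squared norm sums over all positions $(\alpha, \beta)$ of the texture map, $\circ$ refers to element-wise multiplication, $\mathrm{diag}(\cdot)$ generates a matrix whose diagonal elements are the input vector, and $c$ is the eigen-coefficient of $s$ with respect to the mosaic model. The parameter $w$ is the mask map for $s$, which combines the mask information from two sources. One is the mask map for the original input image $f$. The other is the mask map from the mosaic model.

We adopt the Levenberg-Marquardt algorithm to find the optimal mapping parameter $x$ that minimizes (11). This algorithm requires the computation of the partial derivatives of $e$ with respect to all unknown parameters in $x$, for example:

$$\frac{\partial e}{\partial R_\alpha} = \frac{\partial e}{\partial s}\left( \frac{\partial s}{\partial \alpha}\left( \frac{\partial \alpha}{\partial p'_x}\frac{\partial p'_x}{\partial R_\alpha} + \frac{\partial \alpha}{\partial p'_y}\frac{\partial p'_y}{\partial R_\alpha} \right) + \frac{\partial s}{\partial \beta}\left( \frac{\partial \beta}{\partial p'_x}\frac{\partial p'_x}{\partial R_\alpha} + \frac{\partial \beta}{\partial p'_y}\frac{\partial p'_y}{\partial R_\alpha} \right) \right)$$

where $\partial e / \partial s = \mathrm{diag}(w)(I - VV^T)$, $\partial \alpha / \partial p'_x = 0$, the remaining factors ($\partial \alpha / \partial p'_y$, $\partial \beta / \partial p'_x$, $\partial \beta / \partial p'_y$, $\partial p'_x / \partial R_\alpha$ and $\partial p'_y / \partial R_\alpha$) are closed-form expressions in $p'_x$, $p'_y$, $p'_z$, $r$, $R_\alpha$ and $R_\beta$,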

and $\partial s / \partial \alpha$ and $\partial s / \partial \beta$ are the image intensity gradients of $s$ at $(\alpha, \beta)$. With these partial derivatives, we can calculate an approximate Hessian matrix $A$ and the weighted gradient vector $b$ [63]. For simplicity, if there are two parameters, $R_\alpha$ and $R_\beta$, in $x$,

$$A = \begin{bmatrix} \left(\frac{\partial e}{\partial R_\alpha}\right)^T \frac{\partial e}{\partial R_\alpha} & \left(\frac{\partial e}{\partial R_\alpha}\right)^T \frac{\partial e}{\partial R_\beta} \\ \left(\frac{\partial e}{\partial R_\beta}\right)^T \frac{\partial e}{\partial R_\alpha} & \left(\frac{\partial e}{\partial R_\beta}\right)^T \frac{\partial e}{\partial R_\beta} \end{bmatrix}, \qquad b = -\begin{bmatrix} e^T \frac{\partial e}{\partial R_\alpha} \\ e^T \frac{\partial e}{\partial R_\beta} \end{bmatrix}$$

Then the parameters $x$ can be linearly updated by

$$\Delta x = (A + \lambda I)^{-1} b \qquad (12)$$

This algorithm consists of the following steps:

1. Assign the initial value for $x$.
2. Compute $s$ and $w$ according to Section 3.1.
3. Compute the error $e$ as in (11) and the intensity gradient of $s$, compute the partial derivatives of $e$ with respect to $x$, and compute $A$ and $b$.
4. Linearly update $x$ by $\Delta x$ calculated in (12).
5. Evaluate (11) using the updated parameters and check whether the error $J$ decreases; if not, increase $\lambda$ as described in [63] and compute a new $\Delta x$.
6. Continue the iteration until the parameters converge or a fixed number of steps is reached.

If we compare the Levenberg-Marquardt algorithm with the condensation method, we can see that the former is similar to the gradient descent method, which tries to move toward the global minimum on the error surface as fast as possible from an initial point, while the latter is a statistical method, which starts with many points (samples) on the error surface, moves each of them toward its best location, and takes the averaged location as the tracking result.
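The six steps above can be illustrated on a small two-parameter problem. The sketch below is not the tracking code itself: the texture-map error is replaced by a hypothetical curve-fitting residual $a \exp(bt) - y$, and the rule of multiplying or dividing $\lambda$ by 10 is an assumed, commonly used schedule rather than the one described in [63]. The construction of $A$ and $b$ and the damped update (12) follow the text.

```python
import math


def residuals(x, ts, ys):
    """Error vector e for the toy model y = a * exp(b * t)."""
    a, b = x
    return [a * math.exp(b * t) - y for t, y in zip(ts, ys)]


def jacobian(x, ts):
    """Rows of (de/da, de/db), one row per observation."""
    a, b = x
    return [(math.exp(b * t), a * t * math.exp(b * t)) for t in ts]


def solve_2x2(m, v):
    """Solve m * dx = v for a 2x2 matrix by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def levenberg_marquardt(x, ts, ys, lam=1e-3, max_iter=100, tol=1e-12):
    e = residuals(x, ts, ys)
    cost = sum(r * r for r in e)  # J = ||e||^2
    for _ in range(max_iter):
        jac = jacobian(x, ts)
        # Approximate Hessian A and weighted gradient vector b.
        A = [[sum(row[i] * row[j] for row in jac) for j in range(2)]
             for i in range(2)]
        b = [-sum(r * row[i] for r, row in zip(e, jac)) for i in range(2)]
        accepted = False
        while lam < 1e12:
            damped = [[A[0][0] + lam, A[0][1]], [A[1][0], A[1][1] + lam]]
            dx = solve_2x2(damped, b)  # dx = (A + lam * I)^-1 b
            x_new = [x[0] + dx[0], x[1] + dx[1]]
            e_new = residuals(x_new, ts, ys)
            cost_new = sum(r * r for r in e_new)
            if cost_new < cost:
                accepted = True
                break
            lam *= 10.0  # error did not decrease: increase the damping
        if not accepted:
            break
        improvement = cost - cost_new
        x, e, cost = x_new, e_new, cost_new
        lam /= 10.0  # assumed schedule: relax the damping after a success
        if improvement < tol:
            break
    return x, cost


ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]  # synthetic data, a = 2, b = 0.5
x_hat, final_cost = levenberg_marquardt([1.0, 0.0], ts, ys)
```

Starting from a single initial point, the first few trial updates are rejected and $\lambda$ grows until the error decreases, which is the behaviour step 5 describes.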

Normally the Levenberg-Marquardt algorithm is more likely to be trapped in a local minimum, since only one point moves on the error surface and it might start from a bad initial point. On the other hand, due to its statistical nature, the condensation method is more robust in terms of tracking performance: as long as some of the samples are close to the true global minimum, they will receive high weights and the result will already be good. However, the drawback of the condensation method is also due to its statistical nature. Since many samples are used for tracking, the computational load of the condensation method is usually higher than that of the Levenberg-Marquardt algorithm, which usually converges in a few iterations. In summary, these two methods are complementary to each other in terms of tracking performance and computational efficiency.

Figure 32 Tracking via the Levenberg-Marquardt algorithm: the mapping parameter is iteratively adjusted in order to minimize the distance between the texture map and the mosaic model. (Diagram blocks: mapping with parameter $x$, error calculation, adjusting parameter $x$, exit.)

5.2 Face recognition

There are two different schemes for performing face tracking and recognition from video sequences. The first is the image-based method. For a face database with $L$ subjects, we build an individualized model for each subject, based on one or multiple training images. Given a test sequence and one specific model, a distance measurement can be calculated for each frame by face tracking. Averaging the distance over all frames in the sequence provides the distance between the test sequence and that model. After the distances between the sequence and all models are calculated, we obtain the recognition result for this sequence by comparing distances across subjects.

The second is the video-based method. Zhou et al. [82] propose a framework that combines face tracking and recognition using the condensation method. They propagate a set of samples governed by two parameters: the mapping parameter and the subject ID. Thus we call it the 2D condensation method, as shown in Figure 33.

There are at least three benefits of video-based recognition compared to image-based recognition when using the condensation method. First, during the weight normalization step, the conventional condensation method normalizes the weights of all samples of one subject, while the 2D condensation method normalizes the weights of all samples of all subjects. Thus the samples of the matched subject have relatively larger weights than samples of non-matched subjects. Meanwhile, the weights of the samples of non-matched subjects are suppressed, which is what we want.

Second, the same set of samples with the same mapping parameters is assigned to all subjects. On one hand, this reduces the computation of evaluating the weights for each sample because the geometric mapping operation is shared. On the other hand, samples for non-matched subjects are not allowed to move freely; their movement is mainly governed by samples of the matched subject.

Third, the 2D condensation method might be able to handle the open-set recognition problem. Due to the weight normalization, it is likely that no subject shows dominant weights if the test subject is not included in the training set. Otherwise, the samples of the matched subject should have dominant weights compared to samples of non-matched subjects. As shown in Figure 34, we performed a simple experiment to illustrate this point. Given a face database with 29 subjects, if the test frame comes from one of the 29 subjects (i.e., closed-set recognition), the total probability of all samples from the matched subject is much larger than those of the other subjects. However, in open-set recognition, the test frame does not match any subject in the database, and no dominant probability is observed.

Figure 33 Basics of video-based recognition
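The effect of the two normalization schemes, and how dominance could be used for open-set rejection, can be sketched as follows. The raw weights are made-up numbers, and the threshold rule in `decide` is only one possible decision rule assumed for illustration; the text above does not specify one.

```python
def normalize_per_subject(raw):
    """Conventional scheme: the weights of each subject sum to one."""
    return [[w / sum(row) for w in row] for row in raw]


def normalize_jointly(raw):
    """2D scheme: the weights of all samples of all subjects sum to one."""
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def subject_totals(weights):
    return [sum(row) for row in weights]


def decide(totals, min_total=0.5):
    """Return the dominant subject index, or None if no subject dominates.

    min_total is a hypothetical threshold chosen for this example.
    """
    best = max(range(len(totals)), key=totals.__getitem__)
    return best if totals[best] >= min_total else None


# Rows are subjects, columns are samples (hypothetical similarity values).
closed_set_raw = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.1, 0.2, 0.1]]
open_set_raw = [[0.2, 0.1, 0.2], [0.1, 0.2, 0.2], [0.2, 0.2, 0.1]]

per_subject = subject_totals(normalize_per_subject(closed_set_raw))
closed_totals = subject_totals(normalize_jointly(closed_set_raw))
open_totals = subject_totals(normalize_jointly(open_set_raw))

closed_decision = decide(closed_totals)
open_decision = decide(open_totals)
```

With per-subject normalization every subject ends up with a total of one, so the totals carry no identity information. With joint normalization the matched subject dominates in the closed-set example, while in the open-set example all totals stay near one third and the frame is rejected.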


Figure 34 The difference between closed-set and open-set recognition using the 2D condensation method.

Let us introduce the basic steps of the 2D condensation method using Figure 35. Given $L$ subjects in the training set, an individualized model is built for each subject. Suppose we use a set of $K$ samples for modeling the mapping parameter. In the initial state, there are $L \times K$ samples in the 2D space. The first step is to select the top $K$ samples (red circles) that have the largest weights among all $L \times K$ samples. Then these $K$ samples are predicted to new locations according to a certain model. Second, each of the $K$ samples is duplicated for all $L$ subjects. In other words, all $L$ samples share the same set of mapping parameters. Third, the same mapping parameter shared by the $L$ samples results in the same texture map, which greatly reduces the computational load of geometric mapping. The texture map has different similarities with respect to the $L$ different models, so a different weight is assigned to each sample. Finally, the subject with the maximum total weight over the $K$ samples is the recognition result.
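The propagation steps just described can be written down compactly. The sketch below is a toy under stated assumptions: the mapping parameter is one scalar, `texture_of` and `similarity` are hypothetical stand-ins for texture mapping and for the comparison with a subject's mosaic model, and each subject model is reduced to a single appearance value.

```python
import math
import random


def texture_of(x, frame):
    """Hypothetical texture mapping: returns (misalignment, appearance)."""
    return (x - frame["x_true"], frame["appearance"])


def similarity(texture, model, sigma=0.2):
    """Hypothetical similarity between a texture map and one subject model."""
    misalignment, appearance = texture
    d2 = misalignment ** 2 + (appearance - model) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def condensation_2d_step(samples, models, frame, k, noise_std, rng):
    """samples holds L * K tuples of (mapping parameter, subject, weight)."""
    # Step 1: keep the top K samples over all subjects and predict them.
    top = sorted(samples, key=lambda s: s[2], reverse=True)[:k]
    predicted = [x + rng.gauss(0.0, noise_std) for x, _, _ in top]
    # Steps 2 and 3: duplicate each sample for all L subjects; the texture
    # map is computed once per mapping parameter and reused for every model.
    new_samples = []
    for x in predicted:
        texture = texture_of(x, frame)
        for subject, model in enumerate(models):
            new_samples.append((x, subject, similarity(texture, model)))
    # Normalize the weights over all samples of all subjects.
    total = sum(w for _, _, w in new_samples)
    if total <= 0.0:
        total = 1.0
    new_samples = [(x, s, w / total) for x, s, w in new_samples]
    # Final step: the subject with the maximum total weight is recognized.
    totals = [0.0] * len(models)
    for _, s, w in new_samples:
        totals[s] += w
    recognized = max(range(len(models)), key=totals.__getitem__)
    return new_samples, totals, recognized


rng = random.Random(0)
models = [0.1, 0.7, 0.9]                      # hypothetical subject models
frame = {"x_true": 1.0, "appearance": 0.7}    # this frame matches subject 1
K = 20
initial_x = [0.8 + rng.gauss(0.0, 0.2) for _ in range(K)]
samples = [(x, s, 1.0 / (K * len(models)))
           for x in initial_x for s in range(len(models))]

for _ in range(5):
    samples, totals, recognized = condensation_2d_step(
        samples, models, frame, K, 0.1, rng)
```

Because every subject is evaluated on the same $K$ mapping parameters, the per-subject totals differ only through how well each model explains the shared texture maps.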

Figure 35 Propagation steps for video-based recognition

Before presenting the experimental results, we first introduce the Face-In-Action video database we are collecting.

5.3 Face-In-Action video database

As more and more researchers start to work on video-based face recognition, as opposed to traditional image-based face recognition, there is a growing demand for a database of video sequences containing human faces. With such a database, the benefits of video-based face recognition can be explored. We are making an effort to collect such a face video database, called the Face In Action (FIA) database [49].

5.3.1 Capturing scenario

There are many existing databases containing face images under controlled conditions, such as FERET [60], PIE [67], ORL [64], XM2VTS [57], etc. However, when collecting a face database in video, we have to bring in motion. Based on our study, we consider passport checking the most popular motion scenario for real-world applications of face recognition techniques. As shown in Figure 36, in a controlled environment with a blue background, multiple cameras point at the desk from three different angles. The cameras capture the whole process of the subject walking toward the desk, standing in front of the desk, making simple conversation, moving the head as might happen during passport checking, and finally walking away from the desk. The resulting video hence contains the moving head while the subject is walking, user-dependent pose variation due to natural head motion, and lip movements and expression variations during conversation. This capturing scenario not only mimics passport checking but is also highly representative of many other daily scenarios, such as checking in at a hotel, or visiting a hospital or government office.

Figure 36 FIA capturing scenario: multiple cameras capture faces while the subject mimics going through airport passport checking.

5.3.2 Capturing system

In face database collection, one samples the face in multiple dimensions, such as pose, illumination, expression, aging, etc. In our FIA capturing system, we sample in the following dimensions: motion, pose, image resolution, illumination and variation over time. Motion is sampled by continuous video at 30 frames per second. Pose is sampled by capturing faces from three different directions simultaneously. Image resolution is sampled by using cameras with two different focal lengths. Illumination is sampled by capturing faces in both indoor and outdoor scenarios. Variation over time is sampled by capturing three different sessions, each spanning three months.

As shown in Figure 37, we built a cart for mounting the capturing system. On the C-shaped arm, there are 6 cameras. All cameras point to the same center spot and are at the same distance (0.83 m) from that spot. Each camera captures video sequences with a 640 by 480 frame size at 30 frames per second. The six cameras are arranged in three pairs. Since the C-shaped arm can be adjusted vertically by the linear bearing according to the height of the subject, a face is essentially captured by three pairs of cameras with the same vertical angle but different horizontal angles (-60°, 0°, 60°). Within each pair of cameras, one has a 4 mm focal length, which results in a face area of around 300 by 300 pixels, and the other has an 8 mm focal length, which results in a face area of around 100 by 100 pixels. The video sequence with the larger face area can be used for applications demanding high-resolution face images, such as 3D reconstruction, while the smaller one is closer to the face data in video surveillance applications. Figure 38 shows a picture of the camera cart. Three light bulbs are placed on the cart to create an ambient lighting environment for capturing.

Two carts are used for capturing human faces in the indoor and outdoor scenarios respectively. There are three differences between the indoor and outdoor scenarios. The first is that

there is no controlled illumination in the outdoor scenario. The second is that no blue background is placed in the outdoor scenario. The third is that neither color nor camera calibration is performed for the outdoor scenario. Thus the sequences from the outdoor scenario can be used to study how well video-based face recognition performs under natural illumination.

To capture variation over time, we are planning to capture 200 subjects in three different sessions, each spanning three months. In each session, both the indoor and outdoor scenarios will be captured. Six sequences are captured simultaneously for 20 seconds in each scenario.

Having introduced the camera cart, we now present the system configuration, as shown in Figure 39. We use the Dragonfly camera from Point Grey Research Inc. [62], which is an OEM-style IEEE-1394 board-level camera. At the data rate we are capturing (640 by 480 pixels at 30 frames per second for 20 seconds), one IEEE-1394 bus can only carry the data streams of three cameras. Although the three cameras on the same bus are synchronized, we have to synchronize the two buses so that all six cameras are synchronized. The SYNC unit [62] plays the role of synchronizing the two IEEE-1394 buses. Eventually all six camera streams are saved onto the hard drive of one computer. Based on our experience, the speed of the hard drive, rather than the CPU speed, is the bottleneck of the capturing system. Currently we use more memory as a cache to compensate for the insufficiently fast hard drive.

For each subject, we collect the following data: six 20-second face sequences at 30 frames per second, for both indoor and outdoor scenarios, for 3 sessions in total. We store personal information for each subject, such as age, gender, glasses, beard, mustache, etc. Also, for each indoor scenario, we provide the color calibration data using a color checker, and the camera calibration data using a checkerboard.


Figure 37 The design of the camera cart: six cameras are grouped into three pairs and mounted on a height-adjustable arm. (Diagram labels: linear bearing; dimensions 1.6 m, 0.8 m, 0.58 m, 1.7 m, 0.5 m, 1.5 m.)

Figure 38 Camera cart and lights: 3 light bulbs are used to create an ambient lighting environment.

Figure 39 System configuration: six cameras are connected to two IEEE-1394 buses on the computer; the SYNC unit synchronizes the two buses.

5.3.3 Specifications and samples

In summary, the specifications of the FIA database are as follows:

200 subjects.
3 sessions per subject.
2 scenarios per session (indoor and outdoor).
Color and camera calibration data for the indoor scenario.
6 sequences per scenario.
20 seconds per sequence.
30 frames per second.
640 by 480 color image per frame.
Personal information stored for each subject, such as age, gender, etc.
Each image saved in JPEG format with 90% quality (100K).
Total storage of the database: 100K × 30 × 20 × 6 × 2 × 3 × 200 = 412G.

One sample snapshot from the six cameras can be seen in Figure 40. The three images in the top row are captured by the 8 mm focal-length cameras. The other three images are captured by the 4 mm focal-length cameras. Figure 41 shows sample images from one sequence in the FIA database. Substantial pose variation can be observed in this sequence.
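The total storage figure can be checked with a short calculation. The sketch below assumes that "K" and "G" denote binary units (1024-based), which is the reading that reproduces the 412G value; with decimal units the same product would be 432 GB.

```python
KB_PER_IMAGE = 100          # one JPEG frame at 90% quality
FRAMES_PER_SECOND = 30
SECONDS_PER_SEQUENCE = 20
SEQUENCES_PER_SCENARIO = 6  # six cameras
SCENARIOS_PER_SESSION = 2   # indoor and outdoor
SESSIONS_PER_SUBJECT = 3
SUBJECTS = 200

total_kb = (KB_PER_IMAGE * FRAMES_PER_SECOND * SECONDS_PER_SEQUENCE
            * SEQUENCES_PER_SCENARIO * SCENARIOS_PER_SESSION
            * SESSIONS_PER_SUBJECT * SUBJECTS)

# Assumption: binary units, 1 G = 1024 * 1024 K.
total_gb_binary = total_kb / (1024 ** 2)
total_gb_decimal = total_kb / (1000 ** 2)

print(f"{total_kb} KB, about {total_gb_binary:.0f} G (binary units)")
```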


Figure 40 A sample snapshot from the 6 cameras: top images are from cameras with the longer focal length; bottom images are from cameras with the shorter focal length; each column shows images from a pair of neighboring cameras.

Figure 41 Sample images of one sequence in the FIA database: substantial pose variation can be observed.

5.4 Experimental results

5.4.1 Face tracking

Since a face mosaic model describes the facial appearance from multiple views, we can use it for performing pose robust face recognition. There are two methods for model-based face tracking. One is to use the condensation method. The other is to use the Levenberg-Marquardt

algorithm. The second option is faster, but might have slightly worse performance compared to the first one. We use the patch-PCA or global-PCA mosaic model for the first option, depending on the number of training images. In the second option, we can use the mosaic model without deviation modeling, targeting fast tracking speed.

The tracking result of one sequence using the individual patch-PCA mosaic is shown in Figure 42. The white circle shows the estimated face position, and two curves show the estimated horizontal and vertical rotation angles. We can see that these two curves consistently cross the eyes and nose area across frames. We use 500 samples in the condensation method. The tracking runs at around 2 frames per second.

Figure 42 Tracking results based on the patch-PCA mosaic model: the horizontal and vertical lines indicate the estimated pose in two directions.

5.4.2 Face recognition

We performed an experiment on the FIA database. There are 29 subjects in the database, with 10 sequences per subject as the test sequences. Each sequence has 50 frames, and the first frame is labeled with the ground truth data. We use the individual PCA algorithm with image-based recognition and the individual PCA with video-based recognition as the baseline algorithms. For both algorithms, 9 images are used for training and the best

performance is reported by trying different numbers of eigenvectors. The algorithms work best when the number of eigenvectors is 4. For example, Figure 43 shows the 9 training images for one subject in the FIA database. The face location in the training images comes from manual labeling, while that in the test images is based on the tracking results using our mosaic model. All face images are cropped to 64 by 64 pixels from the video frame.

For our algorithms, we tested three different options. The first is to use the individual patch-PCA mosaic with image-based recognition, which uses the averaged distance from the frames to the mosaic model as the final distance measure. There are 9 training images per subject. The second is to use the individual patch-PCA mosaic with video-based recognition, which uses the 2D condensation method to perform tracking and recognition. The same set of training images is used. The third is similar to the second option except that only one training image per subject is used. Thus only the one texture map from that training image can be used for training.

We illustrate the mean images of the individual models in the three methods in Figure 44. We can observe a dramatic blurring effect in the mean of the individual PCA model. On the other hand, the mean of our individual patch-PCA mosaic model covers large pose variation while still keeping enough individual facial characteristics. Since there is only one front-view training image in the third option of the mosaic method, the mean is simply the texture map of that image.

The recognition performance of the baseline algorithms and our approaches is shown in the following table. Two observations can be made. First, given the same model, such as the PCA model or the mosaic model, video-based face recognition is better than image-based recognition. Second, the mosaic model works much better than the PCA model for pose robust recognition. The third option works worse than the first two options since it has only one training image per subject.
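The image-based decision rule used in the first option (average the per-frame distances to each subject's model, then pick the smallest) is simple enough to sketch directly. The subject names and distance values below are invented for illustration and are not taken from the experiments.

```python
def recognize_image_based(distances_per_subject):
    """Average the per-frame distances per model and pick the closest one.

    distances_per_subject maps a subject label to the list of distances
    between each frame of the test sequence and that subject's model.
    """
    averaged = {subject: sum(d) / len(d)
                for subject, d in distances_per_subject.items()}
    best = min(averaged, key=averaged.get)
    return best, averaged


# Hypothetical distances for a three-frame sequence and three models.
distances = {
    "subject_a": [0.42, 0.38, 0.45],
    "subject_b": [0.61, 0.58, 0.66],
    "subject_c": [0.50, 0.35, 0.70],
}
best, averaged = recognize_image_based(distances)
```

In this made-up example the second frame alone is closest to subject_c, but averaging over the sequence still selects subject_a, which shows why a sequence-level distance is preferred over a single-frame decision.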


Figure 43 The 9 training images from one subject in the FIA database

Figure 44 The means of the individual models in the three methods (left: individual PCA trained from 9 images; middle: mosaic model trained from 9 images; right: mosaic model trained from 1 image).

Table 2 Recognition error rate of different algorithms

PCA with image-based method                          17.24%
PCA with video-based method                           8.97%
Mosaic with image-based method                        6.90%
Mosaic with video-based method                        4.14%
Mosaic with video-based method (1 training image)     9.66%

5.5 Conclusions

In this chapter, by using the face mosaic model, we are able to perform face tracking and recognition simultaneously, even when dramatic pose variation is present in the test sequences. We introduce face tracking using two different algorithms: the condensation method and the Levenberg-Marquardt algorithm. We present two schemes for integrated face tracking and recognition: the image-based method and the video-based method. We also present the effort to collect the FIA face video database. We apply our algorithm to the FIA database and obtain satisfactory tracking and recognition performance.
