Development of an Active Shape Model. Using the Discrete Cosine Transform

Size: px

Start display at page:

Download "Development of an Active Shape Model. Using the Discrete Cosine Transform"

Opal Owen
5 years ago
Views:

1 Development of an Actve Shape Model Usng the Dscrete Cosne Transform Kotaro Yasuda A Thess n The Department of Electrcal and Computer Engneerng Presented n Partal Fulfllment of the Requrements for the Degree of Master of Appled Scence at Concorda Unversty Montreal, Quebec, Canada Aprl 2014 Kotaro Yasuda, 2014

2 CONCORDIA UNIVERSITY SCHOOL OF GRADUATE STUDIES Ths s to certfy that the thess prepared By: Enttled: Kotaro Yasuda Development of an Actve Shape Model Usng the Dscrete Cosne Transform and submtted n partal fulfllment of the requrements for the degree of Master of Appled Scence Comples wth the regulatons of ths Unversty and meets the accepted standards wth respect to orgnalty and qualty. Sgned by the fnal examnng commttee: Char Dr. M. Z. Kabr Examner, External Dr. C. Y. Su (M.I.E.) To the Program Examner Dr. M. N. S. Swamy Supervsor Dr. M. O. Ahmad Approved by: Dr. W. E. Lynch, Char Department of Electrcal and Computer Engneerng 20 Dr. C. W. Trueman Interm Dean, Faculty of Engneerng and Computer Scence

3 Abstract Development of an Actve Shape Model Usng the Dscrete Cosne Transform Kotaro Yasuda Facal recognton systems have been successfully appled n securty, lawenforcement and human dentfcaton applcaton, for automatcally dentfyng a human n a dgtal mage or a vdeo frame. In a feature-based face recognton system usng a set of features extracted from each of the promnent facal components, automatc and accurate localzaton of facal features s an essental pre-processng step. The actve shape model (ASM) s a flexble shape model that was orgnally proposed to automatcally locate a set of landmarks representng the facal features. Varous mproved versons of ths model for facal landmark annotaton have been developed for ncreasng the shape fttng accuracy at the expense of sgnfcantly ncreased computatonal complexty. Ths thess s concerned wth developng a low-complexty actve shape model by ncorporatng the energy compacton property of the dscrete cosne transform (DCT). Towards ths goal, the proposed ASM, whch utlzes a 2-D profle based on the DCT of the local grey-level gradent pattern around a landmark, s frst developed. The ASM s then utlzed n a scheme of facal landmark annotaton for locatng facal features of the face n an nput mage. The proposed ASM provdes two dstnct advantages: () the use of a smaller number of DCT coeffcents n buldng a compressed DCT profle

4 sgnfcantly reduces the computatonal complexty, and () the process of choosng the low-frequency DCT coeffcents flters out the nose contaned n the mage. Smulatons are performed to demonstrate the superorty of the proposed ASM over other mproved versons of the orgnal actve shape model n terms of the fttng accuracy as well as n terms of the computatonal complexty. It s shown that the use of the proposed model n the applcaton of facal landmark annotaton sgnfcantly reduces the executon tme wthout affectng the accuracy of the facal shape fttng. v

5 ACKNOWLEDGEMENT I would lke to express the deepest apprecaton to my thess supervsor, Dr. M. Omar Ahmad, for hs gudance and support throughout the development of ths thess. Wthout hs supervson, ths thess would not have been possble. I would also lke to thank my famly for ther constant support for me to pursue my study at Concorda Unversty. v

6 To my grandfather v

7 TABLE OF CONTENTS LIST OF FIGURES... x LIST OF TABLES... x LIST OF SYMBOLS... xv LIST OF ACRONYMS... xv Chapter 1 Introducton Background A Bref Revew of Actve Shape Model Objectve and Scope of the Thess Organzaton of the Thess... 6 Chapter 2 Background Materal Actve Shape Model Pont Dstrbuton Model Local Grey-Level Gradent Model Search of a Face n an Image Mult-Resoluton Actve Shape Model Actve Shape Model wth 2-D Profles Actve Shape Model wth a PCA-based LGGM Dscrete Cosne Transform D DCT D DCT Summary Chapter 3 Proposed Actve Shape Model Usng Dscrete Cosne Transform Introducton v

8 3.2 Proposed Actve Shape Model usng a DCT-Based Gradent Profle Samplng Computaton of the Profle Matrx Computaton of the DCT Coeffcents Selecton of the DCT Coeffcents Normalzaton and Equalzaton of the Selected DCT Coeffcents An Illustratve Example Samplng Computaton of the Profle Matrx Computaton of the DCT Coeffcents Selecton of the DCT Coeffcents Normalzaton and Equalzaton of the Selected DCT Coeffcents Computatonal Complexty Analyss Summary Chapter 4 Expermental Results of Applyng the Proposed Actve Shape Model n a Facal Annotaton Applcaton Facal Annotaton usng the Proposed Actve Shape Model Generatng an Intal Model Shape Refnng the Landmark Postons usng LGGM Fndng a Model Shape by Calculatng the Model and Transformaton Parameters Updatng the Model and Transformaton Parameters Databases Evaluaton Method Expermental Results Expermental Settngs Results and Analyss of the Proposed Actve Shape Model v

9 Comparson of the Facal Annotaton Performance usng the Proposed and Other Actve Shape Models Summary Chapter 5 Concluson and Future Work Concludng Remarks Future Work References x

10 LIST OF FIGURES Fgure 2.1: The landmarks manually labelled on a facal mage Fgure 2.2: Tranng shapes used to buld the PDM Fgure 2.3: Fgure 2.4: Fgure 2.5: Fgure 2.6: Fgure 2.7: Fgure 2.8: Generated shapes by varyng each of the frst three modes (parameters) wthn the three standard devatons of the mean shape Drectons of the 1-D grey-level ntensty gradent profle used n [33] An example of the rectangular regon detected by the face detector,and the mean shape mapped onto the mage wth ( x c, yc) as the reference pont An example of the ntal model shape approxmately ftted n the rectangular regon Sampled ntensty gradent profle ĝ, the subset gd extracted from ĝ, the mean profle g from the LGGM algned to the subset g d, f ( g and the cost of ft d ) Image pyramd wth 3 levels and ther relatonshps n terms of the sze Fgure 2.9: Mult-resoluton teratve shape search method Fgure 2.10: The square regon around each landmark used to generate the 2-D profle Fgure 2.11: A plot of the cosne bass functons for N = Fgure 2.12: A plot of the bass mages of the 2-D DCT for N = Fgure 2.13: Row-Column Decomposton of the 2-D DCT Fgure 3.1: An llustraton of an 8 x 8 square regon (shown n yellow color) around a landmark (shown as a red dot) annotated on the upper lp of an mage, used for computng a 2-D profle for the landmark x

11 Fgure 3.2: Fgure 3.3: Fgure 3.4: Fgure 3.5: An llustraton of a 3 x 3 spatal flter mask utlzed for computng the profle matrx of a landmark An llustraton of a 3 x 3 grey-level ntensty regon centered at the pxel poston (x, y), utlzed for computng the response of a 3 x 3 spatal flter Examples of some of the 3 x 3 flter masks that can be used for computng the profle matrx of a landmark. (a) A mask that does not make use of neghbourhood nformaton around the landmark. (b) A mask that utlzes the nformaton on the forward ntensty change n the x-drecton. (c) A mask that utlzes the nformaton on the forward ntensty change n the y-drecton. (d) A mask that captures the forward ntensty varatons along both the x- and y- drectons. (e) A mask that captures the forward and backward ntensty varatons along the x- and y-drectons. (f) A mask that captures the forward and backward ntensty varatons along the dagonals as well as the x- and y-drectons An llustraton of addtonal pxel ntensty values requred for the computaton of an 8 x 8 profle matrx G correspondng to a gven landmark C Fgure 3.6: An llustraton of a zg-zag scannng of an 8 x 8 array G of the DCT coeffcents Fgure 3.7: Fgure 3.8: Fgure 3.9: Fgure 3.10: Fgure 3.11: An mage utlzed for buldng a 2-D profle of a landmark (shown as a red dot) annotated on the face contour and the 8 x 8 square (shown n yellow color) around t An llustraton of the grey-level pxel ntensty values requred for buldng an 8 x 8 profle matrx for the landmark shown n Fgure An llustraton of (a) a 3 x 3 flter mask that captures the forward ntensty varatons both along the x- and y-drectons, and (b) a 3 x 3 grey-level ntensty regon centered at the pxel poston (0, 7), utlzed for computng the flter response n ths example The 8 x 8 array of the flter response (profle matrx) of the landmark n the mage of Fgure The array of the DCT coeffcents of the profle matrx of Fgure x

12 Fgure 3.12: Fgure 4.1: A zg-zag scannng of the 8 x 8 array of the DCT coeffcents of the profle matrx correspondng to the landmark of Fgure Examples of the samples from the MUCT database consstng of a facal mage and 76 manually annotated landmarks representng partcular parts of a face n the mage Fgure 4.2: The fve dfferent postons of the camera Fgure 4.3: Fgure 4.4: Fgure 4.5: Fgure 4.6: Fgure 4.7: Fgure 4.8: Fgure 4.9: Fgure 4.10: Examples of the mages of an ndvdual photographed from the frontal camera postons under the same lghtng condton. (a) An mage photographed from the camera poston a. (b) An mage photographed from the camera poston d. (c) An mage photographed from the camera poston e Examples of the mages of an ndvdual photographed under three dfferent lghtng condtons. (a) An mage photographed under the lghtng condton q. (b) An mage photographed under the lghtng condton r. (c) An mage photographed under the lghtng condtons Some examples of the samples from the BoID database consstng of a facal mage and 20 manually annotated landmarks Some examples of the samples from the IMM database consstng of a facal mage and 58 manually annotated landmarks An example of the mrrored pars of the samples from the MUCT database. (a) The mage of a sample from the MUCT database. (b) The mrrored mage of the same sample Two landmarks representng the extreme ponts on the left and rght eyes, and the dstance s between the two landmarks The 17 landmarks nternal to the face utlzed for computng the average normalzed fttng error n the experment wth the frst group of samples An example of the ntal and fnal model shapes of the automatc facal annotaton system. (a) The ntal model shape (shown n red color) mapped onto an nput mage usng a rectangle regon contanng a face (shown n yellow color). (b) The fnal model shape (shown n red color) that fts closely to the face x

13 LIST OF TABLES Table 3.1: Table 4.1: Table 4.2: Table 4.3: Table 4.4: Table 4.5: Table 4.6: Table 4.7: Computatonal complexty of each step of the process of buldng the profles of a landmark for the stacked and proposed ASMs Summary of the three databases used for buldng the PDM and LGGM parts of the proposed ASM and performng the facal shape search usng the ASM Summary of the number of tranng and test sets n the two groups of the mages chosen from the three databases and the number of landmarks n each facal shape Average normalzed fttng error obtaned from the experment wth Group 1 of samples by usng dfferent combnatons of the number of DCT coeffcents and a flter mask Average normalzed fttng error obtaned from the experment wth Group 2 of samples by usng dfferent combnatons of the number of DCT coeffcents and a flter mask Summary of the parameters utlzed for a facal landmark annotaton usng the stacked ASM, ASM wth a PCA-based LGGM and the proposed ASM Mean values of average normalzed fttng error and average executon tme obtaned by usng the proposed and two other actve shape models from the experment usng the tranng and test sets of Group 1 (the MUCT and BoID databases) Mean values of average normalzed fttng error and average executon tme obtaned by usng the proposed and two other actve shape models from the experment usng the tranng and test sets of Group 2 (the IMM database) x

14 LIST OF SYMBOLS x y x x S s The x-coordnate of the th landmark The y-coordnate of the th landmark Vector contanng the x- and y-coordnates of the n landmarks Mean of the vectors contanng the x- and y-coordnates of the n landmarks over the N tranng shape Covarance matrx of the devatons from the mean vector v k k P b x s The kth egenvector of The kth egenvalue of S s S s Matrx whose columns are the t egenvectors correspondng to the frst t largest egenvalues Vector of weghts appled to the t egenvectors Target shape T Smlarty transform that transforms the model shape from the model space to the mage space by scalng, rotatng and translatng the model shape ( c c x, y ) Center pont of a rectangular regon contanng a face w h Wdth of a rectangular regon contanng a face Heght of a rectangular regon contanng a face ( x y t, t ) Translaton parameters s Scalng parameter Rotaton parameter xv

15 g j Grey-level ntensty gradent profle for the th landmark of the jth shape g jk The kth element n g j ĝ j Normalzed grey-level ntensty gradent profle for the th landmark of the jth shape g Grey-level ntensty gradent profle for the th landmark g Mean grey-level ntensty gradent profle for the th landmark S Covarance matrx of the devatons from the mean grey-level ntensty gradent profle for the th landmark ĝ Grey-level ntensty gradent profle for the th landmark g d Subset of ĝ at a canddate poston d g Mean grey-level ntensty gradent profle for the th landmark f ( g d ) Mahalanobs dstance between the mean grey-level ntensty gradent profle for the th landmark and a canddate profle at a poston d g d best Subset of ĝ at a best poston d ( xbest t ybest t, ) Best translaton parameters s best Best scalng parameter best Best rotaton parameter w (, j) Response of a lnear spatal flter G Profle Matrx G ( x, y) Flter response at the pxel poston (x, y) g j The jth element of g g j Normalzed jth element of g xv

16 g j g Normalzed and equalzed jth element of g Normalzed and equalzed grey-level ntensty gradent profle for the th landmark C G 2-D array of the DCT coeffcents representng the profle matrx G Y C G Vector contanng a zg-zag sequence of C G Y n c Vector contanng the frst n c DCT coeffcents representng the lowfrequency components of the profle matrx G y The th element n nc Y y Normalzed th element n nc Y Y n c Vector contanng the normalzed elements of Y nc y Normalzed and equalzed th element n Y nc Y n c Vector contanng the normalzed and equalzed elements of Y nc a Vector contanng the x- and y-coordnates of the th landmark of the model shape b d Vector contanng the x- and y-coordnates of the th landmark of the target shape Eucldean dstance between two ponts gven by a and b E average Average normalzed fttng error xv

17 LIST OF ACRONYMS ASM PDM LGGM MRASM STASM PCA DCT GPA DFT FFT MUCT IMM Actve shape model Pont dstrbuton model Local grey-level gradent model Mult-resoluton actve shape model Stacked actve shape model Prncpal component analyss Dscrete cosne transform Generalzed procrustes analyss Dscrete Fourer transform Fast Fourer transform Mlborrow / unversty of Cape Town Informatcs and mathematcal modellng xv

18 CHAPTER 1 Introducton 1.1 Background In recent advances n bometrcs, facal recognton has been one of the promsng research areas. The goal of facal recognton s to automatcally dentfy a human n a dgtal mage or a vdeo frame usng faces stored n a database. Facal recognton has been successfully appled n securty, law-enforcement and human dentfcaton applcatons [1]. In general, a face recognton system conssts of three stages, acquston, feature extracton and classfcaton. In the acquston stage, a regon contanng a face s detected from an nput mage, and some pre-processng s performed on the detected regon. In the feature extracton stage, a set of meanngful features s extracted from the detected regon. A database of feature set s bult n the tranng phase, usng the feature sets extracted from a set of tranng mages. In the classfcaton stage, the dstance between the feature set extracted from a test mage and each feature set n the database s measured usng a dstance metrc to dentfy the face n the test mage. Facal recognton systems can be categorzed nto classes of systems employng holstc approaches [1]-[9] and feature-based approaches [10]-[19]. In the holstc approach, the feature set s extracted from the entre face regon n an mage, whereas n the feature-based approach, the feature set s extracted from each of the promnent facal 1

19 components, such as eyes, nose and mouth. Snce these facal components are treated separately, the feature-based approach s more robust to postonal varatons of the face n the mage compared to the holstc approach [20]. Ths approach, however, reles heavly on an accurate localzaton of facal features. Thus, automatc and accurate localzaton of facal features s an essental pre-processng step n a feature-based facal recognton system. Shape models capable of encompassng facal components could be useful for localzng facal features. In the lterature, varous approaches that model shape and shape varatons have been proposed [21]-[31]. Actve shape model (ASM) [31] s a flexble model that has wdely been utlzed n order to automatcally locate a set of landmarks representng a target object n an mage [32]-[38]. In an applcaton of ASM n facal landmark localzaton, the facal features are represented by a set of landmarks, and ASM automatcally locates these landmarks by fttng ts model shape to a facal shape n an mage. 1.2 A Bref Revew of Actve Shape Model Actve shape model (ASM) was frst ntroduced n 1994 n [31], and has been successfully employed to varous computer vson applcatons ncludng locatng organs n medcal mages, face recognton and hand-wrtten character recognton [31]-[34]. It utlzes a pont dstrbuton model (PDM) and a local grey-level gradent model (LGGM) to teratvely ft the model shape generated by ASM to a target object n an mage. The pont dstrbuton model s used for generatng shapes smlar to those n the tranng set, and the LGGM s bult usng 1-D profles, each of whch represents a local grey-level 2

20 gradent pattern along a lne centered at a partcular landmark of the model shape. The local grey-level gradent pattern s then utlzed for searchng a new poston of each landmark along a lne centered at the landmark. The model shape s teratvely reformed for ts better ft to the target object n the mage. The classcal ASM performs well f the landmarks of the ntal model shape are placed close to ther targets. However, the ntal model shape could be placed only roughly whereby the landmarks often get placed away from ther targets. Thus resultng n long search lnes and consequently makng the target search computatonally expensve. It could also dstract the landmarks by local structures n the mage. In order to overcome the above drawbacks of the classcal ASM, a multresoluton approach of ASM, known as mult-resoluton ASM (MRASM), s proposed n [34]. In ths scheme, an ASM s frst appled to a coarse mage to roughly place the model shape near the target object, and then appled to fner mages to refne the shape fttng. An mage pyramd contanng a set of mages wth dfferent resolutons s used n order to buld the LGGM for multple mage resolutons. For each level of the mage pyramd, the correspondng LGGM s utlzed for searchng a new poston of each landmark. Unlke the classcal ASM, the mult-resoluton approach requres only short search lnes; thus reducng the rsk of dstracton of the landmarks by local structures n the mage, and decreasng the computatonal complexty [34]. However, the 1-D profle used n the classcal ASM and MRASM does not suffcently represent the grey-level ntensty gradent pattern around each landmark to dstngush the landmarks from one another [39]. Consequently, the search can converge to a local mnmum and produces poor fttng results. 3

21 For capturng grey-level ntensty gradent nformaton around each landmark more accurately, 2-D profles are ntroduced n [35]. A 2-D profle s bult by samplng grey-level ntenstes of a square regon around each landmark, and computng ther ntensty gradents n the x and y drectons. Ths profle represents the ntensty gradent nformaton n a larger regon around the landmark compared to that n the 1-D profle. Stacked actve shape model (STASM), ntroduced n [35], proposes the use of two ASM searches sequentally, a search usng the classcal ASM wth 1-D profles to roughly place the model shape, and then a search usng an ASM wth 2-D profles to refne the shape fttng. Ths approach yelds relatvely more accurate shape fttng results. However, the search can stll converge to a local mnmum [39]. Furthermore, the use of 2-D profles sgnfcantly ncreases the computatonal complexty of the search. In order to further mprove the shape fttng accuracy, a model, referred to as an actve shape model wth a PCA-based LGGM, s proposed n [36]. It employs the prncpal component analyss (PCA) n the LGGM to model the varatons of grey-level ntensty gradents n a square regon around each landmark. In ths scheme, a canddate 2-D profle of a partcular landmark s mapped onto a PCA subspace to obtan a set of projecton coeffcents, and a new profle s constructed by usng ths set of coeffcents. The weghted reconstructon error between the new profle and the canddate profle s computed to determne a new poston of the landmark [36]. The use of the PCA contrbutes to mprovng the shape fttng accuracy and to reducng the rsk of the search convergng to a local mnmum [39]. However, the computatonal complexty of ASM wth a PCA-based LGGM s much larger than that of the technques usng other versons of ASM, snce the computaton of the PCA s expensve. 4

22 Several other mproved versons of ASM have also been reported n the lterature. An mproved ASM usng the edge nformaton s proposed n [37] n order to mprove the shape fttng of the landmarks on the face contour by buldng better local texture models for the landmarks. The authors n [38] employ the color nformaton to detect the center of the mouth and the eyes to mprove the shape ntalzaton of ASM. These enhancements contrbute to slghtly mprovng the shape fttng accuracy wth a small ncrease n the computatonal complexty. However, the shape fttng accuracy s much lower than that obtaned by the stacked ASM and ASM wth a PCA-based LGGM. 1.3 Objectve and Scope of the Thess As seen from the revew of the actve shape model and ts modfed versons, conducted n the prevous secton, the stacked actve shape model as proposed n [35] provdes a sgnfcant mprovement n the shape fttng accuracy by usng both the 1-D and 2-D profles. The actve shape model wth a PCA-based LGGM developed n [36] ncreases the accuracy of the shape fttng even further by makng use of a subspace offered by the PCA technque. However, these mprovements are acheved at the expense of a substantal ncrease n the computatonal complexty. Ths thess s amed at developng a low-complexty actve shape model by ncorporatng the energy compacton property of the dscrete cosne transform (DCT). The proposed ASM utlzes a novel 2-D profle of a landmark, whch s based on the dscrete cosne transform, n order to reduce the computatonal complexty wthout affectng the facal shape fttng accuracy. The development of the proposed model puts emphass on reducng the executon tme of a facal shape search whle provdng a shape 5

23 fttng accuracy that s better or the same as that provded by the stacked ASM or ASM wth a PCA-based LGGM. 1.4 Organzaton of the Thess Ths thess s organzed as follows. Chapter 2 presents an overvew of the actve shape model (ASM) and the dscrete cosne transform (DCT) as an essental background materal for the development of the work undertaken n ths thess. Some mproved versons of ASM to be used as a bass for developng the proposed ASM are also presented. In Chapter 3, a low-complexty ASM, whch utlzes a novel 2-D profle constructed by makng use of the energy compacton property of the DCT, s proposed. The 2-D profle conssts of a subset of the DCT coeffcents of the profle matrx representng the local grey-level gradent pattern around a partcular landmark of a facal shape. It s shown that the sze of the new profle s much smaller than that of the conventonal spatal-doman 2-D profle, snce the new profle s bult by usng only a small subset of DCT coeffcents. A detaled buldng process of the novel profle s presented along wth an llustratve example that demonstrates the buldng process. A theoretcal computatonal complexty analyss of stacked ASM [35] and that of the proposed ASM s also presented n ths chapter. In Chapter 4, a facal shape search scheme usng the proposed ASM for localzng a facal shape n an mage s presented along wth a detaled step-by-step procedure. Experments are then performed n order to examne the effectveness of the proposed ASM n an automatc facal landmark annotaton of frontal faces. The performance 6

24 results of the scheme of the facal landmark annotaton, usng the proposed and two other ASMs, namely, stacked ASM [35] and ASM wth a PCA-based LGGM [36], are provded and compared n terms of the shape fttng accuracy as well as n terms of the computatonal complexty. Chapter 5 concludes ths thess by summarzng the man contrbutons made n ths thess, and suggestng some future work that can be undertaken along the deas and schemes presented n ths thess. 7

25 CHAPTER 2 Background Materal Ths chapter provdes an overvew of actve shape model (ASM) and dscrete cosne transform (DCT) whch are essental background materals n the dscussons of the development and soluton presented n ths thess. In the next secton, the ASM s brefly explaned to show the essence of the problem addressed n ths thess. Then, an overvew of the DCT upon whch the soluton to the problem n ths thess s bult s gven. Fnally, a concse summary of ths chapter s presented. 2.1 Actve Shape Model Actve shape model s a flexble shape model wdely used n automatcally locatng a set of landmarks that forms the shape of any known object n an mage. In general, the shape generated by the ASM can be freely deformed by adjustng the model parameters to best ft the target object n the mage. The model parameters are constraned to generate shapes that are consstent wth the shapes n the tranng set [33]. The actve shape model conssts of two sub-models called pont dstrbuton model and local grey-level gradent model. In ths secton, the shape of human faces s characterzed by usng the ASM. 8

26 Pont Dstrbuton Model Pont dstrbuton model (PDM) s a deformable shape model, whch s the foundaton of ASM, and t s bult by capturng the shape varatons n a gven set of shapes. The modellng process nvolves collectng and preparng the tranng shapes, algnng the collected shapes, and capturng the statstcs of the algned shapes. A. Collectng and Preparng the Tranng Shapes In order to correctly model the shape of human faces, t s frst represented by a set of landmarks. Each landmark descrbes a partcular part of the face such as the tp of nose and the eye pupls as shown n Fgure 2.1. A set of tranng mages s prepared to collect a range of facal shapes of varous ndvduals. The landmarks are then manually labelled for each tranng mage to generate a set of tranng shapes. Fgure 2.1: The landmarks manually labelled on a facal mage. 9

27 B. Algnng the Collected Shapes The tranng shapes prevously prepared all dffer n sze, orentaton and poston, snce these shapes come from dverse condtons and ndvduals. Snce the PDM s bult by analyzng the coordnates of each landmark of the tranng shapes, correspondng landmarks from dfferent shapes must be algned to one another wth respect to a set of axes [31]. Wth ths n mnd, algnment of the shapes s done by usng the generalzed procrustes analyss (GPA) whch scales, rotates and translates each tranng shape so that the correspondng landmarks n the varous mages of the tranng set are as close as possble [42]. The algned shapes are then statstcally analyzed as descrbed next. C. Capturng the Statstcs of the Algned Shapes Usng the algned shapes, statstcal nformaton of the tranng shapes s obtaned. Frst, each shape formed by n landmarks s represented by a shape vector where x and x x, x x, y, y y ] (2.1) [ 1 2 n 1 2 n y are the x and y coordnates of the th landmark, and n s the number of landmarks. Prncpal component analyss (PCA) s then performed to analyze the varaton of the algned shapes. The mean of the N tranng shapes s calculated as where 1 N x N x (2.2) 1 x s the th shape n the set of the N tranng mages. By subtractng the mean shape calculated above from each tranng shape, the covarance matrx of the devatons from the mean s calculated as follows. 10

28 1 N S ( x x)( x x) T (2.3) s N 1 The egenvectors and egenvalues of the covarance matrx are subsequently obtaned by usng the equaton, S v s k v (2.4) k k where v k and k are, respectvely, the kth egenvector and egenvalue of S s, and k = 1, 2,, 2n. The egenvectors correspondng to the largest egenvalues are the prncpal axes n the 2n-dmensonal space, and descrbe the sgnfcant varaton of the shapes [31]. The majorty of varatons can be descrbed by the egenvalues correspondng to the t largest egenvalues (t < 2n), snce the egenvectors correspondng to the smaller egenvalues have very small effects on the shape varaton. As a result, any shape x n the tranng set can be approxmated by the mean shape x and a weghted sum of t egenvectors. Ths s mathematcally represented as x x Pb (2.5) where P v, v, ) s a matrx whose columns are the t egenvectors correspondng ( 1 2 vt to the frst t largest egenvalues, and T b [ b 1, b2,, b t ] s a vector of weghts appled to the egenvectors. These weghts (b k ) are n fact the model parameters whch control the shape generated by the PDM; thus, new facal shapes can be generated by varyng these parameters. At the same tme, these parameters are constraned n order for the model shape to be consstent wth those n the tranng set. Snce most of the populaton les wthn three standard devatons of the mean, the model parameters are chosen to le wthn the range gven by 3 3 (2.6) k bk k 11

29 where k s the egenvalue that corresponds to the kth egenvector of the t egenvectors. To llustrate how the PDM works, a small set of tranng shapes as shown n Fgure 2.2 s used to buld the PDM. Fgure 2.3 shows the shape generated by the PDM by varyng each of the frst three modes (the model parameters), b 1, b 2 and b 3, n the range specfed by (2.6). As seen from ths fgure, varaton of the model parameters results n the model shapes that are devated versons of the mean shape. Fgure 2.2: Tranng shapes used to buld the PDM [32]. The pont dstrbuton model gven by (2.5) s utlzed to approxmate a gven target shape x s through an approprate choce of the model parameters b by mnmzng the expresson x s 2 T ( x Pb) (2.7) where T s a smlarty transform that transforms the model shape from the model space to the mage space by scalng, rotatng and translatng the model shape. The model 12

30 parameters b best and the transformaton parameters ( t x, ty, s, ) best that make the resultng model shape to best ft the target shape are found. Thus, an approxmaton of the target shape can be smply represented by these parameters. Local grey-level gradent model descrbed next s utlzed to move each landmark of the model shape to a new locaton to generate a new shape. Fgure 2.3: Generated shapes by varyng each of the frst three modes (parameters) wthn the three standard devatons of the mean shape [32] Local Grey-Level Gradent Model As seen prevously, the pont dstrbuton model can generate a range of facal shapes by controllng ts model and transformaton parameters. Local grey-level gradent model (LGGM) descrbes the grey-level gradent pattern n the vcnty of each landmark n the mage contanng the face n queston. Gven a model shape generated by the PDM, the goal of the LGGM s to determne a new poston for each landmark of the model shape 13

31 to generate a new shape whch outlnes the face n queston better than the model shape. To buld the LGGM, a grey-level ntensty gradent profle g j for each landmark of each shape j n the tranng set contanng N shapes s frst constructed. The classcal ASM n [33] employs 1-D profle, whch s generated by samplng grey-level ntenstes of m pxels centered at each landmark. The drecton of the profle s chosen to be along a lne orthogonal to the shape boundary at each landmark as shown n Fgure 2.4. Fgure 2.4: Drectons of the 1-D grey-level ntensty gradent profle used n [33]. In order to construct the profle g j, each element of the sampled pxel ntenstes s replaced by ts ntensty gradent, whch s the dfference between ntenstes of the pxel and ts neghbourng pxel. In the constructon of the profles, the gradent values rather than the pxel ntenstes are used for reducng the effect of global ntensty changes from mage to another mage n the tranng set [35]. Each gradent value s 14

32 subsequently normalzed by dvdng by the sum of absolute values of the gradents n a profle as where g g ˆ (2.8) j j m g jk k 1 g jk s the kth element n the profle g j. As a result, the normalzed profle s robust to unform scalng and the addton of a constant [31]. Fnally, the LGGM for each landmark s represented by the mean profle gven by and the covarance matrx gven by 1 g (2.9) N ˆ g j N j 1 1 N S T (ˆ g j g )(ˆ g j g ), 1 n. (2.10) N j 1 The local grey-level gradent model s thus used for determnng the best poston for each landmark usng ts statstcal nformaton gven by the mean profle and covarance matrx for each landmark. A comprehensve procedure of fndng a shape n an mage usng the PDM and the LGGM s descrbed next Search of a Face n an Image Searchng a face n an mage s equvalent to automatcally fndng the best locaton of the landmarks n the face. Ths s done by optmally fttng the landmarks obtaned by usng the PDM and the LGGM descrbed prevously onto the gven face. Ths search nvolves an teratve refnement of the model shape by movng each landmark of a current model shape to a new poston by usng the LGGM, and then updatng the current 15

33 model parameters b of the PDM and the transformaton parameters ( tx, t, s, ) n order to acheve a better ft of the model shape to the gven face. The search method conssts of four steps descrbed below. y A. Generatng an Intal Model Shape The search method starts wth a detecton of the face n the mage contanng the face. Vola et al. n [43] have ntroduced a fast detecton technque n whch the face s detected by locatng a rectangular regon contanng the face. Ths technque makes use of a set of smple Haar-lke features for classfyng the varous rectangular regons contanng the face. Evaluaton of these features s accelerated by a fast computaton and makng use of a so-called ntegral mage. A learnng algorthm, known as AdaBoost, s used for selectng a smaller set of mportant features from the large set of the computed Haar-lke features n order to buld a set of weak classfers. These weak classfers are combned n cascade so as to quckly dscard the non-face regons. ( c c The rectangular regon s represented by ts center pont x, y ), wdth w, and heght h, and ncludes most of the facal features as shown n Fgure 2.5. In order to generate an ntal model shape, the model and transformaton parameters are determned by usng the rectangular regon contanng the face. The mean shape x of the tranng set s mapped onto the gven mage wth x, y ) as the reference pont, as shown by a red ( c c lne n Fgure 2.5. The shape s then scaled up or down and rotated by respectvely assgnng a value to the scalng parameter s and the rotaton parameter dependng on the value of w and h so as to approxmately ft t n an approprate regon (say 90%) of 16

34 the rectangular regon, as shown n Fgure 2.6 [35]. Therefore, the ntal model shape s represented by the model parameters b 0 0 and the transformaton parameters ( tx, ty, s, ) 0 where tx x and t c y yc, and subsequently used for fndng a new shape wth the LGGM n the search method. Fgure 2.5: An example of the rectangular regon detected by the face detector, and the mean shape mapped onto the mage wth x, y ) as the reference pont. ( c c B. Refnng the Landmark Postons usng LGGM Gven the ntal model shape represented by b 0 and ( tx, ty, s, ) 0, each landmark of the model shape s moved to a sutable poston determned by usng the LGGM to generate a new shape. The grey-level pxel ntenstes of a pre-specfed number (say L > m) of 17

Fgure 2.6: An example of the ntal model shape approxmately ftted n the rectangular regon. pxels centered at a partcular landmark of the current model shape are sampled from the mage.

35 Fgure 2.6: An example of the ntal model shape approxmately ftted n the rectangular regon. pxels centered at a partcular landmark of the current model shape are sampled from the mage. The ntensty gradents ĝ are then calculated from the sampled pxel ntenstes as descrbed n Secton A subset g d of the ntensty gradents ĝ of length m pxels centered at a canddate poston d s normalzed usng (2.8). The cost of ft of g d to the mean profle g for the th landmark of the LGGM s evaluated usng the Mahalanobs dstance gven by where T 1 f g ) ( g g ) S ( g g ) (2.11) ( d d d S s the covarance matrx for the th landmark. The cost of ft s evaluated at L - (m - 1) canddate postons on ĝ usng (2.11) ncludng that at the landmark n queston (.e. the th landmark), as shown n Fgure 2.7. The poston d best at whch the sub-profle g d best yelds the smallest Mahalanobs dstance (the mnmum cost of ft) s then selected 18

36 as the new poston of the landmark. The above process s repeated for each landmark of the model shape to obtan a new shape x S formed by the new postons of the landmarks. The new model and transformaton parameters are calculated usng the shape x S as descrbed next. Fgure 2.7: Sampled ntensty gradent profle ĝ, the subset gd extracted from ĝ, the mean profle g from the LGGM algned to the subset g d, and the cost of ft f g ) [32]. ( d 19

37 C. Fndng a Model Shape by Calculatng the Model and Transformaton Parameters Gven the new shape x S, the model parameters b best and the transformaton parameters ( t, t, s, ) are obtaned by mnmzng the expresson gven by (2.7) usng the x y best smlarty transform T and the PDM. An teratve method to mnmze the expresson s comprehensvely presented n [32]. Consequently, the shape x S s closely approxmated by the model shape gven by T ( x Pb ), whle satsfyng the constrants of the model parameters defned by (2.6). ( t x, t y, s, ) best best D. Updatng the Model and Transformaton Parameters The model parameters b and the transformaton parameters ( tx, t, s, ) are then updated ( best by assumng the best parameters b and t,,, ) best xbest t ybest sbest : tx t xbest (2.12) ty t ybest (2.13) best (2.14) s s best (2.15) b b best (2.16) The updated parameters b and ( tx, t, s, ) represent a model shape that approxmates the y shape x S whle satsfyng the shape constrants mposed by (2.6). The model shape s then used as the startng shape n the second teraton of the search method. The steps B, C and D descrbed above are repeated untl the model shape converges, that s, the teratve process satsfes a pre-specfed termnatng condton. For the reason that the 20 y

38 shape generated by the PDM s teratvely modfed to ft the target shape, and the generated shape s consstent wth the shapes found n the tranng set, the model s called the actve shape model [33]. The teratve shape search method descrbed above for fttng the model shape to the face n the mage can be summarzed n the followng algorthm. Algorthm 1: Classcal ASM 1. Generate the ntal model shape by locatng a rectangular regon contanng the face n the mage usng the Vola Jones face detector, mappng the mean shape onto the center of the regon, and applyng scalng and rotaton to the shape to approxmately ft t n the regon. 2. Refne the postons of the landmarks of the model shape usng the local greylevel gradent model to generate a new shape x S gven by the landmarks, each of whch s determned by mnmzng the Mahalanobs dstance gven by (2.11). 3. Fnd a new set of the model parameters b, and the transformaton parameters ( tx, t, s, ) usng the pont dstrbuton model and the smlarty transformaton to y best ft the model shape to the shape x S, enforcng the lmts gven by (2.6) on the model parameters b and mnmzng the expresson gven by (2.7). 4. Update the model and transformaton parameters to the new parameters obtaned n Step 3 (Equatons (2.12) (2.16)). 5. Repeat Steps 2 to 4 untl convergence. 21

39 Mult-Resoluton Actve Shape Model In [34], the classcal ASM has been modfed by searchng the desred face at dfferent levels of resoluton usng Algorthm 1 descrbed n the prevous secton. The resultng model can be referred to as mult-resoluton ASM (MRASM). It utlzes an mage pyramd to buld the LGGM for multple mage resolutons. Here, as shown n Fgure 2.8, the authors use an mage pyramd consstng of a set of mages wth dfferent resolutons, whch are created from an orgnal mage usng Gaussan smoothng and sub-samplng [34]. Level 0 (base) of the mage pyramd s the orgnal mage, level 1 s the mage wth half the number of pxels of the orgnal mage n each dmenson, and so on. The authors n [34] have used fve levels of resolutons. In order to buld the mult-resoluton LGGM, a mean profle and a covarance matrx of the LGGM descrbed n Secton are calculated for each of the levels of the mage pyramd n the tranng stage. At the testng stage, an mage pyramd wth 5 levels s frst created usng the test mage. The shape search gven by Algorthm 1 s then conducted usng the mage from each level of the pyramd startng at the coarsest level (level 4) and termnatng at the fnest level (level 0). In the shape search, the LGGM correspondng to each level of the mage pyramd obtaned n the tranng stage s utlzed to fnd a new poston of the landmarks of the model shape. As a result, there are large movements of landmarks of the model shape n the coarser levels, whle smaller movements are made at the fner levels as shown n Fgure 2.9. The mult-resoluton actve shape model successfully mproves the fttng accuracy and the computatonal speed by usng fewer teratons and shorter search lnes than those used n the classcal ASM [34]. Moreover, the MRASM tends not to be stuck on local mage structures due to search at coarser resoluton [34]. 22

40 Actve Shape Model wth 2-D Profles In order to mprove the performance of the shape search usng the ASM, the authors n [35] have utlzed 2-D profles n addton to 1-D profles n the LGGM of the classcal ASM. Whle the 1-D profle shown n Fgure 2.4 captures the grey-level ntensty Fgure 2.8: Image pyramd wth 3 levels and ther relatonshps n terms of the sze [32]. Fgure 2.9: Mult-resoluton teratve shape search method [36]. 23

gradent pattern along a lne orthogonal to the shape boundary at each landmark, the 2-D profle captures more nformaton from a square regon around each landmark. Fgure 2.

41 gradent pattern along a lne orthogonal to the shape boundary at each landmark, the 2-D profle captures more nformaton from a square regon around each landmark. Fgure 2.10 shows the 2-D profles of the same landmarks as those shown n Fgure 2.4. Fgure 2.10: The square regon around each landmark used to generate the 2-D profle. In [35], n order to buld the 2-D profle, the response of a t x q lnear spatal flter w (, j) s calculated for each pxel n a 13 x 13 square regon around a partcular landmark, and the resultng flter responses are stored n a 13 x 13 profle matrx G [40]. The flter response G ( x, y) at the pxel poston ( x, y) s gven by the sum of the products of the flter coeffcents and the correspondng grey-level ntenstes n the regon spanned by the flter mask [41] as a b G ( x, y) w(, j) ( x, y j) (2.17) a j b where a = (t - 1) / 2 and b = (q - 1) / 2. The elements of the profle matrx G are then arranged nto a vector by concatenatng ts rows. The resultng vector g correspondng 24

42 to the th landmark s normalzed by dvdng each element g j by the mean of the absolute values of all the elements as g j 1 n g g n g j k 1 g k (2.18) where n g (= 169) s the number of the elements n equalzed by applyng a sgmod transform as g. Each normalzed g j s then g j g j (2.19) g c j where c s the shape constant, whch defnes the shape of the sgmod [35]. The resultng vector g wth elements g j s used for representng the 2-D ntensty gradent pattern around a landmark. The use of the so called stacked ASM that uses the 2-D profle n addton to the 1-D profle sgnfcantly mproves the fttng accuracy compared to the classcal ASM usng only 1-D profles [35] Actve Shape Model wth a PCA-based LGGM Seshadr et al. n [36] have developed an actve shape model wth a PCA-based LGGM employng 2-D profles to further mprove ts fttng accuracy. It employs the PCA to buld a subspace whch models the varaton of pxel appearance n a square regon around each landmark [36]. The mean profle and the covarance matrx are obtaned from the 2-D profle obtaned by usng a 13 x 13 square regon around each landmark n the same manner as n [35]. Then, the egenvectors correspondng to the frst 97% of the egenvalues of the covarance matrx of each landmark are computed. In the testng stage, 25

43 the 2-D profle obtaned at each canddate poston of a 5 x 5 pxel search regon around a partcular landmark s projected onto the subspace usng the egenvectors to obtan a set of projecton coeffcents. The Mahalanobs dstance between the 2-D profle at the canddate poston and ts reconstructed profle obtaned by usng the projecton coeffcents and the egenvectors s computed. The optmal poston at whch the profle yelds the mnmum Mahalanobs dstance s determned, and s used as the new poston for the landmark. Ths process s repeated for each landmark to generate a new shape. Thus, ASM wth a PCA-based LGGM utlzes the PCA to determne a new poston for each landmark nstead of usng the mean profle and the covarance matrx. The use of the PCA subspace contrbutes to mprovng the fttng accuracy, and helps to mtgate the problem of local mnma [39]. 2.2 Dscrete Cosne Transform The dscrete cosne transform (DCT) has a powerful energy compacton property [44]. Ths property of the DCT could be effectvely utlzed to reduce sze of the 2-D profles used n the ASM, and consequently, decrease the computatonal complexty of ASM tself. In ths secton, the fundamental concepts of the 1-D and 2-D DCTs and ther useful propertes as well as fast methods for ther computaton are brefly revewed D DCT The 1-D DCT of a data sequence x (n) of length N s defned by 26

44 (2n 1) k Cx( k) ( k) (2.20) N N 1 x( n) cos n 0 2 where k = 0, 1, 2,, N - 1 and 1 for k 0 ( k) N. (2.21) 2 otherwse N The 1-D nverse DCT of length N s gven by x( n) (2n 1) k (2.22) N N 1 ( k) Cx( k) cos k 0 2 where n = 0, 1, 2,, N - 1. A. Bass Functons As seen n (2.20), the DCT coeffcents are represented by a weghted sum of the functons (2n 1) k cos. These functons are called the cosne bass functons, and are 2N orthogonal to one another [44]. Smlarly, the nput sequence x (n) as gven by (2.22) can be seen to be a weghted sum of the cosne bass functons. The plot of the cosne bass functons for N = 8 s shown n Fgure The frst bass functon (k = 0) s represented by a zero-frequency constant functon, and the DCT coeffcent correspondng to k = 0 s known as the DC coeffcent. Other bass functons (k > 0) are descrbed by a set of cosne functons wth dfferent frequences, and the DCT coeffcents calculated usng these functons are called the AC coeffcents. The lower (smaller k) and hgher (larger k) frequency bass functons represent, respectvely, smooth and varyng parts of the nput sequence. 27

45 f(k,n) f(k,n) f(k,n) f(k,n) k = n k = n k = n k = n f(k,n) f(k,n) f(k,n) f(k,n) k = n k = n k = n k = n Fgure 2.11: A plot of the cosne bass functons for N = 8. B. Energy Compacton Property The dscrete cosne transform s well-known for ts excellent energy compacton property, and t packs most of the sgnal energy of hghly correlated nput data n a small set of the DCT coeffcents [45]. These coeffcents represent the lower frequency components of the orgnal sgnal, and contan most of the sgnal energy. The other coeffcents correspondng to the hgher frequency components can be safely dscarded wthout losng 28

46 much of the sgnal energy. Therefore, the orgnal sgnal can be approxmated usng only the small set of the DCT coeffcents wth a small reconstructon error. C. Fast Computaton of DCT A drect calculaton of the N-pont DCT requres N multplcatons and N-1 addtons for each of the N coeffcents, and the computatonal complexty of the calculaton s gven 2 by O ( N ). However, the 1-D DCT s derved from the 1-D Dscrete Fourer Transform (DFT) and t can be calculated wth the computatonal complexty of O ( N log N) usng the fast Fourer transform (FFT) [46]. Further acceleraton of the calculaton s acheved by mnmzng the number of multplcatons and addtons requred to compute the N- pont DCT. Loeffler et al. n [47] developed a fast DCT algorthm whch nvolves only 11 multplcatons and 29 addtons to compute the 8-pont DCT. Moreover, Ara et al. [48] proposed a scaled DCT algorthm that requres 13 multplcatons and 20 addtons but defers 8 of these multplcatons for the fnal scalng n quantzaton step to decrease the number of multplcatons n the computaton of the 8-pont DCT. Therefore, the DCT can be rapdly calculated usng these fast DCT mplementatons D DCT The 2-D DCT of a 2-D mage data x n 1, n ) s defned by C ( 2 N 1 N 1 (2n1 1) k1 (2n2 1) k k1, k2) ( k1) ( k2) x( n1, n2)cos cos (2.23) 2N 2N 2 x( n1 0 n2 0 where k 1, k 2 = 0, 1, 2,, N - 1, and ( ) s defned n (2.21). 29

47 The 2-D nverse DCT of C k 1, k x ) s then gven by x ( 2 (2n1 1) k1 (2n2 1) k ( k1) ( k2) Cx( k1, k2)cos cos (2.24) 2N 2N N 1 N 1 2 ( n1, n2) k1 0 k2 0 where n 1, n 2 = 0, 1, 2,, N - 1. A. Bass Images The cosne functons (2n 1) k1 (2n2 1) k2 cos cos 2N 2N 1 n (2.23) and (2.24) form a set of 2-D orthogonal functons called the bass mages. The 2-D DCT coeffcents are thus represented by a weghted sum of the bass mages. Smlarly, the orgnal mage x n 1, n ) s a weghted sum of the bass mages, as seen n (2.24). Fgure 2.12 s a ( 2 vsualzaton of the bass mages for N = 8. As seen n the fgure, when u and v are both zero (top left), the bass mage s a constant 8 x 8 mage (DC mage) and represents the zero frequency components of the mage. The bass mages wth hgher values of k 1 and k 2 represent the hgh frequency components of the mage. B. Row-Column Decomposton The drect computaton of the 2-D DCT gven by (2.23) has a computatonal complexty 4 of O ( N ). However, the 2-D DCT can be decomposed nto a seres of 1-D DCTs as C ( k, k x 1 2 ) ( k ) ( k ) N 1 N 1 1 n1 0 ( k ) 1 N n1 0 n2 0 C ( n, k x 2 x( n, n 1 2 (2n2 1) k )cos 2N (2n1 1) k1 )cos 2N (2n1 1) k1 cos 2N 2 (2.25) 30

48 Fgure 2.12: A plot of the bass mages of the 2-D DCT for N = 8 [44]. where C N 1 (2n2 1) k n1, k2) ( k2) x( n1, n2) cos (2.26) 2N 2 x( n2 0 and k 1, k 2 = 0, 1, 2,, N - 1. In the row-column decomposton method, the 1-D DCT of the mage data x n 1, n ) for each row s frst calculated, and t s stored n n, k 1 ). The ( 2 C ( x 2 1-D DCT of C n 1, k x ) for each column s then computed to obtan the fnal 2-D DCT ( 2 transform coeffcents C k 1, k x ) as llustrated n Fgure As a result, the row-column ( 2 decomposed 2-D DCT computaton descrbed above reduces the computatonal 4 3 complexty from O ( N ) to O ( N ). 31

Fgure 2.13: Row-Column Decomposton of the 2-D DCT. C. Fast Computaton of 2-D DCT 3 As descrbed prevously, the 2-D DCT s effcently calculated n O ( N ) usng the rowcolumn decomposton method.

49 Fgure 2.13: Row-Column Decomposton of the 2-D DCT. C. Fast Computaton of 2-D DCT 3 As descrbed prevously, the 2-D DCT s effcently calculated n O ( N ) usng the rowcolumn decomposton method. The computaton complexty can be further reduced to 2 O ( N log N) by usng the fast 1-D DCT based on the FFT descrbed n [46]. Use of the fast 1-D DCT computaton technques such as these descrbed n [47] or [48] can addtonally reduce the number of multplcatons and addtons to accelerate the computaton of the 2-D DCT. 2.3 Summary In ths chapter, some background materal necessary for the development of the work undertaken n ths thess has been presented. Actve shape model (ASM), whch s a flexble model utlzed to teratvely ft the model shape to a target face n an mage, has been presented n detal. It conssts of pont dstrbuton model (PDM) that can generate any shape smlar to the tranng set of shapes by controllng the model parameters, and local grey-level gradent model (LGGM) that descrbes the grey-level ntensty gradent pattern around each landmark. In order to search a face n an mage, the model shape gven by the ASM s teratvely reformed n order to better ft the face. Three mproved 32

50 versons of ASM, namely, the mult-resoluton ASM, ASM wth 2-D profles and ASM wth a PCA-based LGGM, have also been dscussed. A mult-resoluton ASM employs an mage pyramd consstng of a set of mages wth dfferent resolutons to accelerate the shape fttng. An ASM wth 2-D profles utlzes a 2-D profle obtaned from a square regon around each landmark n order to better represent the grey-level ntensty gradent pattern around the landmark. An ASM wth a PCA-based LGGM makes use of a PCA subspace to determne a new poston for each landmark. Snce the energy compacton property of the dscrete cosne transform (DCT) can be utlzed to reduce the computatonal complexty of ASM, fnally n ths chapter, ths transform, some of ts propertes and a method for ts fast computaton have been revewed. 33

51 CHAPTER 3 Proposed Actve Shape Model Usng Dscrete Cosne Transform 3.1 Introducton As descrbed n Chapter 2, the actve shape model (ASM) usng the 2-D profles [35] and ASM wth a PCA-based local grey-level gradent model (LGGM) [36] sgnfcantly mprove the shape fttng accuracy. However, these mprovements are acheved at the expense of substantally ncreased computatonal complexty. It was also ponted out that the dscrete cosne transform (DCT) has a powerful energy compacton property. As a result of ths property, the DCT has been successfully utlzed n a varety of pattern recognton applcatons for the purpose of dmensonalty reducton [49]-[51]. In order to reduce the sze of the 2-D profles, and subsequently decrease the computatonal complexty of ASM tself, we propose a low-complexty ASM that utlzes a novel profle constructed by makng use of the energy compacton property of the DCT [52]. In Secton 3.2, the process of buldng the novel profle and ts ntegraton nto an ASM s descrbed n detal. In Secton 3.3, an llustratve example of the buldng process of ths profle s provded. In Secton 3.4, a computatonal complexty analyss of stacked ASM [35] and that of the proposed ASM employng the new profle s carred out. 34

52 3.2 Proposed Actve Shape Model usng a DCT-Based Gradent Profle In ths secton, a low-complexty actve shape model (ASM) s developed by ntroducng a 2-D profle of a landmark of a facal shape based on the dscrete cosne transform (DCT) of the local ntensty gradents of the pxels around t. Because of the energy compacton property of the DCT, the sze of the 2-D profle can be made much smaller than that of the conventonal spatal-doman 2-D profle. The use of the compressed DCT profle for the landmark can, therefore, be expected to reduce the computatonal complexty of the resultng ASM. The process of buldng the compressed profle conssts of fve steps descrbed n the followng sub-sectons Samplng Computaton of a 2-D profle of a landmark requres nformaton not only on the pxel representng the landmark n queston but also that on pxels n ts neghbourhood. The buldng process of such a profle for a landmark starts wth samplng grey-level ntenstes from an m x m square regon around a partcular landmark. Fgure 3.1 llustrates an 8 x 8 square regon around a landmark annotated on the upper lp of an mage, from whch the grey-level ntenstes are sampled for computng the profle matrx of the landmark. 35

Fgure 3.1: An llustraton of an 8 x 8 square regon (shown n yellow color) around a landmark (shown as a red dot) annotated on the upper lp of an mage, used for computng a 2-D profle for the landmark.

53 Fgure 3.1: An llustraton of an 8 x 8 square regon (shown n yellow color) around a landmark (shown as a red dot) annotated on the upper lp of an mage, used for computng a 2-D profle for the landmark Computaton of the Profle Matrx Usng the grey-level ntenstes sampled from an m x m regon around a landmark, a profle matrx of the landmark s computed by calculatng the response of a t x q lnear spatal flter w (, j) at each pxel of the square regon [40]. Fgure 3.2 llustrates a 3 x 3 spatal flter mask utlzed for computng the profle matrx of the landmark. The flter response G ( x, y) at the pxel poston ( x, y) s gven by the sum of products of the flter coeffcents and the correspondng grey-level ntenstes n the regon spanned by the flter mask [41], that s, a b G ( x, y) w(, j) ( x, y j) (3.1) a j b where a = (t 1) / 2 and b = (q 1) / 2. For example, usng a 3 x 3 grey-level ntensty regon centered at the pxel poston ( x, y) shown n Fgure 3.3, the response of the 3 x 3 spatal flter at the pxel poston ( x, y) of the regon s computed as 36

54 w(-1, -1) w(-1, 0) w(-1, 1) w(0, -1) w(0, 0) w(0, 1) j w(1, -1) w(1, 0) w(1, 1) Fgure 3.2: An llustraton of a 3 x 3 spatal flter mask utlzed for computng the profle matrx of a landmark. (x-1, y-1) (x-1, y) (x-1, y+1) (x, y-1) (x, y) (x, y+1) y (x+1, y-1) (x+1, y) (x+1, y+1) x Fgure 3.3: An llustraton of a 3 x 3 grey-level ntensty regon centered at the pxel poston (x, y), utlzed for computng the response of a 3 x 3 spatal flter. 37

55 G( x, y) w( 1, 1) ( x 1, y 1) w( 1,0) ( x 1, y) w( 1,1) ( x 1, y 1) w(0, 1) ( x, y 1) w(0,0) ( x, y) w(0,1) ( x, y 1) w(1, 1) ( x 1, y 1) w(1,0) ( x 1, y) w(1,1) ( x 1, y 1) (3.2) Fgure 3.4 shows examples of some of the 3 x 3 flter masks that can be utlzed for computng the profle matrx of a landmark [40]. It s obvous that the use of the mask shown n Fgure 3.4 (a) does not make use of any neghbourhood nformaton around the landmark. The mask shown n Fgure 3.4 (b) and (c) utlze the nformaton on the forward ntensty change only n one drecton (.e. horzontal or vertcal). The mask shown n Fgure 3.4 (d) would capture the forward ntensty varatons along both the x- and y-drectons. The mask shown n Fgure 3.4 (e) would capture the forward and backward ntensty varatons along the x- and y-drectons, whereas the mask of Fgure 3.4 (f) would, n addton, capture the dagonal ntensty varatons. Sutablty of a partcular mask s applcaton dependent. Choce of a flter mask would be examned n Chapter 4 n the context of an applcaton of the proposed scheme n facal landmark annotaton. In order to compute the flter response at a pxel poston, we need the grey-level ntensty at ths poston and that at the postons around t. Therefore, the computaton of the flter response at the pxel postons n the frst row (column) and the mth row (column) requres the grey-level ntenstes at the pxel postons outsde of the m x m regon around the landmark. Fgure 3.5 llustrates an example of an 8 x 8 regon around a landmark. It s clear from ths fgure that n order to compute the flter response at pxel postons wth x = 0 or x = m - 1 or at pxel postons wth y = 0 or y = m - 1, we need the grey-level ntenstes at the pxel postons wth x = -1, x = m, y = -1 or y = m. Consequently, addtonal 4m + 4 pxel postons outsde of the m x m square regon are 38

56 requred n order to compute the m x m array of the spatal flter response. The profle matrx G of a landmark s smply an m x m array of the flter response at the pxel postons n the m x m regon correspondng to the landmark (a) (b) (c) (d) (e) (f) Fgure 3.4: Examples of some of the 3 x 3 flter masks that can be used for computng the profle matrx of a landmark. (a) A mask that does not make use of neghbourhood nformaton around the landmark. (b) A mask that utlzes the nformaton on the forward ntensty change n the x-drecton. (c) A mask that utlzes the nformaton on the forward ntensty change n the y-drecton. (d) A mask that captures the forward ntensty varatons along both the x- and y-drectons. (e) A mask that captures the forward and backward ntensty varatons along the x- and y-drectons. (f) A mask that captures the forward and backward ntensty varatons along the dagonals as well as the x- and y- drectons [40]. 39

57 (-1, -1) (-1, 0) (-1, 1) (-1, 2) (-1, 3) (-1, 4) (-1, 5) (-1, 6) (-1, 7) (-1, 8) y (0, -1) (0, 0) (0, 1) (0, 2) (0, 3) (0, 4) (0, 5) (0, 6) (0, 7) (0, 8) (1, -1) (1, 0) (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (1, 7) (1, 8) (2, -1) (2, 0) (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) [(2, 6) (2, 7) (2, 8) (3, -1) (3, 0) (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6) (3, 7) (3, 8) (4, -1) (4, 0) (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6) (4, 7) (4, 8) (5, -1) (5, 0) (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6) (5, 7) (5, 8) (6, -1) (6, 0) (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) (6, 7) (6, 8) (7, -1) (7, 0) (7, 1) (7, 2) (7, 3) (7, 4) (7, 5) (7, 6) (7, 7) (7, 8) (8, -1) (8, 0) (8, 1) (8, 2) (8, 3) (8, 4) (8, 5) (8, 6) (8, 7) (8, 8) x Fgure 3.5: An llustraton of addtonal pxel ntensty values requred for the computaton of an 8 x 8 profle matrx G correspondng to a gven landmark Computaton of the DCT Coeffcents The profle matrx of the landmark n the spatal doman s transformed to the frequency doman by usng the 2-D dscrete cosne transform (DCT). Specfcally, the 2-D DCT s appled to the m x m profle matrx G of the landmark to obtan an m x m array of the 2-40

58 D DCT coeffcents. Usng the defnton of the 2-D DCT descrbed n Secton 2.2.2, the 2-D DCT of the m x m profle matrx G s obtaned as where C m 1 m 1 (2x 1) k1 (2y 1) k k1, k2) ( k1) ( k2) G( x, y) cos cos (3.3) 2m 2m 2 G( x 0 y 0 ( k) 1 m 2 m for k 0 otherwse and k 1, k 2 = 0, 1, 2,, m 1. The resultng m x m array C G of the DCT coeffcents C G (k 1, k 2 ) conssts of a DC coeffcent representng the zero frequency component and the AC coeffcents representng the low- and hgh-frequency components of the profle matrx. Representaton of the elements of the profle matrx of the landmark n the frequency-doman would allow the DCT coeffcents to be arranged n a sequence wth a decreasng order of ther mportance, and thus, facltate the selecton of a small subset of the DCT coeffcents representng the profle matrx Selecton of the DCT Coeffcents The 2-D dscrete cosne transform effcently packs most of the energy of the profle matrx nto a small subset of the DCT coeffcents representng the low-frequency components [44]. The DCT coeffcents representng the hgh-frequency components can, therefore, be safely dscarded wthout losng much nformaton on the profle matrx. Makng use of ths energy compacton property of the DCT, a set of n c sgnfcant coeffcents s selected from the m x m array of the DCT coeffcents. The m x m array of 41

59 the 2-D DCT coeffcents are zg-zag scanned n order to select the low-frequency coeffcents before selectng the hgh-frequency coeffcents [53]. Fgure 3.6 llustrates the zg-zag scannng of an 8 x 8 array of the DCT coeffcents. The scannng process starts wth the DC coeffcent C G (0, 0) representng the average of the elements of the profle matrx, and contnues wth the AC coeffcents representng ncreasngly hgherfrequency components of the profle matrx. The resultng zg-zag sequence of the m x m DCT coeffcents can be represented by a vector gven by Y CG [ C (0,0), C (0,1), C (1,0), C (2,0), C (1,1), C (0,2),, G G G G G G T C ( m 2, m 1), C ( m 1, m 2), C ( m 1, m 1)] G G G (3.4) The frst n c DCT coeffcents representng the low-frequency components of the profle matrx are selected from the vector Y CG, gvng Y (3.5) nc T [ y 0, y1, y2,, yn ] c Normalzaton and Equalzaton of the Selected DCT Coeffcents The n c DCT coeffcents are normalzed by dvdng each coeffcent by the mean of the absolute values of the DCT coeffcents as y y (3.6) n 1 1 c y j n c j 0 where y s the th element n Y n and stored n c Y (3.7) n c T [ y 0, y1, y2,, yn 2, yn 1] c c 42

60 Fgure 3.6: An llustraton of a zg-zag scannng of an 8 x 8 array coeffcents. C G of the DCT As a result of the operaton, the normalzed DCT coeffcents are more robust to varatons n brghtness and contrast over the mages n the tranng set [40]. In order to reduce the effect of outlers n the normalzed 2-D DCT coeffcents, each coeffcent s then equalzed by applyng a sgmod transform as y y (3.8) y c 43

61 where c s a shape constant [40]. The resultng n c DCT coeffcents are fnally used as a compressed DCT profle of the landmark, denoted by Y (3.9) n c T [ y 0, y1, y2,, yn 2, yn 1] c c 3.3 An Illustratve Example In ths secton, an example s consdered n order to llustrate the buldng process of the compressed DCT profle of a landmark. The example utlzes an mage shown n Fgure 3.7 for buldng a compressed DCT profle of a landmark (shown as a red dot) annotated on the face contour of the mage. Each step of the process of buldng the compressed DCT profle descrbed n Secton 3.2 s appled to ths landmark, and the results are numercally presented Samplng The frst step of buldng the compressed DCT profle s samplng the grey-level ntenstes from the 10 x 10 regon around the landmark n queston. The 8 x 8 regon s shown by a yellow square n Fgure 3.7, and the values of the grey-level ntenstes wthn the 10 x 10 regon are gven n Fgure Computaton of the Profle Matrx A profle matrx of the landmark s obtaned by computng the response of a 3 x 3 lnear 44

Fgure 3.7: An mage utlzed for buldng a 2-D profle of a landmark (shown as a red dot) annotated on the face contour and the 8 x 8 square (shown n yellow color) around t [54].

62 Fgure 3.7: An mage utlzed for buldng a 2-D profle of a landmark (shown as a red dot) annotated on the face contour and the 8 x 8 square (shown n yellow color) around t [54]. spatal flter w 3x3 (, j) at each pxel poston (x, y) n the 8 x 8 regon. In ths example, the flter mask that captures the forward ntensty varatons both along the x- and y- drectons, gven n Fgure 3.4 (d) and reproduced for convenence n Fgure 3.9 (a), s utlzed for computng the flter response. For example, the flter response at the pxel poston (0, 7) located at the top rght corner of the 8 x 8 regon n Fgure 3.8 s obtaned by computng the sum of products of the flter coeffcents of the spatal flter w 3x3 (, j) and the grey-level ntenstes from a 3 x 3 regon centered at the pxel poston (0, 7). 45

63 y x Fgure 3.8: An llustraton of the grey-level pxel ntensty values requred for buldng an 8 x 8 profle matrx for the landmark shown n Fgure 3.7. Fgure 3.9 (b) shows the grey-level pxel ntensty values of the 3 x 3 regon centered at the pxel poston (0, 7) n Fgure 3.8, utlsed for computng the profle matrx of the landmark. 46

64 (a) (b) Fgure 3.9: An llustraton of (a) a 3 x 3 flter mask that captures the forward ntensty varatons both along the x- and y-drectons, and (b) a 3 x 3 grey-level ntensty regon centered at the pxel poston (0, 7), utlzed for computng the flter response n ths example. The flter response at the pxel poston (0, 7) s computed by usng (3.1) as G(0,7) j 1 w(, j) (0,7 j) (0)(55) (0)(56) (0)(57) (0)(55) ( 2)(56) (1)(56) (0)(56) (1)(56) (0)(55) (3.9) The resultng 8 x 8 array of the flter response (profle matrx) s shown n Fgure Computaton of the DCT Coeffcents By usng (3.3), the 2-D dscrete cosne transform s appled to the 8 x 8 profle matrx of the landmark n order to obtan an 8 x 8 array of the 2-D DCT coeffcents representng 47

65 y x Fgure 3.10: The 8 x 8 array of the flter response (profle matrx) of the landmark n the mage of Fgure 3.7. the profle matrx. Fgure 3.11 shows the resultng DCT coeffcents representng the profle matrx of the landmark. The DC coeffcent for example consdered can be calculated as C G (0,0) (0) (0) 7 7 x 0 y 0 G( x, y) 1 1 ( ) (3.10) 48

66 The entre 2-D array of the computed DCT coeffcents correspondng to the profle matrx of Fgure 3.10 s shown n Fgure k k 1 Fgure 3.11: The array of the DCT coeffcents of the profle matrx of Fgure

67 Selecton of the DCT Coeffcents A set of 25 DCT coeffcents representng the low-frequency components of the profle matrx s selected from the 8 x 8 array of the 2-D DCT coeffcents gven n Fgure For ths selecton, a zg-zag scannng of the DCT coeffcents s performed as shown n Fgure 3.12, snce ths scannng as explaned n Secton takes nto account the energy compacton property of the DCT coeffcents. Fgure 3.12: A zg-zag scannng of the 8 x 8 array of the DCT coeffcents of the profle matrx correspondng to the landmark of Fgure

68 It s seen from ths fgure that through the zg-zag scannng, the low-frequency components of the profle matrx get, n general, selected before the others. The frst 25 DCT coeffcents from the resultng zg-zag scannng of the 2-D array of the DCT coeffcents are selected and stored n a vector (1-D array) Y 25 gven by Y 25 [ , , , , , , , , , , , , , , , , , T , , , , , , , ] (3.11) Normalzaton and Equalzaton of the Selected DCT Coeffcents The selected 25 DCT coeffcents are normalzed by usng (3.6) and subsequently equalzed by usng (3.8). For example, the frst coeffcent n Y 25 s dvded by the mean of the absolute values of the DCT coeffcents as gven n (3.6) to obtan the normalzed DCT coeffcent as y 0 y y j 25 j ( ) (3.12) The entre array of the normalzed DCT coeffcents s stored n Y 25 as Y 25 [ , , , ,1.4341, , , , , , , , , , , , , T , , , , , , , ] (3.13) 51

69 The sgmod transform of (3.8) wth a shape constant of 100 (c = 100) s then appled to the normalzed DCT coeffcent. For nstance, the ffth normalzed coeffcent of Y 25 s equalzed as y 4 y 4 y (3.14) The entre array of the DCT coeffcents after normalzaton and equalzaton of each element s gven by Y 25 [ , , , , , , , , , , , , , , , , T , , , , , , , , ] (3.15) The array Y 25 of the resultng DCT coeffcents s used as a compressed DCT profle of the landmark on the face contour of the mage shown n Fgure Computatonal Complexty Analyss In ths secton, computatonal complexty analyses of stacked actve shape model (ASM) [35] and the proposed ASM usng a DCT-based gradent profle are carred out. These versons of actve shape model are utlzed n Algorthm 1 for searchng a facal shape n an mage presented n Chapter 2. The stacked ASM utlzes both the 1-D and 2-D profles whle ASM usng a DCT-based gradent profle employs the compressed 2-D DCT profle for searchng a sutable locaton of a landmark. Thus, dfference between the two 52

70 models arses from the process of buldng the profle(s) of a landmark used n searchng a facal shape n an mage. As descrbed n Secton 2.1, n the stacked ASM, both 1-D and 2-D profles need to be bult n the spatal doman. The 1-D profle of a landmark s bult by samplng the grey-level ntenstes of m pxels centered at the landmark, computng the ntensty gradent at each pxel poston, and normalzng each ntensty gradent. Computaton of the ntensty gradents of the sampled grey-level ntenstes requres 24m + 35 addton or multplcaton operatons. Normalzaton of the resultng ntensty gradents nvolves 2m + 2 addton or multplcaton operatons. The total number of arthmetc operatons n buldng the 1-D profle s 26m Therefore, the complexty of buldng the 1-D profle for a landmark s O (m). On the other hand, buldng a 2-D profle of a landmark requres samplng the grey-level ntenstes from an m x m regon around the landmark, computng an m x m array of the response of a lnear spatal flter at each pxel poston n the regon (profle matrx), and normalzng and equalzng each element of the profle matrx. In order to compute the profle matrx of the landmark, a 3 x 3 flter mask s correlated wth the grey-level ntenstes n a 3 x 3 regon centered at each pxel poston n the m x m regon. Each computaton of the flter response requres 8 addtons and 9 multplcatons, and the computaton of the profle matrx thus requres 17m 2 addton or multplcaton operatons. The normalzaton and equalzaton of the profle matrx requre n total 4m addton or multplcaton operatons. Thus, the total number of arthmetc operatons n buldng a 2-D landmark profle for the stacked ASM s 21m Thus, the complexty of buldng ths 2-D profle s O ( m ). 53

71 The process of buldng the compressed DCT profle of a landmark conssts of samplng the grey-level ntenstes from an m x m regon around the landmark, computng an m x m profle matrx, computng ts DCT coeffcents, selectng the n c sgnfcant DCT coeffcents, and normalzng and equalzng each element of the selected DCT coeffcents. As seen n the process of buldng the 2-D profle for the stacked ASM, computaton of the profle matrx requres 17m 2 addton or multplcaton operatons. The DCT coeffcents representng the profle matrx can be computed by usng a fast 1-D FFT algorthm wth row-column decomposton. For example, f we use the 1-D FFT algorthm descrbed n [55], then computng the m x m array of the 2-D DCT coeffcents requres 2 2 4m log ( m) 2m 2m addton or multplcaton operatons. 2 The normalzaton and equalzaton of a set of n c (<= m 2 ) sgnfcant coeffcents selected from the m x m array of the DCT coeffcents requre 4n c + 3 addtons or multplcatons. Thus, the total number of arthmetc operatons n buldng the compressed 2-D DCT 2 2 profle of a landmark s 4m log ( m) 15m 2m 4n 3. Therefore, the complexty of 2 2 buldng the 2-D profle n the proposed actve shape model s O m log m). c ( 2 Table 3.1 summarzes the number of arthmetc operatons of each step as well as the total number of operatons nvolved n buldng the profles of a landmark for the stacked and proposed actve shape models. The last row of ths table provdes the overall complexty of buldng the profles utlzed for the two models. It s seen from ths table that the overall computatonal complexty of the proposed scheme n buldng the profle for a landmark s larger than that n the stacked ASM by a factor of log m. However, snce the value of m s not too large (m = 13 s used for computaton of the 2-D profle, and m = 8 s used for computaton of the compressed DCT profle), the dfference 54

72 between the computatonal complextes of the profles of two schemes are not that sgnfcant. It s also to be noted that, for the proposed ASM, only one compressed DCT profle needs to be bult, whereas for the stacked ASM, both 1-D and 2-D profles are requred. In addton, the sze of the compressed DCT profle (.e. n c ) s much smaller than m 2, the sze of the 2-D profle of the stacked ASM. Thus, n the proposed scheme, the use of only one compressed profle compared to the use of one 1-D profle and another larger 2-D profle, can be expected to reduce the total number of operatons, and therefore the overall computaton tme for searchng a facal shape n an mage usng the proposed ASM. Table 3.1: Computatonal complexty of each step of the process of buldng the profles of a landmark for the stacked and proposed ASMs. Steps Computaton of ntensty gradents /profle matrx Calculaton of DCT coeffcents Normalzaton /Equalzaton Total number of arthmetc operatons Overall computatonal complexty Number of arthmetc operatons Stacked ASM Compressed 2-D DCT Profle 1-D Profle 2-D Profle n the proposed ASM 24m+35 17m 2 17m 2 n/a n/a 4m 2 log 2 (m)-2m 2 +2m 2m+2 4m n c +3 21m 2 +26m+40 4m 2 log 2 (m)+15m 2 +2m + 4n c +3 O(m 2 ) O(m 2 log m) 55

73 3.5 Summary In ths chapter, we have proposed a low-complexty actve shape model (ASM), whch employs a novel 2-D profle of a landmark of a facal shape based on the dscrete cosne transform (DCT). The process of buldng such a profle conssts of samplng, computng a profle matrx and ts DCT coeffcents, and selectng, normalzng and equalzng the DCT coeffcents. The grey-level ntenstes are frst sampled from an m x m regon around a landmark. A profle matrx of the landmark s obtaned next by computng the response of a lnear spatal flter at each pxel poston n the sampled regon. The flter response s gven by the sum of products of the coeffcents of a flter mask and the correspondng grey-level ntenstes n the regon spanned by the flter mask. Next, a 2-D array of the DCT coeffcents s obtaned by computng 2-D DCT of the profle matrx. A subset of the DCT coeffcents, representng the low-frequency components of the profle matrx, s selected through a zg-zag scannng of the 2-D array of the DCT coeffcents. The selected DCT coeffcents are normalzed by dvdng each element by the mean of the absolute values of the DCT coeffcents, whch, n turn, are equalzed by applyng a sgmod transform to each element. The resultng DCT coeffcents are fnally used as a compressed DCT profle representng the landmark. A numercal example has been provded n order to llustrate the process of buldng the compressed DCT profle of a landmark by applyng each step of the buldng process to the landmark. A computatonal complexty analyss of stacked ASM [35] and that of the proposed ASM has also been carred out. The dfference between the two models results from the process of buldng the profle(s) of a landmark used for searchng a facal shape n an mage. The overall computatonal complexty of the proposed ASM s theoretcally 56

74 hgher than that of the stacked ASM by a factor of log m. However, n practce the computaton tme of searchng a face usng the proposed ASM can be expected to be lower than that usng the stacked ASM because of the followng reasons. The value of m s generally not very large. The proposed ASM requres buldng only one compressed DCT profle n comparson to the stacked ASM, whch requres buldng a 1-D profle as well as a full-sze 2-D profle. 57

75 CHAPTER 4 Expermental Results of Applyng the Proposed Actve Shape Model n a Facal Annotaton Applcaton In Chapter 3, we have proposed a low-complexty actve shape model (ASM) that utlzes a 2-D profle of a landmark of a facal shape based on the dscrete cosne transform (DCT). Due to the energy compacton property of the DCT, the sze of the compressed DCT profle could be made much smaller than that of the conventonal spatal-doman 2- D profle. Searchng a facal shape n an mage usng the proposed ASM s thus expected to be faster than that usng stacked ASM [35]. In ths chapter, we study the effectveness of the proposed ASM n an applcaton of automatc facal landmark annotaton of frontal faces usng a facal shape search method ntroduced n Chapter 2. In order to buld the pont dstrbuton model (PDM) and the local grey-level gradent model (LGGM) parts of the proposed ASM and to perform the facal shape search on a varety of mages, tranng and test sets of samples are chosen from three databases. The effectveness of the proposed ASM s analyzed by usng an evaluaton method to determne the fttng accuracy of the model shape, as well as by measurng the computaton tme of the facal shape search. In Secton 4.1, the facal shape search method, usng the proposed ASM, for fndng the best locaton of the landmarks of a face n an mage s presented. Three 58

76 databases used for tranng and test of the proposed ASM n the face annotaton applcaton are presented n Secton 4.2. An evaluaton method utlzed for computng the fttng accuracy of the proposed ASM s descrbed n Secton 4.3. In Secton 4.4, an emprcal study s conducted for determnng the number of DCT coeffcents and for selectng a sutable flter mask. The expermental results of an automatc facal landmark annotaton of frontal faces usng the proposed and two other actve shape models are presented and compared. 4.1 Facal Annotaton usng the Proposed Actve Shape Model Actve shape model (ASM) can be utlzed for searchng a facal shape n an mage by automatcally fndng the sutable locatons of landmarks of a face. It conssts of two submodels, pont dstrbuton model (PDM) and local grey-level gradent model (LGGM). The pont dstrbuton model can generate any shape smlar to the shapes n the tranng set by controllng the model parameters gven by a vector b, whch s a vector appled to the egenvectors obtaned by the prncpal component analyss of the tranng shapes. In order to transform the shape n the model space to the mage space, a smlarty transform T s used for translatng, scalng, and rotatng the shape by settng the transformaton ( x y parameters ( tx, ty, s, ), where t, t ), s and are, respectvely, the translaton, scalng and rotaton parameters. The pont dstrbuton model along wth the smlarty transform s utlzed for approxmatng a gven target shape through an approprate choce of the model parameters b and the transformaton parameters ( t, t, s, ). The local grey-level gradent model, whch employs a 1-D profle to capture the grey-level ntensty gradent x y 59

77 nformaton n the vcnty of each landmark, s used for determnng a sutable locaton for each landmark of the model shape. In order to closely ft the model shape to a gven face, the shape search teratvely mproves the ft of the model shape to the gven face by movng each landmark of the current model shape to a new poston usng the LGGM to generate a new shape x S, and then updatng the model parameters b and the transformaton parameters ( tx, ty, s, ) wth the new sets of parameters that closely approxmate x S. In our scheme of face annotaton usng ASM, we employ the compressed DCT profle for buldng the LGGM. Ths 2-D profle captures more nformaton around each landmark of the model shape than the 1-D profle, and the sze of the 2-D profle s much smaller than that of the conventonal spatal-doman 2-D profle. The shape search method usng the proposed ASM s thus expected to be faster than that usng the stacked ASM [35]. A detaled descrpton of varous steps of the shape search method employng the proposed ASM s gven n the followng sub-sectons Generatng an Intal Model Shape The search method begns wth localzng a rectangular regon n a gven mage contanng a face usng the Vola Jones face detector [43]. The rectangular regon, ( c c specfed by ts center pont x, y ), wdth w, and heght h, s used for determnng the model parameters b and the transformaton parameters ( tx, ty, s, ) to generate an ntal model shape. The mean shape x of the tranng set s mapped onto the mage usng the center of the regon x, y ) as the reference pont. The resultng shape s then scaled and ( c c 60

78 rotated by assgnng a value to the scalng parameter s and the rotaton parameter dependng on the value of w and h, so as to approxmately ft t n the rectangular regon. The ntal model shape represented by the model parameter b 0 0 and the transformaton parameters ( tx, ty, s, ) 0 s utlzed as an ntal shape n LGGM for generatng a refned shape Refnng the Landmark Postons usng LGGM Each landmark of the shape, represented by the model parameters b and the transformaton parameters ( t, t, s, ), s moved to a sutable poston determned by x y usng the local grey-level gradent model (LGGM) for the landmark. A new shape x S, whch descrbes the facal shape n the mage better than the current model shape does, s formed by new postons of the landmarks. In the proposed ASM, the LGGM s bult by usng the compressed DCT profle descrbed n Secton 3.2. The process of buldng the profle for the th landmark starts wth samplng the grey-level ntenstes from an 8 x 8 regon around the landmark. A profle matrx G of the landmark s obtaned by computng the response of a lnear spatal flter, whch captures the forward ntensty varatons along both n the x- and y- drectons, correspondng to the pxels n a 3 x 3 mask, at each pxel poston n the square regon. A 2-D array of the DCT coeffcents C G s obtaned by computng 2-D DCT of G usng a fast 1-D FFT algorthm gven n [47] wth row-column decomposton. A subset of n c DCT coeffcents representng the low-frequency components of G s selected by usng the zg-zag scannng of 61 C G and stored n an array

79 Y n c for each landmark. The coeffcents n Y are normalzed by dvdng each n c element by the mean of the absolute values of the DCT coeffcents contaned n Y n c. These coeffcents are then equalzed by applyng a sgmod transform to each element. An array of the resultng DCT coeffcents Y s used as the compressed DCT profle of the th landmark. In order to determne a sutable poston for the th landmark, the cost of ft of Y to the mean profle Y of the LGGM s evaluated at each canddate poston n a 5 x 5 regon around the landmark usng the Mahalanobs dstance gven by where profle T 1 f Y ) ( Y Y) S ( Y Y) (4.1) ( S s the covarance matrx for the th landmark. The poston d best at whch the Y yelds the smallest Mahalanobs dstance (the mnmum cost of ft) s then selected as the new poston of the landmark. The process s repeated for each landmark of the model shape n order to obtan a new shape x S Fndng a Model Shape by Calculatng the Model and Transformaton Parameters The model parameters b and the transformaton parameters t, t, s, ) that best ( xbest ybest best best best approxmate the new shape x S obtaned n the prevous sub-secton are determned by mnmzng the expresson gven by (2.7) usng the smlarty transform T and the pont dstrbuton model. In order for the model shape to be consstent wth those n the tranng set, the model parameters b best are constraned accordng to (2.6). 62

80 Updatng the Model and Transformaton Parameters The model parameters b and the transformaton parameters ( tx, t, s, ) are then updated y to the prevously determned model and transformaton parameters b and best ( best txbest, tybest, sbest, ) whch closely approxmate the new shape x S as tx t xbest (4.2) ty t ybest (4.3) best (4.4) s s best (4.5) b b best (4.6) The updated model shape s then used as the startng shape n the second teraton of the shape search method. The steps of sub-sectons 4.1.2, 4.1.3, descrbed above are repeated untl the teratve process satsfes a pre-specfed termnatng condton. 4.2 Databases In order to perform the facal annotaton usng the scheme descrbed n the prevous secton, we need a tranng set and a test set of mages and landmarks of the facal shapes. The fttng error between the landmarks of the model shape and the manually labelled landmarks from the test set s computed for evaluatng the effectveness of the proposed ASM. In order to buld the tranng and test sets, varous samples are chosen from three databases, namely, the Mlborrow / Unversty of Cape Town (MUCT) database, the 63

landmarks. Fgure 4.1 shows a few examples of the samples from ths database consstng of a facal mage and the 76 landmarks representng dfferent parts of the face n an mage. Fgure 4.1: Examples of the samples from the MUCT database consstng of a facal mage and 76 manually annotated landmarks representng partcular parts of a face n the mage [54].

81 BoID database and the IMM database, contanng both mages and the landmarks of the correspondng facal shapes. The Mlborrow / Unversty of Cape Town (MUCT) database presented n [54] contans 3755 samples of 276 dfferent ndvduals, each of whch conssts of a facal mage wth 480 x 640 pxels and 76 manually labelled landmarks. Fgure 4.1 shows a few examples of the samples from ths database consstng of a facal mage and the 76 landmarks representng dfferent parts of the face n an mage. Fgure 4.1: Examples of the samples from the MUCT database consstng of a facal mage and 76 manually annotated landmarks representng partcular parts of a face n the mage [54]. In order to generate the samples, each ndvdual s photographed from fve dfferent postons of the camera, as shown n Fgure 4.2. Snce we are nterested n an applcaton of automatc facal landmark annotaton of frontal faces, only the mages photographed from the frontal camera postons (the camera postons a, d and e) are utlzed. Fgure 4.3 shows examples of the mages photographed from the camera postons a, d and e. Each ndvdual s also photographed under three lghtng condtons, q, r and s. Fgure

shows examples of the mages photographed under the lghtng condtons q, r and

2: The fve dfferent postons of the camera [54]. (a) (b) (c) Fgure 4.

postons under the same lghtng condton [54].

82 shows examples of the mages photographed under the lghtng condtons q, r and s from the fontal poston a. Fgure 4.2: The fve dfferent postons of the camera [54]. (a) (b) (c) Fgure 4.3: Examples of the mages of an ndvdual photographed from the frontal camera postons under the same lghtng condton [54]. (a) An mage photographed from the camera poston a. (b) An mage photographed from the camera poston d. (c) An mage photographed from the camera poston e. 65

(b) An mage photographed under the lghtng condton r.

The BoID database ntroduced n [55] conssts of 1521 samples of 23

286 pxels and 20 manually annotated landmarks. Fgure 4.

83 (a) (b) (c) Fgure 4.4: Examples of the mages of an ndvdual photographed under three dfferent lghtng condtons [54]. (a) An mage photographed under the lghtng condton q. (b) An mage photographed under the lghtng condton r. (c) An mage photographed under the lghtng condton s. The BoID database ntroduced n [55] conssts of 1521 samples of 23 dfferent ndvduals, each of whch contans a facal mage wth 384 x 286 pxels and 20 manually annotated landmarks. Fgure 4.5 shows a few examples of the samples from ths database. Fgure 4.5: Some examples of the samples from the BoID database consstng of a facal mage and 20 manually annotated landmarks [56]. 66

The nformatcs and mathematcal modellng (IMM) database presented n [57] contans 240 samples of 40 dfferent ndvduals, each of whch conssts of a facal mage wth 640 x 480

6. Fgure 4.6: Some examples of the samples from the IMM database consstng of a facal mage and 58 manually annotated landmarks [57].

and by negatng the x-coordnate of each landmark and changng the order of the landmarks of each shape n the samples. Fgure 4.

84 The nformatcs and mathematcal modellng (IMM) database presented n [57] contans 240 samples of 40 dfferent ndvduals, each of whch conssts of a facal mage wth 640 x 480 pxels and 58 manually annotated landmarks. A few examples of the samples from ths database are shown n Fgure 4.6. Fgure 4.6: Some examples of the samples from the IMM database consstng of a facal mage and 58 manually annotated landmarks [57]. In order to double the number of samples n each database, a set of mrrored samples s obtaned by horzontally flppng each mage n a database usng the photo edtng software, and by negatng the x-coordnate of each landmark and changng the order of the landmarks of each shape n the samples. Fgure 4.7 shows an example of the mrrored pars of the samples from the MUCT database. As a result of generatng the mrrored samples, the number of samples n the MUCT, BoID and IMM databases are, respectvely, ncreased to 7510, 1521 and 480 samples. Table 4.1 summarzes the characterstcs of the three databases used n our experment. 67

(a) (b) Fgure 4.7: An example of the mrrored pars of the samples from the MUCT database [54]. (a) The mage of a sample from the MUCT database. (b) The mrrored mage of the same sample. Table 4.

85 (a) (b) Fgure 4.7: An example of the mrrored pars of the samples from the MUCT database [54]. (a) The mage of a sample from the MUCT database. (b) The mrrored mage of the same sample. Table 4.1: Summary of the three databases used for buldng the PDM and LGGM parts of the proposed ASM and performng the facal shape search usng the ASM. MUCT Database BoID Database IMM Database Total Number of Samples Number of Landmarks Number of Indvduals Image Sze (Pxels) x x x Evaluaton Method In order to measure the effectveness of the proposed ASM n fttng the landmarks of the model shape onto a target facal shape n an mage, the fttng accuracy s computed by determnng the dstance between each landmark n the target shape and ts correspondng 68

86 landmark n the model shape. The dstance between the two ponts n the spatal doman s computed by usng the Eucldean dstance gven by T d (a b ) (a b ) (4.7) where a s a vector contanng the x- and y-coordnates of the th landmark of the model shape, and b s a vector contanng the x- and y-coordnates of the th landmark of the target shape. Snce facal shapes come from dverse condtons and ndvduals, an evaluaton method, namely, the average normalzed fttng error, whch takes the dfference n facal shape sze nto account, s utlzed for computng the fttng accuracy. The normalzed fttng error s obtaned by takng the average of the Eucldean dstance between the manually annotated landmarks from a target facal shape and the correspondng landmarks from the model shape, and normalzng the result by a factor that ensures that s, the dstance between two landmarks representng the extreme ponts on the left and rght eyes, as shown n Fgure 4.8, s equal to 50 pxels [36]. These two landmarks are selected, snce they are suffcently far apart to represent the wdth of a facal shape, and all three databases used n our experment consst of these landmarks, as seen from Fgures 4.1, 4.5, and 4.6. As a result of normalzaton of the dstance, the fttng accuracy obtaned by usng shapes wth dfferent scales can be drectly compared. In order to compute the overall fttng accuracy of the proposed ASM, the average normalzed fttng error s computed by takng the mean of the normalzed fttng error over all the samples from the test set as E average 1 N test n 50 d j 1 ns j 1 N test (4.8) 69

87 where n a sample. N test s the number of samples from the test set, and n s the number of landmarks Fgure 4.8: Two landmarks representng the extreme ponts on the left and rght eyes, and the dstance s between the two landmarks. 4.4 Expermental Results In order to study the effectveness of the proposed ASM n the applcaton of an automatc facal landmark annotaton of frontal faces, the facal annotaton usng the scheme descrbed n Secton 4.1 s performed on varous facal mages. In ths secton, we conduct an experment usng two groups of tranng and test sets of mages and landmarks of the facal shapes. The effectveness of the ASM s evaluated by computng the average normalzed fttng error, and by measurng the executon tme of the facal shape search. An emprcal study s conducted n order to select the optmal number of DCT coeffcents and to choose a sutable flter mask, whch are used for buldng the 70

88 compressed DCT profle of the ASM. We also present and compare the expermental results of an automatc facal landmark annotaton of frontal faces usng the proposed and varous other actve shape models, namely, stacked ASM [35] and ASM wth a PCAbased LGGM [36] Expermental Settngs In order to examne the performance of the proposed ASM expermentally, two groups of tranng and test sets of mages and landmarks of the facal shapes are selected from the three databases presented n Secton 4.2. The frst group of samples conssts of a tranng set of 3000 samples from the MUCT database and a test set of 1471 samples from the BoID database. As seen from Table 4.1, the number of landmarks n each sample of the MUCT database and that n the BoID database are, respectvely, 76 landmarks and 20 landmarks. Although the number of landmarks n the former s larger than that n the latter, 17 of these landmarks, as shown n Fgure 4.9, are common n each of samples n the two databases. Thus, these 17 landmarks (n = 17) are utlzed for computng the average normalzed fttng error n the experments usng ths group of samples. The second group of samples conssts of a tranng set of 240 samples of the frst 20 ndvduals of the IMM database and a test set of 240 samples of the other 20 ndvdual of the same database. Snce both the tranng and test sets of samples are chosen from the same database, all the landmarks of a facal shape (n = 58) are used for computng the average normalzed fttng error n the experments usng ths group of samples. Table 4.2 summarzes the two groups of tranng and test sets of mages and landmarks of the facal shapes belongng to the three databases. 71

Fgure 4.9: The 17 landmarks nternal to the face utlzed for computng the average normalzed fttng error n the experment wth the frst group of samples [59]. Table 4.

89 Fgure 4.9: The 17 landmarks nternal to the face utlzed for computng the average normalzed fttng error n the experment wth the frst group of samples [59]. Table 4.2: Summary of the number of tranng and test sets n the two groups of the mages chosen from the three databases and the number of landmarks n each facal shape. Group 1 Group 2 Tranng Set Test Set Tranng Set Test Set Database MUCT BoID IMM Number of Samples Number of Landmarks Publcly avalable STASM software [59] s modfed and utlzed to produce the expermental results for the proposed and varous other actve shape models under the same expermental condton. The actve shape models used n the experments are all mplemented by usng Mcrosoft Vsual Studo 2012 and the C++ programmng language. The experments are performed on a 2.6 GHz Intel Core 7 CPU wth 6-GB RAM and Wndows 7 operatng system. The executon tme of a facal landmark annotaton of the frontal face n an mage s gven by the sum of the executon tme for locatng the face n the mage and the executon tme for conductng the facal shape search n the mage. In order to conduct 72

90 the experments under the same condton for all the ASMs, the locaton of a rectangular regon contanng the face n each test mage s pre-computed usng the Vola-Jones face detector [43]. Thus, the executon tme for locatng the face n the mage s a constant tme for loadng the pre-computed locaton of the regon contanng the face. The average executon tme s computed by takng the average of the executon tme over all the test samples after the frst 50, snce the ntalzaton of the ASM may have an effect on the computaton tme for some samples at the begnnng of the executon Results and Analyss of the Proposed Actve Shape Model As descrbed n Chapter 3, the proposed ASM employs a compressed DCT profle of a landmark of a facal shape. In order to buld the compressed DCT profle of the th landmark, the profle matrx G s obtaned by calculatng the response of a spatal flter mask at each pxel poston of an m x m regon around the landmark. A 2-D array DCT coeffcents s obtaned by computng 2-D DCT of coeffcents s selected through a zg-zag scannng of C G of G, and a subset of n c DCT C G. The selected DCT coeffcents are normalzed by dvdng each element by the mean of the absolute values of the selected DCT coeffcents, whch are subsequently equalzed by applyng a sgmod transform to each element. The resultng DCT coeffcents are fnally used as the compressed DCT profle of the landmark. The number of DCT coeffcents n c and the flter mask used n buldng the profle are mportant choces for the proposed ASM, snce they have an effect on the fttng accuracy and the computatonal complexty. The proposed ASM s utlzed n an automatc facal annotaton system that 73

91 automatcally locates facal features of the face n an nput mage. The facal annotaton system maps the ntal model shape onto the nput mage usng a rectangular regon contanng a face, reforms the model shape teratvely, and fnally yelds the model shape that fts closely to the face n the mage. An example of the ntal and fnal model shapes of the automatc facal annotaton system s shown n Fgure In our experments, the proposed ASM s frst bult by usng a tranng set of samples, and the facal annotaton s performed on each mage n a test set of samples usng dfferent combnatons of the number of DCT coeffcents and a flter mask. Three dfferent flter masks, gven n Fgure 3.4 (d), (e) and (f), are used for generatng the profle matrx. The number of DCT coeffcents utlzed for buldng the compressed DCT profle s vared from 10 to 35 wth ncrements of 5. The fttng accuracy of the model shape to the face n the mage s also computed by usng the normalzed fttng error presented n Secton 4.3. In order to study the performance of the facal annotaton usng the proposed ASM, we run an experment usng the two groups specfed n Table 4.2 of tranng and test sets of mages and landmarks of the facal shapes specfed n Secton 4.4.1, and the average normalzed fttng error s computed n each experment. Table 4.3 lsts the average normalzed fttng error obtaned from the experment usng the tranng and test sets of Group 1. It s seen from ths table that the lowest average normalzed fttng error results from usng 25 DCT coeffcents for each of the three flter masks. It s also observed that when the number of the DCT coeffcents s made smaller or larger, the error n facal annotaton becomes progressvely larger. The overall lowest average normalzed fttng error of s obtaned by usng the 4- Laplacan flter mask wth 25 DCT coeffcents. 74

(b) The fnal model shape (shown n red color) that fts closely to the face. Table 4.

92 (a) (b) Fgure 4.10: An example of the ntal and fnal model shapes of the automatc facal annotaton system. (a) The ntal model shape (shown n red color) mapped onto an nput mage usng a rectangle regon contanng a face (shown n yellow color). (b) The fnal model shape (shown n red color) that fts closely to the face. Table 4.3: Average normalzed fttng error obtaned from the experment wth Group 1 of samples by usng dfferent combnatons of the number of DCT coeffcents and a flter mask. Average Normalzed Fttng Error (pxels) Number of DCT Coeffcents Type of Flter Masks Gradent 4-Laplacan 8-Laplacan (Fgure 3.4 (d)) (Fgure 3.4(e)) (Fgure 3.4 (f)) Table 4.4 gves the average normalzed fttng error obtaned from the experment usng the tranng and test sets of Group 2. It s seen from ths table that the lowest average normalzed fttng error s obtaned by usng 30 DCT coeffcents for the gradent 75

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth