3D OBJECT RECOGNITION USING SCALE SPACE OF CURVATURES


3D OBJECT RECOGNITION USING SCALE SPACE OF CURVATURES

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY

BY

ERDEM AKGÜNDÜZ

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ELECTRICAL AND ELECTRONICS ENGINEERING

JANUARY 2011

Approval of the thesis:

3D OBJECT RECOGNITION USING SCALE SPACE OF CURVATURES

submitted by ERDEM AKGÜNDÜZ in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Electronics Engineering Department, Middle East Technical University by,

Prof. Dr. Canan ÖZGEN
Dean, Graduate School of Natural and Applied Sciences

Prof. Dr. İsmet ERKMEN
Head of Department, Electrical and Electronics Engineering

Asst. Prof. Dr. İlkay ULUSOY
Supervisor, Electrical and Electronics Engineering Dept., METU

Examining Committee Members:

Prof. Dr. Volkan ATALAY
Computer Engineering Dept., METU

Asst. Prof. Dr. İlkay ULUSOY
Electrical and Electronics Engineering Dept., METU

Prof. Dr. Uğur HALICI
Electrical and Electronics Engineering Dept., METU

Prof. Dr. A. Aydın ALATAN
Electrical and Electronics Engineering Dept., METU

Asst. Prof. Dr. Pınar DUYGULU ŞAHİN
Computer Engineering Dept., Bilkent Uni.

Date: 06 January 2011

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Surname: Erdem AKGÜNDÜZ

Signature:

ABSTRACT

3D OBJECT RECOGNITION USING SCALE SPACE OF CURVATURES

Erdem AKGÜNDÜZ
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Asst. Prof. Dr. İlkay ULUSOY

January 2011, 136 pages

In this thesis, a generic, scale- and resolution-invariant method to extract 3D features from 3D surfaces is proposed. Features are extracted together with their scale (metric size and resolution) from range images using a scale-space of 3D surface curvatures. Different from previous scale-space approaches, connected components within the classified curvature scale-space are extracted as features. Furthermore, the scales of the features are extracted invariant of the metric size or the sampling of the range images. Geometric hashing is used for object recognition, where scaled, occluded, and both scaled and occluded versions of range images from a 3D object database are tested. The experimental results under varying scale and occlusion are compared with SIFT in terms of recognition capabilities. In addition, to emphasize the importance of using a scale space of curvatures, comparative recognition results obtained with single scale features are also presented.

Keywords: 3D Object Recognition, 3D Feature Extraction, 3D Pattern Recognition

ÖZ (Turkish Abstract)

EĞRİLİK ÖLÇEK UZAYI KULLANARAK 3B NESNE TANIMA

Erdem AKGÜNDÜZ
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Asst. Prof. Dr. İlkay ULUSOY

January 2011, 136 pages

In this thesis, a 3D feature extraction method for 3D surfaces that is independent of scale and resolution is proposed. Features are extracted using 3D surface curvatures without losing their scale information (metric size and resolution). Differently from previous scale-space approaches, connected components within the classified curvature scale space are used as features. These features are extracted independently of metric length and sampling. Geometric hashing is used for object recognition, and the method is tested on 3D surface databases containing occluded objects at different scales. The experimental results are compared with SIFT in terms of object recognition capability. To show the importance of using a scale space, the method is run both with and without the scale space and the results are compared.

Keywords: 3D Object Recognition, 3D Feature Extraction, 3D Pattern Recognition

ACKNOWLEDGEMENTS

The author wishes to express his gratitude to his supervisor Asst. Prof. Dr. İlkay Ulusoy for her guidance, advice, criticism, encouragement and insight throughout the research.

The author would also like to thank his parents, Nurhayat and Feyzi Akgündüz, and his brother Haldun Akgündüz for their endless support.

And finally, for Sevil: Like everything in my life, this thesis would never be complete without you.

TABLE OF CONTENTS

ABSTRACT
ÖZ
ACKNOWLEDGEMENTS
TABLE OF CONTENTS

CHAPTERS

1. INTRODUCTION
   1.1 Problem Definition
   1.2 Motivation
   1.3 Contribution
   1.4 Outline of the Thesis

2. RELATED WORK
   3D Feature Extraction
      3D Surface Curvatures
      3D Scale Invariant Features
      3D Descriptors
   3D Object Representation
   3D Object Recognition
   3D Facial Detection
   3D Object Registration

3. PROCESSING 3D INFORMATION
   3.1 3D Information Formats
      3.1.1 Range Images
      3.1.2 Polygonal Data
      3.1.3 Parametric Representations
      3.1.4 Digital Elevation Models
   3.2 Common Defects of 3D Scanner Outputs
      3.2.1 False Holes (False Invalid Points)
      3.2.2 Exploded/Imploded Regions
      3.2.3 Noise
   3.3 Processing 3D Range Scans
      3.3.1 Processing False Holes
      3.3.2 Processing Exploded/Imploded Regions
      3.3.3 Processing Noise

4. SURFACE CURVATURES
   Reliable Curvature Estimation
   Principal Curvatures
   Mean (H) and Gaussian (K) Curvatures
   Shape Index (S) and Curvedness (C)
   Transform Invariance of Curvatures
   Effect of Scale and Sampling on Curvature
   Scale/Resolution Ratio Rule for Thresholding
   Comparison of HK and SC Curvature Spaces
   Mathematical Comparison of HK and SC Curvature Spaces
   HK&SC Coupled Curvature Space

5. SCALE-SPACE OF CURVATURES
   Scale-Space Theory in Computer Vision
   Gaussian Pyramiding
   Gaussian Pyramiding over 3D Surface Scans
   Invalid Points in 3D Surface Pyramids
   Scale Space of Curvatures
   Curvatures in Scale-Space

6. FEATURE EXTRACTION AND OBJECT REPRESENTATION
   Construction of the UVS Volume
   Extraction of Features
   Scale-Space Localization of the Features
   Transform Invariant Topology Construction
   Importance of Scale Space Search
   Feature Extraction Robustness under Noise
   6.7 Local Region Descriptions
      Depth Histograms
      Normal Histograms
      Curvature Histograms
      Spin Images

7. EXPERIMENTAL WORK
   Face Detection and Pose Estimation
      Transform Invariant Four-Node Face Topology
      Face Detection on Bosphorus Database
      Experimental Work on 3D Face Detection
      3D Facial Pose Estimation
      Conclusion on 3D Facial Detection and Pose Estimation
   3D Object Recognition
      Transform Invariant n-Node Topology
      Learning the n-Node Topology
      Geometric Hashing
      HK-SC Curvature Space Classification Comparison and Threshold Fine Tuning
      3D Object Recognition Tests
      3D Object Registration Experiments
      Conclusions on 3D Object Recognition and Registration
   Delineation of Slope Units Based on Scale and Resolution Invariant 3D Curvature Extraction
      Slope Unit Generation by Conventional Method
      The Proposed Method
      Results and Comparison

8. CONCLUSIONS AND FUTURE DIRECTIONS
   Conclusions
   Future Directions

REFERENCES

CURRICULUM VITAE

LIST OF TABLES

Table 1. Shape classification in HK curvature space. Corresponding examples of surface regions are seen in Figure 13.
Table 2. Shape index classification. Colours are consistent with the previous figures.
Table 3. Shape index and curvedness classification. Colours are consistent with the previous figures.
Table 4. The definitions of the regions seen in Figure 19.
Table 5. HK&SC coupled space classification. Colours are consistent with the previous figures.
Table 6. Extracted features from the surfaces given in Figure 38 with their locations in UVS space.
Table 7. The radius values (in mm) of the extracted features in Figure 39.
Table 8. Mean of the differences between the marked and calculated features for the nose tips and inner eye pits in the training set.
Table 9. Tested poses and number of facial points included for each pose.
Table 10. Detection percentages of the nose and the inner eye pits for various poses, listed for all methods. The first column from left (ours) in each main column is the proposed method; then M1, M2, M3 and M4 are given in the columns from left to right respectively. The best results for each pose are underlined and bolded.
Table 11. Detection errors compared for all methods. The mean and the standard deviation of the error (the absolute distance between the marked points and the calculated points) are given. The values are in millimetres. The mean is written over the standard deviation.
Table 12. C(U,N) number of features obtained from the k largest features, using an N-node topology. For the sake of simplicity the last node is chosen as the base node.
Table 13. Average recognition rates for all methods in all experiments.
Table 14. Similar and different slope unit areas.

LIST OF FIGURES

Figure 1. Sample range image: bunny from [Stuttgart Database].
Figure 2. The polygonal form of the object in Figure 1 is obtained by combining the valid vertices obtained from different range scans of the object.
Figure 3. a) The DEM image is depicted. b) The 3D render of the DEM image is shown.
Figure 4. A facial scan taken from the FRGC v1.a 3D database is shown. Invalid regions around the eyebrows are marked as invalid in the depth image.
Figure 5. A spike on the iris of a facial scan from the FRGC 3D facial database.
Figure 6. Nose is imploded in the 3D range scan (left) because of extreme illumination on the nose tip (FRGC v1.a).
Figure 7. a) 3D artificial surface without noise. b) Normal directions over the noise-free surface. c) 3D artificial surface with noise. d) Normal directions over the noisy surface [Bozkurt et al. 2009].
Figure 8. The hub object from the Stuttgart 3D range image database is shown. There is no defect on the range scan; however the object possesses a number of actual holes.
Figure 9. a) Texture for the 3D scan. b) Black areas denote the background. Red areas denote the invalid points selected to be extrapolated. Blue areas denote the neighbouring valid points used for extrapolation. White areas denote valid points. c) Processed point cloud.
Figure 10. Surfaces de-noised with Gaussian (left) and bilateral (right) filters [Bozkurt 2008].
Figure 11. The result of the bilateral filtering on an FRGC sample.
Figure 12. a) The normal plane with the maximum curvature is seen. b) The normal plane with the minimum curvature is seen. c) κ1 = 1.56 and κ2 = …. The surface is a patch from a monkey saddle: z(x, y) = x^3 - 3xy^2.
Figure 13. Shape types in correspondence with Table 1. Shapes in the rightmost column are hyperbolic and from top to bottom they have the properties κ1 + κ2 < 0, κ1 + κ2 = 0 and κ1 + κ2 > 0. The centre column shows parabolic or flat-umbilic (planar) shapes where one of the principal curvatures is equal to zero (κ1 = 0 or κ2 = 0). The leftmost column shows the elliptic shapes where both principal curvatures have the same sign: the upper is convex elliptic (a peak: κ1 < 0 and κ2 < 0) and the lower one is concave elliptic (a pit: κ1 > 0 and κ2 > 0).
Figure 14. The HK classification on the (κ1, κ2) plane. Region colours (and the numbers) correspond to Table 1. The separating lines are the zero-thresholds for H values and their equations are (κ1 + κ2)/2 = +H_zero and (κ1 + κ2)/2 = -H_zero. The separating curves are the zero-thresholds for K values with the equations κ1·κ2 = +K_zero and κ1·κ2 = -K_zero.
Figure 15. The SC classification on the (κ1, κ2) plane. Region colours correspond to Table 3. The separating lines are the constant S values S = ±5/8, ±3/8 and ±3/16. The circle around the origin is the zero-threshold for C values and its equation is sqrt((κ1^2 + κ2^2)/2) = C_zero.
Figure 16. a) Two scaled versions of the same shape. The bigger one (1) is 10 times larger than the other (the smaller one (2) is barely seen at the lower left of the bigger one). b) The two S maps are identical (in both S maps a grey scale level is mapped to a distinct real value, so having the same intensities on each SI map proves that they have the same real S values). c) C (curvedness) map intensities (i.e. the C values) are different. For the small shape, the curvedness is barely detected. For H and K maps, the effect would be similar to C.
Figure 17. a) 25x25 sampled surface. b) 50x50 sampled surface. c) On the left the S map for the 25x25 sampled surface is seen. In the middle the S map for the 50x50 sampled surface is seen. On the right is the S map for the original 100x100 sampled surface. As the surface resolution gets lower and lower, the S map resolution also changes but the S values (the grey level intensities) are not affected.

Figure 18. a) The original surface (1) with 100x100 resolution; surface 2 with 50x50 resolution, scaled by 1/2; surface 3 with 25x25 resolution, scaled by 1/4. b) The H maps for each surface are depicted. Intensities (i.e. real H values) do not change across the differently scaled H maps, since the scale/resolution ratio is constant.
Figure 19. The difference in region description of the HK and SC spaces over the (κ1, κ2) plane is depicted. Each number designates a region which is described in Table 4. The shape index thresholds are ±5/8, ±3/8, ±3/16, as in Figure 4. The other thresholds are selected such that K_zero = H_zero^2 and C_zero = H_zero.
Figure 20. The HK&SC coupled space classification in the (κ1, κ2) plane. The colours and numbers correspond with Table 5.
Figure 21. a) Gaussian pyramiding over a 1D signal (higher scale levels towards the bottom). b) Gaussian pyramiding on a 2D image. In each step, the image is both down-sampled and smoothed (higher scale levels towards the right) [Burt and Adelson 1983].
Figure 22. Gaussian pyramiding on 3D scans is seen. At each scale level, the sampling rate is halved, but the metric distances are preserved.
Figure 23. The scale levels of a depth image constructed by Gaussian pyramiding are seen. The black regions are the invalid points. Around the valid point boundaries the smoothing filter avoids blending with invalid points so that the sharp boundaries are preserved.
Figure 24. The smaller elements vanish as we go up the scale axis.
Figure 25. The scale-space of the surface normals. Red values correspond to the x components, green values to the y components and blue values to the z components of the surface normals.
Figure 26. The scale-space of mean curvature values. The concave regions, which have positive mean curvature values, are painted in red, whereas convex regions with negative mean curvature values are painted in blue.
Figure 27. The scale-space of Gaussian curvature values. The parabolic regions with positive Gaussian curvature values are painted in red, whereas the hyperbolic regions with negative Gaussian curvature values are painted in blue.
Figure 28. The scale-space of shape index values. The shape index values and colours are mapped according to Table 2 in Chapter 4.
Figure 29. The scale-space of curvedness values. The grey level intensities designate the curvedness values, where a zero curvedness value corresponds to black.
Figure 30. Three levels from the HK pyramid of a face model surface.
Figure 31. The HK UVS volume after morphological operations, printed above the surface patch. The features are indicated with their surface shapes as colours (blue for peaks, cyan for pits, red for saddle ridges, etc.).
Figure 32. UVS volume construction.
Figure 33. The weights are demonstrated in the HK UVS volume. The bigger the weight is, the lighter the colour becomes.
Figure 34. Labelled layers of the UVS space constructed by HK values, where the scale level is increasing from left to right. The original surface level is indicated by S=0. Labels are given by colours (peak: blue, saddle ridge: red, convex cylinder: purple, pit: cyan, saddle valley: yellow, concave cylinder: green, hyperbolic: orange, plane: gray).
Figure 35. The ten largest extracted features are shown as squares, where the feature centre (x_i) is given by the square centre, the feature size (radius, r_i) is given by the square size and the feature type is given by its colour.
Figure 36. Three unit volume Gaussian surfaces at different scales (σ_A = σ_K, σ_B = 2σ_K, σ_C = 4σ_K) and their respective UVS volumes.
Figure 37. Localization of the peak feature (a) using both methods (top: weighted average, bottom: single maximum). b) For the ideal surface both methods localize the peak feature correctly. c) For the noisy surface, the feature is still correctly localized using weighted averages (top) but the localization fails under noise when the single maximum method is used (bottom).
Figure 38. UVS volumes for a) rotated towards left, b) frontal and c) expressional (disgusted) faces.

Figure 39. The feature extraction algorithm is scale invariant since the extracted features numerically depend on the scaling ratio of a surface. The numbers are the radii of the extracted feature elements.
Figure 40. Ten largest features extracted using a) the original image without scale-space search, b) the scaled image (by 0.8) without scale-space search, c) the original image with scale-space search, d) the scaled image (by 0.8) with scale-space search.
Figure 41. Ten largest extracted features from four different object surfaces. For each object the original (middle), 1.2 times scaled (left) and 0.8 times scaled (right) versions are given. The feature types and sizes are given by colour and square size respectively (peak: blue, saddle ridge: red, convex cylinder: purple, pit: cyan, saddle valley: yellow, concave cylinder: green, hyperbolic: orange, plane: gray).
Figure 42. Ten largest extracted features from the noisy versions of the objects: screwdriver (top), duck (middle) and pig (bottom). a) 15% noise b) 30% noise c) 45% noise d) 60% noise.
Figure 43. a) Interest point centre localization error in mm for different noise levels. b) Radius calculation error in mm for different noise levels.
Figure 44. a) The original range image taken from [Stuttgart Database]. b) 25% occluded version of the range image in (a). c) The depth histogram of the original range image (blue) and of the 25% occluded image (red).
Figure 45. a) The nose is a feature element of peak type. Its radius is the radius of the blue circle around the feature. The local depth histogram of the region within the radius of the feature centre is depicted.
Figure 46. a) The original range image taken from [Stuttgart Database]. b) The 2D normal histogram of the original range image. c) 25% occluded version of the range image in (a). d) The 2D normal histogram of the 25% occluded image.
Figure 47. a) The nose is a feature element of peak type. Its radius is the radius of the blue circle around the feature. The local normal histogram of the region within the radius of the feature centre is depicted.
Figure 48. a) The original range image taken from [Stuttgart Database]. b) 25% occluded version of the range image in (a). c) The shape index histogram of the original range image (blue) and the shape index histogram of the 25% occluded image (red).
Figure 49. a) The nose is a feature element of peak type. Its radius is the radius of the blue circle around the feature. The local shape index histogram of the region within the radius of the feature centre is depicted.
Figure 50. The spin images of point p from different views are shown. The spin images are similar [Johnson and Hebert 1999].
Figure 51. The training phase.
Figure 52. a) The marked nose tip. b) The calculated nose peak centre. c) The marked inner right eye pit. d) The calculated inner right eye pit centre. e) The marked inner left eye pit. f) The calculated inner left eye pit centre.
Figure 53. The testing phase.
Figure 54. a) Texture for the 3D scan. b) Black areas denote the background. Red areas denote the invalid points selected to be extrapolated. Blue areas denote the neighbouring valid points used for extrapolation. White areas denote valid points. c) Cleaned, cropped and re-sampled point cloud.
Figure 55. A successful and a problematic case for the poses a) with glasses, b) mouth open, c) hand on one eye, d) look right 75º, e) look right and down 45º.
Figure 56. Artificially created facial scans. The leftmost scan is the original scan from the FRGC database. The other scans are created by applying the following transformations respectively from left to right: rotation around the Y-axis by +10º, +20º, -10º, -20º; rotation around the X-axis by +15º, -15º; and rotation around the Z-axis by +45º.
Figure 57. Virtually created facial scans and their estimated pose angles.
Figure 58. Demonstration of geometric hashing: all the topologies constructed from a range image are compared to the topology vectors in the database. The database model which receives the greatest number of votes is taken as the match of the test object.
Figure 59. Angle threshold tests for both methods.

Figure 60. Length threshold tests for both methods.
Figure 61. Type tests for both methods.
Figure 62. Feature number tests for both methods.
Figure 63. Database size tests for both methods.
Figure 64. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin, SIFT, from left to right respectively. Original testing images are used.
Figure 65. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin, SIFT, from left to right respectively. Scaled versions of the testing images are used.
Figure 66. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin, SIFT, from left to right respectively. Occluded versions of the testing images are used.
Figure 67. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin, SIFT, from left to right respectively. Scaled and occluded versions of the testing images are used.
Figure 68. Two artificial surfaces with a Gaussian peak and a Gaussian pit, which are 135º in-plane rotated (rotation around the z-axis) versions of each other.
Figure 69. Matched features from the registered artificial range images.
Figure 70. Two range images of the screwdriver from the Stuttgart database. There is rotation around the y-axis.
Figure 71. Matched features from the registered range images from the Stuttgart Database.
Figure 72. Matched features from the registered range images from the Stuttgart Database. The surfaces are scaled versions of the same range image.
Figure 73. a) Watershed boundaries determined using the DEM. b) Drainage line overlaid with the watershed of the region. c) Watershed boundaries determined using the reverse DEM (shown with red lines) overlaid with watershed boundaries which are present on both the left and right sides of a sub-basin. d) Slope unit of the region obtained in 3D.
Figure 74. a) Original surface patch. b) 4th scale level of the surface patch. c) Candidate ridge and drainage lines are shown in black. d) Ridge and drainage lines after thinning. e) Final ridge and drainage lines after the elimination of outliers. f) Detected line tips and corners are shown in pink and an example interest region for a tip is shown by a triangle. g) Line tips connected to available corners and tips are shown in green. h) Segmented landslide regions are shown in different colours.
Figure 75. The overlay of the conventional and proposed method outputs. Legend: black line: conventional slope unit; pink line: proposed method slope unit; yellow colour: similar regions; blue colour: generalized units of the conventional method or detailed units of the proposed method; green colour: detailed units of the conventional method or generalized units of the proposed method.
Figure 76. 3D view of slope units from the proposed and conventional methods. Legend: black line: conventional slope unit; pink line: proposed method slope unit.

CHAPTER 1

INTRODUCTION

Mankind's curiosity about nature and science has led him from the invention of the wheel to this very age of unmanned machines. Each and every day, numerous new clever systems are being introduced into our lives. They are able to see, comprehend and act according to our needs. I personally choose to call this an invasion. I strongly believe that this intelligent and technological invasion will linger until mankind is able to create his alike. In other words, until he is satisfied with playing God, if he ever will be...

On our part of playing God, we are interested in visual understanding, the most intimate ability, the tremendous divine gift we are given. Computer vision, the science of mimicking this ability, will be our main concern in this dissertation. Moreover, we will be specifically examining 3D object representation, the key human ability that enables him to understand and differentiate real world objects.

Although quite young for a scientific field, computer vision has surprisingly had a fruitful fifty years of history. With the increasing availability of various digital imaging systems, first image processing and then computer vision have become among the most promising scientific fields in signal processing. Moreover, the field has advanced so fast that even academicians sometimes find it difficult to catch up with the new advances. For instance, face detection, once a very hot topic of the field, is now a simple application in a low-cost cellular phone. Since it is a highly practical field, the achievements in imaging technology profoundly affect the direction of research. For instance, some problems like the segmentation of human body movements from short distances, which was a hot and difficult problem for 2D imagery, are now simple and solved problems using time-of-flight cameras, and have several usages in the game industry. This is a single example among several, which shows the increasing usage of 3D information in computer vision.

Processing 3D information, namely 3D point clouds of 3D surfaces, is not always simpler than conventional 2D image processing, as in the human body segmentation example. 3D image processing and 3D object representation are relatively new problems and have various open points. However, since they promise acquisition of the real world with depth information, (not fully practically) independent of illumination and with metric size information, they may offer easier solutions compared to their corresponding solutions in 2D.

3D information can be acquired using different devices such as 3D range scanners, stereo systems, lidar scanners, time-of-flight cameras, satellite imagery, etc. The acquired signal is usually a grid or cloud of 3D points. The signal carries a vast amount of information which is difficult to process in real time. Thus, methods to transform this data into a sparse representation are usually needed and are common problems of the field. Furthermore, some repeatability properties of signal processing, such as transform invariance, scale invariance and robustness to noise, are inherited by their 3D versions as well, because finding repeatable salient points or regions over 3D surfaces (point clouds) is a basic prerequisite for 3D object detection, recognition and even matching similar objects, namely registration.

In this thesis, we aim at constructing a representation method for 3D object surfaces. We propose a feature extraction method which is scale, sampling and orientation invariant. In this chapter, we commence with the problem definition, our motivation and contribution.

1.1 Problem Definition

Finding repeatable, robust and invariant salient points and/or regions is one of the very important problems of visual pattern recognition. These salient points and/or regions, usually called features, can be used to represent an object or to match similar objects. Depending on the signal type, the feature extraction method may change. In order to evaluate a feature extraction method, some basic properties may be defined. [Tuytelaars and Mikolajczyk 2007] propose a number of basic properties for feature detectors: repeatability (invariance and robustness), distinctiveness, locality, quantity, accuracy and efficiency. Most of these are competing properties, such that it is impossible to satisfy all of them simultaneously. Usually, a method is designed or selected according to the application needs and the signal type.

Scale and sampling invariance of a feature detector implies that the detector is capable of providing the same salient point or region for scaled and re-sampled versions of a signal. Extracting feature sizes (radii or effective regions) covariant to scaling is another ability strongly related to scale invariance, which implies that the extracted size of the feature is covariant with the size or sampling of the signal. In other words, for a rescaled version of a signal, the same features are detected and, for each detected feature, its size is scaled covariantly with the scaling ratio.

Representing the meaningful portion of the information within a signal is crucial. For this reason, representing an object within a 3D point cloud is an important and difficult problem. The basic needs of a representation are sparseness, repeatability and distinctiveness.

In this thesis, we deal with the above problems of 3D point cloud processing. In particular, we focus on obtaining higher levels of scale and sampling invariance in feature extraction. Furthermore, we strive to construct a sparse yet repeatable 3D object representation. We particularly chose these problems since they are still open problems of 3D computer vision.

1.2 Motivation

In the previous decade, some important approaches to feature extraction from 3D point clouds and to 3D object representation appeared in the literature. The related studies are given in the next chapter. Two important open points in the literature can be summarized as feature extraction with full scale invariance and a generalized object representation scheme that can be used by different applications.

Little work on the limits of scale invariance in 3D feature extraction has been reported. More importantly, among the reported scale invariant 3D feature extractors, none to our knowledge deals with the problem of covariant feature size extraction on scale varying databases. Therefore we analyze the existing scale-space approaches and seek a way to achieve higher scale invariance by manipulating the construction of the signal scale-space. In this way, we also look for a method to obtain features together with their effective region sizes.

Second, a generic transform invariant topological 3D surface representation standard is missing. For these reasons we are motivated to find a generic 3D representation for 3D objects with better scale invariance properties compared to methods from the literature.

1.3 Contribution

Although the problems referred to in the problem definition section have been comprehensively examined, and are still being examined in the literature, there are still open points. In this thesis we propose solutions for some of these problems, such as the ones indicated in the previous section.

The proposed feature extraction method is novel in the sense that it constructs the signal scale-spaces with a different strategy than the previous approaches, as will be explained later in detail. Via controlled experiments on artificial surfaces, we show that our approach might be beneficial in terms of accuracy and robustness to noise.

Secondly, as will be explained in the succeeding chapters, surfaces can be classified into certain types using different curvature spaces. For instance, using Gaussian (K) and mean (H) curvatures or using the shape index (S) and curvedness (C), surfaces are classified into the same eight types (pit, peak, etc.). Since the shape index is scale invariant, previous comparisons show that using the SC classification gives better results. However, using a scale-space approach, we show that the HK classification gives better results in object recognition, which is a valuable clue for surface curvatures in scale space.

We propose a transform invariant topological surface representation which can be used to recognize, register or detect objects. The representation is defined within a feature vector, which is experimented with for object recognition and detection.

Last, but not least, we show that using surface curvatures in Digital Elevation Models (DEMs) provides fast and efficient methods for landslide region detection, compared to slow conventional methods.

1.4 Outline of the Thesis

We commence this thesis by examining the related work in Chapter 2. Previous studies on 3D feature extraction, 3D face detection and 3D object recognition using range images are given in this chapter.

The essential definitions of 3D signal processing are presented in Chapter 3, Processing 3D Information. Basic 3D surface data formats are introduced, and basic methods for preprocessing 3D data for the purpose of cleansing common defects such as noise are discussed.

The next chapter, Surface Curvatures, gives the fundamental 3D curvature definitions: the principal curvatures (κ1, κ2), mean curvature (H), Gaussian curvature (K), shape index (S) and curvedness (C). In addition, the classification capabilities of these curvature definitions are described and discussed.

Chapter 5, Scale Space of 3D Surfaces and 3D Curvature, dives into the principal discussion of the thesis. Methods to obtain scale-spaces of 3D surfaces and of curvature values from 3D surfaces are analyzed. Crucial clues for scale and resolution invariant object recognition are obtained from the behaviour of 3D curvature values in scale-space.

The key technique proposed in this thesis is described in Chapter 6, Feature Extraction and Object Representation. As its title implies, the method to extract transform invariant features from 3D surfaces is given in detail. Moreover, the construction of transform invariant topologies using the extracted features is given in detail. Finally, a number of descriptors to define local regions around the extracted features are introduced.

The succeeding chapter, Experimental Work, provides all the implementations we have carried out using the proposed feature extraction and object representation techniques. The first subsection, 3D Face Detection, gives the implementation of the proposed 3D feature extraction technique for 3D face detection and 3D facial pose estimation. Detection of facial surfaces is tested on the Bosphorus Database [Bosphorus Database 2007] and compared to four main methods from the literature: [Lu and Jain 2005], [Lu and Jain 2006], [Chang et al. 2006] and [Colombo et al. 2005]. Then the implementation of the main subject of this thesis is given in 3D Object Recognition. Using more than ten thousand range images from the Stuttgart range image database [Stuttgart Database], object category recognition tests are carried out. Furthermore, comparisons between multi-scale and single-scale features, and between the recognition capabilities of the HK and SC curvature definitions, are given. Most importantly, our proposed method is compared to 2D SIFT. In addition, the qualities of the extracted scale invariant features are evaluated for the purpose of 3D registration. Finally, in Landslide Region Detection Using Curvature Values Obtained from DEMs, regions with possible landslide activity are extracted from digital elevation models (DEMs) using the scale space of curvatures. The proposed method is compared to a conventional, high computation cost technique [Multi-Watershed].

Finally, Chapter 8 concludes the dissertation. Final discussions are given and possible future directions are evaluated.

CHAPTER 2

RELATED WORK

As introduced in the previous chapter, in this dissertation we discuss major issues of feature extraction and object representation. We also intend to propose novel techniques for them. Furthermore, the proposed techniques are evaluated on various applications such as 3D facial recognition, 3D object recognition and 3D registration. A considerable number of studies and contributions, which are related to the theory and the experiments provided in this thesis, have been reported. For the sake of clarity, the literature on each subject is examined separately in this chapter.

3D Feature Extraction

Feature extraction is one of the basic topics in image processing. A feature is a salient region of a signal which locally defines or represents the signal behaviour. For example, a corner is a good feature for 2D images, since experiments show that removing the corners from images intensely impedes human recognition [Biederman 1987]. The main problem in this thesis, extracting features from 3D surfaces, is strongly correlated to the 2D feature extraction literature. For a detailed analysis of 2D invariant feature extraction, the reader may refer to [Tuytelaars and Mikolajczyk 2007]. For 3D surfaces, local shape change is related to surface curvatures. We commence with 3D surface curvatures, then continue with the scale invariance concept in 3D pattern recognition. Finally, we visit the literature on describing 3D surface behaviour locally.

3D Surface Curvatures

Using surface curvatures on range images for the purpose of image recognition and image segmentation [Fan et al. 1986, Fan et al. 1989, Besl and Jain 1986, Ittner and Jain 1985] dates back to the 1980s, when range image acquisition started to become available. The practical side of the problem, estimating true curvature values from an acquired range image, was also studied in this era [Flynn and Jain 1989]. In the following years, mean and Gaussian curvatures were the preferred tools in the 3D surface classification literature. However, this representation lacks the notion of scale invariance, and the shape index (S) and curvedness (C) [Koenderink and Doorn 1992], which are also based on the principal curvature values, were proposed. Using H and K values or S and C values, the HK and SC curvature spaces are constructed in order to classify surface patches into types such as pits, peaks, saddles, parabolic, hyperbolic or planar regions. Since both the HK and SC spaces classify surface patches into similar types, their classification capabilities are comparable. For this reason, there is an ongoing debate on the advantages and disadvantages of using the mean & Gaussian (HK) or the shape index & curvedness (SC) curvature space for object recognition applications. In [Cantzler and Fisher 2001], the HK and SC curvature descriptions are compared in terms of classification, impact of thresholds and impact of noise levels, and it is concluded that the SC approach has some advantages at low thresholds, in complex scenes and in dealing with noise. However, in that study the curvatures are calculated only at the lowest scale, i.e. the given resolution. Scale-spaces of the surfaces or the curvatures are not defined. Another comparative study has been carried out in [Li and Hancock 2004], where curvature values obtained from the shading in 2D images are used and HK and SC histograms are created. The comparison results show that SC histograms are slightly more successful in terms of classification. Yet again, the tested resolution is the pixel resolution of the 2D image and the effect of sampling is ignored.
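For concreteness, the sketch below computes these standard curvature measures from a pair of principal curvatures and assigns a coarse HK-space label. The formulas for H, K, S and C are the textbook definitions; the function itself, its zero-thresholds and the sign convention (concave regions taken as H > 0, as in the figures of this thesis) are illustrative assumptions rather than the exact implementation used later in this work.

```python
import numpy as np

def curvature_measures(k1, k2):
    """Standard curvature measures from principal curvatures (k1 >= k2)."""
    H = 0.5 * (k1 + k2)                      # mean curvature
    K = k1 * k2                              # Gaussian curvature
    # Shape index (Koenderink & van Doorn); undefined at umbilics (k1 == k2),
    # hence the small guard in the denominator.
    S = (2.0 / np.pi) * np.arctan((k2 + k1) / (k2 - k1 - 1e-12))
    C = np.sqrt(0.5 * (k1 ** 2 + k2 ** 2))   # curvedness
    return H, K, S, C

def hk_label(H, K, h_zero=1e-3, k_zero=1e-6):
    """Coarse HK-space surface type with illustrative zero-thresholds.
    Convention: concave -> H > 0, convex -> H < 0."""
    if abs(H) < h_zero and abs(K) < k_zero:
        return "plane"
    if abs(K) < k_zero:                      # parabolic (cylindrical) regions
        return "concave cylinder" if H > 0 else "convex cylinder"
    if K > 0:                                # elliptic regions
        return "pit" if H > 0 else "peak"
    return "saddle valley" if H > 0 else "saddle ridge"   # hyperbolic regions
```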

3D Scale Invariant Features

In the previous decade, the scale concept became common in computer vision. [Lindeberg 1998]'s work on automatic scale selection in feature point detection and the same author's renowned book on scale space [Lindeberg 1994] attracted so much attention that the very famous scale invariant feature transform [Lowe 2004] was introduced in the following years. However, for that decade, nearly all of the studies on the scale concept were on 2D image processing. Although the shape index [Koenderink and Doorn 1992] was introduced for 3D surfaces, scale invariant feature extraction from surfaces or range images did not attract much attention until recent years. Recently, [Li and Guskov 2007] extract multi-scale salient features using only two scale levels of the surface normals and analyze their performance on object recognition for the Stuttgart range image database. [Pauly et al. 2003] extract multi-scale features which are classified based on surface variation estimation using covariance analysis of local neighbourhoods, in order to construct line features. [Lo and Siebert 2009] define 2.5D SIFT, a direct implementation of the SIFT [Lowe 2004] framework on range images; however, they present their comparison with simple match matrices and avoid giving a comparison of recognition capabilities. All of these methods construct a scale space of the surface using differences of Gaussians (DoG) and seek the maxima within this scale space. In addition, although these studies all promise scale invariance, none of them, nor any other study to our knowledge, tests the recognition ability of their proposed scale invariant 3D representation technique using a scale varying database of 3D models. The other crucial requirement for 3D recognition, namely robustness under occlusion, has been addressed in various ways previously [Johnson and Hebert 1999], [Merchán et al. 2008]. Yet again, to our knowledge, no particular study has been reported on testing recognition capabilities with respect to both scale and occlusion in range image databases.

3D Descriptors

Range images, which have usually been processed as 2D images, carry both the 3D metric and geometric information of the objects in the scene. Several local or global 3D point features and 3D descriptors are derived from these sampled surfaces and used for 3D object recognition, 3D object category recognition, 3D surface matching and 3D registration. Some of these features are SIFT [Lowe 2004] (as directly applied to 2D rendered range images), 2.5D SIFT [Lo and Siebert 2009], multi-scale features [Li and Guskov 2007], [Pauly et al. 2003], spin images [Johnson and Hebert 1999], 3D point signatures [Chua and Jarvis 1997], 3D shape context [Frome et al. 2004], surface depth, normal and curvature histograms [Hetzel et al. 2001], 3D point fingerprints [Sun et al. 2001], extended Gaussian images [Horn 1984] and density-based 3D shape descriptors [Akgül et al. 2007]. 3D descriptors or histograms generally define the whole or a part of an object, using different properties of the surfaces such as curvatures, normal directions, distances to a base point, etc. They are very powerful in representing a surface patch for recognition purposes. However, when they are globally defined, they are brittle against occlusion. On the other hand, local descriptors are defined around feature points. However, detecting feature points and estimating the effective region of the local descriptor around a feature point are serious problems. Using fixed sized local descriptors obtained from random points on the surface [Johnson and Hebert 1999] is one of the earlier approaches.
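As a concrete illustration of such a local descriptor (in the spirit of the depth histograms of [Hetzel et al. 2001]), the sketch below bins the depths inside a circular patch around a detected feature, using the feature's effective radius. The function, its parameters and the normalisation are illustrative assumptions and not the exact descriptors defined later in this thesis.

```python
import numpy as np

def local_depth_histogram(depth, valid, center, radius, pixel_size, n_bins=16):
    """Histogram of relative depths inside a feature's effective region.

    depth      : (H, W) metric depth image
    valid      : (H, W) bool mask of pixels carrying valid 3D data
    center     : (row, col) of the detected feature
    radius     : effective region radius, same metric unit as depth
    pixel_size : metric spacing between neighbouring pixels
    """
    r0, c0 = center
    r_px = int(round(radius / pixel_size))
    rows, cols = np.ogrid[:depth.shape[0], :depth.shape[1]]
    mask = valid & ((rows - r0) ** 2 + (cols - c0) ** 2 <= r_px ** 2)
    # Depths relative to the feature centre, so the descriptor does not depend
    # on the absolute distance of the object from the scanner.
    rel = depth[mask] - depth[r0, c0]
    hist, _ = np.histogram(rel, bins=n_bins, range=(-radius, radius))
    return hist / max(hist.sum(), 1)         # normalised histogram
```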

3D Object Representation

The concept of representing a designated element of a signal, for example an object in an image, is application-based and application-dependent. Thus any proposed method related to the application field actually defines its own representation in some certain manner. For this reason, it is difficult to speak of a homogeneous object representation literature. However, there is still an observable evolution of 3D object representation methods in the literature. The pioneering studies on 3D object recognition focus on model-based approaches, since the first applications were on CAD-based vision systems. [Arman and Aggarwal 1993] present a dense review of these types of approaches. Since these approaches generally define an object using basic shapes, like cubes, cylinders, etc., the applications were limited to industrial objects where the object is clearly acquired and there is no need for segmentation. However, as the need for detecting objects in a cluttered 3D scene rises, more curvature based methods are required. As explained in the previous sections, curvature based approaches locally define shapes using surface behaviour. Approaches to represent objects using curvature values can be found in the literature [Flynn and Jain 1988]. These approaches are followed by feature based studies, which use curvature information over the surface to detect repeatable and invariant salient points and regions. Advanced representations of 3D surfaces can be divided into two major branches. The first one uses different surface descriptors as defined in the previous sub-section. The other approach uses topologies, where the spatial information of salient regions is taken into consideration. This topological representation may be considered both local and global. Furthermore, various proposed methods provide robustness under occlusion for topological (or graph-based) 3D object representations.

3D Object Recognition

Similar to face detection, the literature on 2D object recognition is highly populated. However, it is difficult to say that it is saturated, because there are still many open problems for both 2D and 3D object recognition. The reader may refer to [Edelman 1997, Roth and Winter 2008] for a detailed survey of 2D object recognition. The 3D object recognition literature is mainly the combination of the studies on 3D surface curvatures, 3D feature extractors, 3D descriptors and 3D object representation. For this reason the major approaches on the subject are already given within the previous subsections. There are a couple of good survey papers on the subject [Osman et al. 2004, Campbell and Flynn 2001]. It is quite expected that updated versions of these reports will soon be published, since new developments quickly affect the field, such as the increasing usage of probabilistic models in object recognition [Simon and Seitz 2007, Hu and Zhu 2010].

3D Facial Detection

2D face detection and recognition have attracted so much attention for the last forty years that the research in these fields is fairly saturated. For a review of 2D face detection and recognition algorithms, the reader may refer to [Yang et al. 2002] and [Zhao et al. 2000] respectively. Although it has recently been reported that 2D and 3D face recognition algorithms are comparable [Chang et al. 2005], studies on 3D face detection and recognition methods have shown remarkable progress in the last few years as 3D scanners become more and more available. For detailed surveys of 3D face recognition, the reader may refer to [Kittler et al. 2005] and [Bowyer et al. 2004]. Most 3D face detection and recognition approaches are based on local facial features such as the eyes, mouth, nose, profile, silhouette and face boundary. The success of these approaches depends highly on the success of the feature detection algorithms. Since, in this study, we also present a method for local feature extraction and we apply our method to face detection, we summarize the literature on 3D facial feature extraction and face detection. Previous studies show that the nose tip and the inner eye pits have usually been selected as anchor points on faces. [Colbry et al. 2005] use the shape index [Koenderink and Doorn 1999] for determining the anchor points. Chang and Bowyer [Chang et al. 2006] also use the shape index for the detection of the inner eye pits, and the nose tip is detected by using the eye pit locations. [Colombo et al. 2005] find anchor point candidates by using shape index values, and PCA is used in order to classify the anchor points. In addition, the shape index has also been used to recognize objects other than faces [Worthington and Hancock 2001]. [Nagamine et al. 1992] find five feature points by using a method similar to that of [Colombo et al. 2005] and use those feature points to standardize the face pose. [Lu and Jain 2005] first extract the nose tip as the closest point to the camera. Secondly, possible locations of the two inner eye pit points are selected by using the shape index and a cornerness measure extracted from the face surface. Anchor point detection is performed in a similar way for rotated 3D data in [Lu and Jain 2006].

3D Object Registration

Surface registration is an intermediate but crucial step within the computer vision systems workflow. The goal of registration is to find the Euclidean motion between a set of range images of a given object taken from different positions, in order to represent them all with respect to a reference frame. Registration in general can be divided into two: coarse registration and fine registration [Salvi et al. 2006]. In coarse registration, the main goal is to compute an initial estimation of the rigid motion between two clouds of 3D points using correspondences between both surfaces. In fine registration, the goal is to obtain the most accurate solution possible. Needless to say, the latter usually uses the output of the former as an initial estimate so as to represent all range image points with respect to a common reference system; it then refines the transformation matrix by minimizing the distance between the temporal correspondences, known as closest points. Coarse registration, which is the main type of registration we intend to accomplish in this thesis, uses point-to-point correspondences, as in [Chua and Jarvis 1997, Johnson and Hebert 1999]. An important aspect of coarse registration is the way of computing the motion when correspondences are found. Robustness in the presence of noise is another important property, because there are usually regions without correspondences between the views. Studies using a (partial) graph-to-graph approach on range images, instead of point-to-point correspondences, have not been reported to our knowledge. For a wide literature survey on the registration of range images the reader may refer to [Salvi et al. 2006].
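To make the step of computing the motion from found correspondences concrete, the sketch below gives the standard SVD-based least-squares (Kabsch) solution for the rigid motion between two sets of corresponding 3D points. This is a generic textbook solution, not the registration procedure proposed later in this thesis; in practice it is usually wrapped in a RANSAC-style loop so that wrong correspondences do not corrupt the estimate.

```python
import numpy as np

def rigid_motion_from_correspondences(P, Q):
    """Least-squares rigid motion (R, t) such that R @ p + t ~ q for matched rows.

    P, Q : (N, 3) arrays of corresponding 3D points, N >= 3 and not collinear.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```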

CHAPTER 3

PROCESSING 3D INFORMATION

In this chapter, different formats of 3D information are investigated. Since the raw output of 3D scanners is used as the main source of 3D information, methods to cleanse certain defects from these types of data are also examined.

3.1 3D Information Formats

3D information can be found in various forms. Basically it is a point cloud of vertices in a three-dimensional coordinate system. However, depending on the type of the signal, 3D information takes different names.

3.1.1 Range Images

A range image (scan) is a 2.5D image showing the distance to points in a scene from a specific point, normally associated with some type of sensor device. The resulting image has pixel values which correspond to the distance. If the scanner which is used to produce the range image is properly calibrated, the pixel values can be given exactly in physical units such as metres. The sensors used to obtain range images are usually called 3D scanners. The raw output of these 3D scanners is a regular grid of 3D points lying on a u-by-v mesh, namely a depth image (map) of the scene. This representation is known as the graph representation [Flynn and Jain 1989], where a single 3D point on the surface is given by:

X(u, v) = [x(u, v)  y(u, v)  z(u, v)]    (1)
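A minimal sketch of this graph representation is given below: it turns a metric depth map sampled on a regular u-v grid into the X(u, v) array of Equation (1), together with a validity mask for the zero-valued (invalid) pixels. The fixed pixel pitch is an assumption made for illustration; a calibrated scanner may instead deliver x(u, v) and y(u, v) directly per pixel, in which case this step is unnecessary.

```python
import numpy as np

def depth_to_graph(depth, pixel_pitch, invalid_value=0.0):
    """Build X(u, v) = [x(u, v), y(u, v), z(u, v)] from a metric depth map.

    Assumes neighbouring pixels are `pixel_pitch` apart in x and y (e.g. in mm).
    """
    v_idx, u_idx = np.indices(depth.shape)
    valid = depth != invalid_value           # zero-valued pixels carry no 3D data
    X = np.stack([u_idx * pixel_pitch,       # x(u, v)
                  v_idx * pixel_pitch,       # y(u, v)
                  depth], axis=-1)           # z(u, v)
    return X, valid
```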

In Figure 1, a 2D demonstration of a range image is seen. Each pixel value indicates the distance of that pixel from the scanner. However, a range image encapsulates more information than a 2D image, since for each pixel coordinate (u, v) there are three metric values: x(u, v), y(u, v) and z(u, v). For this reason, range images are also referred to as 2.5D data. The regions with zero pixel value are invalid regions. These regions have no valid depth values and thus no valid 3D coordinates.

Figure 1. Sample range image: bunny from [Stuttgart Database].

There are several types of 3D scanners, such as laser, ultrasound and optical. Each technology comes with its own limitations, advantages and costs. For pattern recognition and robot vision applications using real world objects, 3D optical scanners are usually preferred. The range image is the most basic form of 3D data. This regular data may be manipulated in different ways for different applications. The next subsections briefly go over some other frequently used 3D data formats.

3.1.2 Polygonal Data

For visualization purposes, computer graphics based applications require a set of polygonal surfaces. For instance, if the valid points are extracted from a range image and vertex groups that form a plane are found, the 3D information within the range image is converted to 3D polygonal data. Polygonal data is crucial since most commercial 3D graphical interfaces used in today's computers are designed to process polygonal meshes.
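One simple way to obtain such polygonal data from a range image is sketched below: every 2x2 block of valid grid pixels is split into two triangles over the corresponding vertices. This is only one illustrative polygonization strategy; it ignores depth discontinuities, which practical meshing tools usually handle by discarding overly long edges.

```python
import numpy as np

def grid_to_triangles(valid):
    """Triangulate a range-image grid into a list of vertex-index triples.

    valid : (H, W) bool mask of pixels that carry 3D coordinates.
    Vertex indices are row-major: index = row * W + col.
    """
    h, w = valid.shape
    idx = np.arange(h * w).reshape(h, w)
    tris = []
    for r in range(h - 1):
        for c in range(w - 1):
            if valid[r, c] and valid[r + 1, c] and valid[r, c + 1] and valid[r + 1, c + 1]:
                tris.append((idx[r, c], idx[r + 1, c], idx[r, c + 1]))
                tris.append((idx[r + 1, c], idx[r + 1, c + 1], idx[r, c + 1]))
    return np.asarray(tris, dtype=np.int64)
```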

Unlike range images, polygonal data may include a complete model of the object. When a range image of an object is taken, only the 3D points across the viewing angle are captured. However, by combining various range images of an object taken from different viewing angles, a polygonal mesh of the complete model of the object can be constructed (Figure 2).

Figure 2. The polygonal form of the object in Figure 1 is obtained by combining the valid vertices obtained from different range scans of the object.

The resolution of polygonal models may be altered without losing 3D shape information. For example, an area with low curvatures (e.g. a planar area) may be represented using few vertices. However, depending on the data acquisition technique, this area might be full of 3D points, which are mostly redundant. In this case, a process called decimation is applied to polygonal meshes. Regions with low curvature values are found and the sampling around these regions is reduced using several techniques.

3.1.3 Parametric Representations

Some computer vision techniques and graphics applications require a continuous representation of 3D data in order to apply analytic methods instead of numeric ones. For this reason, methods like spline fitting may be applied to 3D scanner outputs in accordance with the required application. In these methods the main motivation is to transform the digitized data into a continuous and parametric form by fitting a certain type of function to the valid vertices of the object surface. Continuous surface functions and derivatives are the benefits of this technique. Regression and spline fitting are some examples of parametric representations.
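As a minimal illustration of such a parametric local representation, the sketch below fits a bivariate quadratic z(x, y) to a small neighbourhood of surface points by least squares; analytic first and second derivatives, which curvature estimation needs, then follow directly from the coefficients. The thesis itself uses spline fitting on local regions; this quadratic fit is a simplified stand-in under that assumption.

```python
import numpy as np

def fit_quadratic_patch(x, y, z):
    """Least-squares fit of z ~ a*x^2 + b*y^2 + c*x*y + d*x + e*y + f.

    x, y, z : 1D arrays of local surface points, preferably centred so that
    the point of interest sits at (0, 0). At that point the analytic
    derivatives are z_x = d, z_y = e, z_xx = 2a, z_yy = 2b, z_xy = c.
    """
    A = np.column_stack([x ** 2, y ** 2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs
```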

3.1.4 Digital Elevation Models

A digital elevation model (DEM) is a digital representation of ground surface topography or terrain (Figure 3). It is also widely known as a digital terrain model (DTM). A DEM can be represented as a raster (a grid of squares) or as a triangular irregular network. DEMs are commonly built using remote sensing techniques; however, they may also be built from land surveying. DEMs are often used in geographic information systems, and are the most common basis for digitally produced relief maps.

Figure 3. a) The DEM image is depicted. b) The 3D render of the DEM image is shown.

DEMs are similar to 2D images where the pixel intensity designates the elevation in metric coordinates. Usually each pixel is captured with a fixed metric sampling rate (such as 20 m intervals); thus the horizontal coordinates also represent metric coordinates and the information is actually 2.5D. The quality of a DEM is a measure of how accurate the elevation is at each pixel and how accurately the morphology is presented. Several factors play an important role in the quality of DEM-derived products, such as terrain roughness, sampling density, grid resolution or pixel size, interpolation algorithm, vertical resolution and terrain analysis algorithm. DEMs are very similar to range scans in the sense that the depth and the pixel size are in metric coordinates. Thus 3D curvature techniques can be applied to them as well. In Chapter 7, a method to detect landslide regions using the curvature information of DEMs is proposed.

There are a number of other 3D data formats, such as volumetric representations. We chose not to mention them here, since the four formats mentioned above, namely range scans, polygonal data, parametric representations and DEMs, are the basic formats that will be used in this thesis. Range scans will be our basic source signal on which we will apply our feature extraction techniques. We will sometimes convert them to polygonal data for visualization purposes, since most 3D visualization software and hardware use 3D polygonal formats. Parametric representations will be needed for finding surface gradients on local surface patches: splines will be fit to small local regions and surface gradients will be extracted from their explicit analytical equations. Finally, surface curvatures will be calculated on scale-spaces of DEMs in order to automatically detect landslide regions.

3.2 Common Defects of 3D Scanner Outputs

3D scanners can send trillions of light photons toward an object and receive only a small percentage of those photons back via the optics that they use. For widely used optical scanners, a lighter surface will reflect a lot of light whereas a darker surface will reflect only a small amount. For this reason more specular and darker regions, such as dark hair, might not be detected by these types of scanners and might be marked as invalid on the scan. Moreover, physical limitations of the sensor may lead to noise in the acquired data set. Sample points can be corrupted by quantization noise or motion artefacts. Furthermore, multiple reflections and heavy noise can produce off-surface points (outliers). Besides, holes on the model surface occur due to occlusion, critical reflectance properties, constraints in the scanning path or limited sensor resolution. Many scanners tend to create ghost geometry when the scanned object is textured. In this section some common defects of 3D scanner outputs are investigated.

3.2.1 False Holes (False Invalid Points)

Due to various reasons, 3D scanners may mark regions as invalid. 3D scanners usually work on optical principles to calculate the 3D coordinates of points on their image planes. However, dark hair, for example, may reflect light photons like a mirror and the scanning system may be unable to calculate valid coordinates for that region. The scanner then marks points in this particular region as invalid. It is clearly seen in 3D facial databases [Bosphorus Database 2007] that hairy regions such as dark hair, eyebrows, eyelashes and darker facial hair are usually marked as invalid by the optical scanner used (Figure 4).

Figure 4. A facial scan taken from the FRGC v1.a 3D database is shown. Invalid regions around the eyebrows are marked as invalid in the depth image.

3.2.2 Exploded/Imploded Regions

Optical scanners tend to create spikes mostly for reasons similar to those that cause holes. For regions with critical reflectance properties (such as specular regions like dark hair, as mentioned in the previous subsection) the reflected light on the image planes might be localized incorrectly. Thus, instead of having invalid coordinates, these points might carry a valid flag, but with incorrect depth values. In Figure 5, a spike on the iris of a facial scan from the FRGC v1.a 3D facial database is seen. Since the iris is a highly specular region, one particular pixel depth is incorrectly calculated and a spike is formed.

Figure 5. A spike on the iris of a facial scan from the FRGC 3D facial database.

These incorrectly calculated points are not necessarily displaced outwards in the direction of the camera; they might be imploded inwards into the 3D surface. In addition, these regions are not necessarily single points; they might be a group of points if the reflectance is critical for that whole region (Figure 6).

Figure 6. The nose is imploded in the 3D range scan (left) because of extreme illumination on the nose tip (FRGC v1.a).

3.2.3 Noise

For any remote sensing device, from a radio receiver to a 3D scanner, some kind of noise is observed on the output signal. The transmitting medium or the components of the device itself may cause the noise. Despite any effort to neutralize this noise, it is impossible to completely cleanse it from any electronic remote sensing device.

Figure 7. a) 3D artificial surface without noise. b) Normal directions over the noise-free surface. c) 3D artificial surface with noise. d) Normal directions over the noisy surface [Bozkurt et al. 2009].

Since most 3D scanners use optics to acquire the signal, the type of noise is similar to 2D noise. 3D scanner outputs experience amplifier noise, which is highly Gaussian. There is also powerful quantization noise if the resolution of the scanner is relatively low [Bozkurt 2008]. 2D noise is convolved within the intensity of the pixels, whereas 3D noise disturbs the 3D coordinates of the points in the point cloud. A demonstration of noise over an artificial surface is depicted in Figure 7.

3.3 Processing 3D Range Scans

In order to neutralize the defects of 3D range scans, processing of the 3D point cloud may be needed. Unsurprisingly, the processing method varies depending on the type of the defect. Only after these defects are neutralized is the range scan properly prepared for the required application. In this section, methods to process these defects are inspected.

3.3.1 Processing False Holes

Theoretically, holes are invalid-marked points which actually belong to a group of points with valid 3D coordinates. In Figure 4, a good example is given from the FRGC database, where some points on the eyebrows of a 3D facial scan are marked as invalid. However, these so-called holes may belong to the image background, or to an actual hole which sees through to the image background (Figure 8), in which case they should have invalid coordinates.

Figure 8. The hub object from the Stuttgart 3D range image database is shown. There is no defect on the range scan; however the object possesses a number of actual holes.

The range image in Figure 8 has no defects; however, the object possesses a variety of actual holes. Thus, before processing the image for false invalid points, one should decide whether the invalid-marked points are a defect or not. In order to solve this problem, manual and automatic methods are used. Manual methods are not feasible when thousands of range images are concerned, whereas automatic methods have no guarantee of truly discriminating false holes among the invalid points.

Most 3D scanners come with software which can be used to manually fill these holes. These interfaces help the user mark the region encapsulating the false hole and interpolate it using the neighbouring valid points. The interpolation could be linear or quadratic. On the other hand, a fully automatic method would detect the false holes by means of some intelligent criterion. This criterion could be as simple as the size of the hole, or as complex as a model-based approach guessing where the object might have an actual hole.

In this thesis, we have completed our experiments on the FRGC v1.a 3D facial database [FRGC1a], the Bosphorus 3D facial database [Bosphorus Database 2007] and the Stuttgart 3D range image database [Stuttgart Database]. These databases include nearly twenty thousand range images in total. The FRGC database includes the raw scanner outputs, which are noisy and full of false holes. We applied automatic processing to these holes, where holes smaller than a certain size threshold were considered false and interpolated linearly using the neighbouring valid points (Figure 9).

Figure 9. a) Texture for the 3D scan. b) Black areas denote the background. Red areas denote the invalid points selected to be extrapolated. Blue areas denote the neighbouring valid points used for extrapolation. White areas denote valid points. c) Processed point cloud.

The same process was also applied to the Bosphorus database, although manual processing had been applied to this database during the acquisition.

41 vitually scanning them. Fo this eason this database is fee of defects and does not need any pocessing Pocessing Exploded/Imploded Regions It is nealy impossible to constuct a decision system which will detect any incoect valid point, because the stuctue of the scanned suface is usually unknown. Fo example a spike on the suface may vey well be a valid suface stuctue if the scanned suface includes a needle. Fo this eason widely used techniques to pocess these egions ae manual. Simila to manual hole filling softwae, thee ae seveal pogams which povide semiautomatic tools to cleanse these defects fom 3D ange images. On the othe hand, if thee is peliminay stuctual knowledge about the scanned suface, then cetain model-based automatic methods may be applied. Fo example, Nesli Bozkut, a fome gaduate student fom ou compute vision laboatoy in METU, constucted a facial defect cleansing system which uses the symmety infomation of the face. So fo facial scans, a lateal symmety citeion is sought ove the 3D facial suface and any element (a hole o a spike) violating this citeion is consideed as a defect and handled popely [Bozkut 2008] Pocessing Noise 3D scanne outputs expeience amplifie noise, which is highly Gaussian. Thee is also poweful quantization noise if the esolution of the scanne is elatively low [Bozkut 2008]. This noise can be patially emoved by using a Gaussian smoothing filte. Howeve Gaussian filtes expeience poblems with peseving the edge details. mong numeous methods, Bilateal Filteing is poven to be a poweful yet simple, non-iteative scheme fo edge-peseving smoothing in 2D [Bozkut 2008]. This image pocessing technique has been successfully extended to 3D by making small modifications in the application. Bilateal filte is distinguished fom Gaussian blu by its additional ange tem. In this case, not only the distances between positions matte, but also the vaiation of the intensities ae also taken into account in ode to keep the featues fom deogation. Consequently, the edges whee high intensity diffeences occu ae successfully peseved. In 2D, the intensity values ae a function of position values, wheeas in 3D the position in fact, is the signal itself. Hence, the modification of bilateal filte to be applied to 3D data is not vey staightfowad. Like most image pocessing algoithms that ae extended to 22

42 sufaces, nomal infomation at each point of the suface can be used to fom an intensity space like in images. The effect of noise cleaning using 3D bilateal filteing is depicted in Figue 10, which is taken fom [Bozkut 2008]. s seen fom the figue, the edges ae popely peseved in 3D bilateal filteing (Figue 10). Figue 10. Sufaces de-noised with Gaussian (left) and Bilateal (ight) filtes [Bozkut 2008]. When the bilateal filte is applied to oiginal scans fom the FRGC v1.a database, the esults show that the edge details aound the eye is well peseved wheeas the facial suface is cleaed off the noise (Figue 11). Figue 11. The esult of the bilateal filteing on a FRGC sample.. 23
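To make the edge-preserving smoothing step above concrete, a minimal sketch of a bilateral filter applied directly to a depth image is given below. This is a simplified, single-channel variant (the range term is computed on depth differences rather than on the normal-based intensity space used in [Bozkurt 2008]); the function name, the parameter values and the validity-mask handling are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def bilateral_filter_depth(depth, valid, sigma_s=2.0, sigma_r=1.5, radius=4):
    """Edge-preserving smoothing of a depth image (simplified sketch).

    depth   : 2D array of depth values (e.g. in mm)
    valid   : 2D boolean array, False for invalid (hole/background) points
    sigma_s : width of the spatial Gaussian, in pixels
    sigma_r : width of the range Gaussian, in depth units
    """
    rows, cols = depth.shape
    out = depth.copy()
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))   # spatial closeness term

    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            if not valid[i, j]:
                continue                                   # leave invalid points untouched
            patch = depth[i - radius:i + radius + 1, j - radius:j + radius + 1]
            mask = valid[i - radius:i + radius + 1, j - radius:j + radius + 1]
            rng = np.exp(-(patch - depth[i, j]) ** 2 / (2.0 * sigma_r ** 2))  # range term
            w = spatial * rng * mask                       # invalid neighbours get zero weight
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out
```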

CHAPTER 4

SURFACE CURVATURES

In order to construct intelligent systems capable of classifying different shapes, one should find a way to understand or model the behaviour of a solid shape. When it comes to 3D shapes and surfaces, surface curvatures are the key concept defining this behaviour. For a general review of surface curvatures and surface behaviour, the reader may refer to Koenderink's renowned book on solid shapes [Koenderink 1990]. In this chapter the general concepts of surface curvatures, namely the surface gradients, the principal curvatures (κ1, κ2), the Mean (H) and Gaussian (K) curvatures, the shape index (S) and the curvedness (C), are revisited. In addition, the transform invariance and the effects of scaling and sampling on these curvatures are discussed. By the end of the chapter, a theoretical comparison of the HK and SC fundamental shape definitions is presented. We commence by discussing the most reliable way to obtain the curvature of a digitized surface.

4.1 Reliable Curvature Estimation

The behaviour of the surface may be characterized through the first partial derivatives of the surface function. In the previous chapter, the general digital surface data format was analyzed and specified as 2.5D data, or the Monge patch, which is defined as:

X(u, v) = [\, u \quad v \quad f(u, v) \,], \qquad \{u, v\} \in \mathbb{R}   (2)

As previously stated, the u and v parameters are matched to the x and y coordinates, and the surface can be specified as the height f(u, v) above the so-called support plane defined by the two coordinates (u, v). At any point p on this continuous surface, we can choose two orthogonal vectors in the tangent plane and examine the surface's behaviour in those

directions. The vectors in the directions of the orthogonal parameters (u, v) are called the surface gradients and are equal to:

\dfrac{\partial X(u, v)}{\partial u} = [\, 1 \quad 0 \quad f_u \,], \qquad \dfrac{\partial X(u, v)}{\partial v} = [\, 0 \quad 1 \quad f_v \,]   (3)

where f_u and f_v are the partial derivatives of the height function f(u, v) with respect to u and v. Accordingly, the surface normal n, which is orthogonal to the surface gradients at point p, is defined as:

n = \dfrac{\partial X}{\partial u} \times \dfrac{\partial X}{\partial v} = [\, -f_u \quad -f_v \quad 1 \,]   (4)

Theoretical computation of the surface gradients is simple when an explicit definition of the continuous surface function is known. However, this is not the case when dealing with digitized scans in real-world problems. For this reason, a digital version of all functions or kernels must be used. There are different methods of obtaining partial derivatives from a digital signal. [Flynn and Jain 1989] focuses on reliable digital curvature estimation using different types of these methods and strives to reach a conclusion on the best method for estimating surface gradients of discrete surfaces. They implement analytical methods, such as fitting orthogonal polynomials and splines and linear regression around the point at which the gradient estimates are made. In addition, they implement numerical estimates based on finite differences. This pioneering study emphasises the importance of Gaussian smoothing before any digital curvature estimation: they check the robustness of all methods against noise and indicate that pre-smoothing enhances the estimation success profoundly. For this reason, in this study all of the surfaces are pre-smoothed in order to lower the effect of any type of noise. In their study, Flynn and Jain use 7x7 or 5x5 patches around the point of interest, according to the method implemented. Needless to say, when estimating digital curvature this neighbouring patch size is of extreme importance, since the discrete derivatives highly depend on the sampling rate. For this reason, even though they make very important observations in their study, it is difficult to rely on some of them, since the data they use is much lower in resolution compared to today's digitized scans. One of the main motivations of this thesis is to eliminate the effect of sampling, or of the neighbouring patch selection, when estimating digital curvatures for object recognition. In this study, 3x3 patches are used and quadratic functions are fitted to estimate the curvatures.
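The local fitting step described above can be sketched as follows: the code fits the quadratic f(u, v) = a·u² + b·u·v + c·v² + d·u + e·v + g to a 3x3 patch by least squares and reads the gradients, the second derivatives and the normal of Equation (4) off the coefficients. This is only an illustrative sketch under the assumption of an equal sample spacing h along u and v; it is not the exact fitting routine used in the thesis.

```python
import numpy as np

def fit_quadratic_patch(z3x3, h=1.0):
    """Fit f(u,v) = a*u^2 + b*u*v + c*v^2 + d*u + e*v + g to a 3x3 depth patch.

    z3x3 : 3x3 array of height values around the point of interest
    h    : sample spacing along u and v (assumed equal in both directions)
    Returns the gradients (f_u, f_v), the second derivatives (f_uu, f_uv, f_vv)
    and the (unnormalized) surface normal [-f_u, -f_v, 1] at the centre point.
    """
    u, v = np.meshgrid(np.array([-h, 0.0, h]), np.array([-h, 0.0, h]), indexing='ij')
    A = np.column_stack([u.ravel() ** 2, (u * v).ravel(), v.ravel() ** 2,
                         u.ravel(), v.ravel(), np.ones(9)])
    coeff, *_ = np.linalg.lstsq(A, z3x3.ravel(), rcond=None)
    a, b, c, d, e, g = coeff
    f_u, f_v = d, e                        # first derivatives at the centre (u = v = 0)
    f_uu, f_uv, f_vv = 2 * a, b, 2 * c     # second derivatives of the fitted quadratic
    normal = np.array([-f_u, -f_v, 1.0])   # cross product of the surface gradients, Eq. (4)
    return (f_u, f_v), (f_uu, f_uv, f_vv), normal
```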

These curvatures are, however, investigated in a scale space of the surface. The details of these procedures are given in the succeeding sections.

4.2 Principal Curvatures

In differential geometry, the two principal curvatures at a given point of a surface measure how the surface bends by different amounts in different directions at that point. At each point p of a differentiable surface in 3D Euclidean space one may choose a unique unit normal vector. A normal plane at p is one that contains the normal; it therefore also contains a unique direction tangent to the surface, and it cuts the surface in a plane curve. This curve will in general have different curvatures for different normal planes at p. The principal curvatures at p, denoted κ1 and κ2, are the maximum and minimum values of this curvature. The figure below depicts these curvatures and their normal planes.

Figure 12. a) The normal plane with the maximum curvature. b) The normal plane with the minimum curvature. c) κ1 = 1.56 and κ2 = . The surface is a patch from a monkey saddle: z(x, y) = x³ − 3xy².

The principal curvature values and the principal directions of the curvatures are calculated by taking the eigenvalue decomposition of the Hessian matrix, which is defined as:

\mathbf{H} = \begin{bmatrix} \dfrac{\partial^2 X}{\partial u^2} & \dfrac{\partial^2 X}{\partial u\,\partial v} \\[4pt] \dfrac{\partial^2 X}{\partial u\,\partial v} & \dfrac{\partial^2 X}{\partial v^2} \end{bmatrix}   (5)

The eigenvalues of this symmetric matrix give the principal curvatures κ1 and κ2, and the eigenvectors give the principal curvature directions. Surface points can be classified according to their principal curvature values at that point. A point on a surface is called:

- Elliptic (κ1·κ2 > 0) if both principal curvatures have the same sign; the surface is locally convex or concave.
- Umbilic (κ1 = κ2) if both principal curvatures are equal; every tangent vector can then be considered a principal direction (and Flat-Umbilic if κ1 = κ2 = 0).
- Hyperbolic (κ1·κ2 < 0) if the principal curvatures have opposite signs; the surface is locally saddle shaped.
- Parabolic (κ1 = 0, κ2 ≠ 0) if exactly one of the principal curvatures is zero. Parabolic points generally lie on a curve separating elliptic and hyperbolic regions.

This is the basic classification of surfaces according to their principal curvatures. Examples of these types of shapes can be seen in Figure 13. The Mean (H) and Gaussian (K) curvatures, the shape index (S) and the curvedness (C) are also calculated from the principal curvatures, and more essential classifications are made according to their values.

4.3 Mean (H) and Gaussian (K) Curvatures

Using the principal curvatures, the Mean (H) and Gaussian (K) curvatures are calculated as:

H = \dfrac{\kappa_1 + \kappa_2}{2}, \qquad K = \kappa_1 \kappa_2   (6)

H is the average of the maximum and the minimum curvature at a point, and thus gives a general idea of how much the point is bent. K is the product of the principal curvatures, and its sign indicates whether the surface is locally elliptic or hyperbolic. [Besl 1986] was the first to use HK values for the purpose of surface segmentation. Using HK values, the regions are defined as in Table 1 below.

Table 1. Shape classification in the HK curvature space. Corresponding examples of surface regions are shown in Figure 13.

        | K > 0                             | K = 0                          | K < 0
H < 0   | Convex (1) (Elliptic or Umbilic)  | Ridge (2) (Convex Parabolic)   | Saddle Ridge (3) (Hyperbolic)
H = 0   | (Not possible) (4)                | Planar (5)                     | Minimal (6)
H > 0   | Concave (7) (Elliptic or Umbilic) | Valley (8) (Concave Parabolic) | Saddle Valley (9) (Hyperbolic)

Figure 13. Shape types in correspondence with Table 1. The shapes in the rightmost column are hyperbolic, and from top to bottom they have the properties κ1 + κ2 < 0, κ1 + κ2 = 0 and κ1 + κ2 > 0. The centre column shows parabolic or flat-umbilic (planar) shapes, where one of the principal curvatures is equal to zero (κ1 = 0 or κ2 = 0). The leftmost column shows the elliptic shapes, where both principal curvatures have the same sign: the upper one is convex elliptic (a peak: κ1 < 0, κ2 < 0) and the lower one is concave elliptic (a pit: κ1 > 0, κ2 > 0).

Because of noise and other reasons, it is impossible to obtain an exact zero value for H or K. Thus zero-thresholds are used to decide whether a value is zero or not; everything below the threshold is considered to be zero:

H = \begin{cases} \dfrac{\kappa_1 + \kappa_2}{2}, & \text{if } \left|\dfrac{\kappa_1 + \kappa_2}{2}\right| > H_{zero} \\[4pt] 0, & \text{otherwise} \end{cases} \qquad K = \begin{cases} \kappa_1 \kappa_2, & \text{if } \left|\kappa_1 \kappa_2\right| > K_{zero} \\[4pt] 0, & \text{otherwise} \end{cases}   (7)

In order to better comprehend the relation between the HK values and the principal curvatures κ1 and κ2, the regions of Table 1 are drawn on the (κ1, κ2) plane in the figure below.

Figure 14. The HK classification on the (κ1, κ2) plane. The region colours (and numbers) correspond to Table 1. The separating lines are the zero-thresholds for the H values, with equations (κ1 + κ2)/2 = +H_zero and (κ1 + κ2)/2 = −H_zero. The separating curves are the zero-thresholds for the K values, with equations κ1·κ2 = +K_zero and κ1·κ2 = −K_zero.
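A small sketch tying Equations (5)-(7) and Table 1 together is given below: the principal curvatures are taken as the eigenvalues of the Hessian of the fitted quadratic, H and K are thresholded, and the point is assigned one of the nine Table 1 labels. The threshold constants are example values (see Section 4.5.2), and the function names are illustrative.

```python
import numpy as np

H_ZERO = 0.03      # example zero-thresholds; Section 4.5.2 explains how they
K_ZERO = 0.0009    # depend on the scale/resolution ratio

def hk_classify(f_uu, f_uv, f_vv, h_zero=H_ZERO, k_zero=K_ZERO):
    """Classify a surface point into the Table 1 regions from its Hessian entries."""
    hessian = np.array([[f_uu, f_uv],
                        [f_uv, f_vv]])
    k1, k2 = np.linalg.eigvalsh(hessian)           # principal curvatures, Eq. (5)
    H = (k1 + k2) / 2.0                            # mean curvature, Eq. (6)
    K = k1 * k2                                    # Gaussian curvature, Eq. (6)
    H = H if abs(H) > h_zero else 0.0              # zero-thresholding, Eq. (7)
    K = K if abs(K) > k_zero else 0.0

    if H < 0 and K > 0:   return 'convex (elliptic)'           # (1)
    if H < 0 and K == 0:  return 'ridge (convex parabolic)'    # (2)
    if H < 0 and K < 0:   return 'saddle ridge'                # (3)
    if H == 0 and K == 0: return 'planar'                      # (5)
    if H == 0 and K < 0:  return 'minimal'                     # (6)
    if H > 0 and K > 0:   return 'concave (elliptic)'          # (7)
    if H > 0 and K == 0:  return 'valley (concave parabolic)'  # (8)
    if H > 0 and K < 0:   return 'saddle valley'               # (9)
    return 'not possible'                                      # (4): H = 0, K > 0
```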

As discussed in [Cantzler and Fisher 2001], there are two main ambiguities in the HK representation: the planar region is not symmetric, and the parabolic region gets narrower for higher curvatures. Most importantly, the H and K values are very much dependent on the thresholds; shifting the thresholds directly changes the regions on the (κ1, κ2) plane. For a scaled or resampled version of a surface patch, the κ1 and κ2 values will differ, and thus HK values are not scale or resolution invariant. However, since they depend on the κ1 and κ2 values (which are orientation invariant), they are orientation invariant.

4.4 Shape Index (S) and Curvedness (C)

[Koenderink and Doorn 1992] defines an alternative curvature representation using the principal curvatures. This approach defines two measures: the shape index (S) and the curvedness (C). The shape index (S) defines the shape type and the curvedness (C) decides whether the shape is locally planar or not:

S = \dfrac{2}{\pi} \arctan\left( \dfrac{\kappa_1 + \kappa_2}{\kappa_1 - \kappa_2} \right), \;(\kappa_1 > \kappa_2), \qquad C = \sqrt{\dfrac{\kappa_1^2 + \kappa_2^2}{2}}   (8)

The shape index value of a point is independent of the scaling of that shape and it ranges over [−1, +1]. However, C is not scale or resolution invariant. Both S and C are orientation invariant. A discussion on the transform, scale and resolution independence of all curvature types is given in the next subsection.

[Koenderink and Doorn 1992] uses the S value in order to classify a point. S values range over [−1, +1], where −1 defines cup shapes (convex elliptical) and +1 defines cap shapes (concave elliptical). All other types of local shape correspond to real numbers between −1 and +1 (Table 2).

Table 2. Shape Index Classification (colours are consistent with the previous figures). The shape categories along the S axis, from S = −1 to S = +1, are: Cup, Trough, Valley, Saddle Valley, Saddle, Saddle Ridge, Ridge, Dome and Cap, with the marked values −1, −5/8, −3/8, 0, +3/8, +5/8 and +1.

Table 3. Shape Index and Curvedness Classification. Colours are consistent with the previous figures.

Convex (Elliptic) (1)    : S ∈ [+5/8, +1],     C > C_zero
Convex (Parabolic) (2)   : S ∈ [+3/8, +5/8],   C > C_zero
Saddle Ridge (3)         : S ∈ [+3/16, +3/8],  C > C_zero
Planar (5)               : C < C_zero
Hyperbola (6)            : S ∈ [−3/16, +3/16], C > C_zero
Concave (Elliptic) (7)   : S ∈ [−1, −5/8],     C > C_zero
Concave (Parabolic) (8)  : S ∈ [−5/8, −3/8],   C > C_zero
Saddle Valley (9)        : S ∈ [−3/8, −3/16],  C > C_zero

Figure 15. The SC classification on the (κ1, κ2) plane. The region colours correspond to Table 3. The separating lines are the constant S values S = ±5/8, ±3/8 and ±3/16. The circle around the origin is the zero-threshold for the C values, with equation √((κ1² + κ2²)/2) = C_zero.
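The SC counterpart of the previous sketch is given below. It computes S and C from Equation (8) and bins S according to the intervals of Table 3, where the saddle-valley interval is taken as [−3/8, −3/16] following the three-way split of the hyperbolic region discussed in the next paragraph; the threshold constant is again only an example value.

```python
import numpy as np

C_ZERO = 0.03   # example curvedness zero-threshold (see Section 4.5.2)

def sc_classify(k1, k2, c_zero=C_ZERO):
    """Classify a point by shape index / curvedness, following Eq. (8) and Table 3."""
    if k1 < k2:                                   # enforce kappa_1 > kappa_2 as in Eq. (8)
        k1, k2 = k2, k1
    C = np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)        # curvedness
    if C < c_zero:
        return 'planar', None, C                  # region (5)
    if k1 == k2:                                  # umbilic point: arctan argument degenerates
        S = 1.0 if (k1 + k2) > 0 else -1.0
    else:
        S = (2.0 / np.pi) * np.arctan((k1 + k2) / (k1 - k2))
    bins = [(-1.0,   -5/8,  'concave (elliptic)'),    # (7)
            (-5/8,   -3/8,  'concave (parabolic)'),   # (8)
            (-3/8,   -3/16, 'saddle valley'),         # (9)
            (-3/16,  +3/16, 'hyperbola'),             # (6)
            (+3/16,  +3/8,  'saddle ridge'),          # (3)
            (+3/8,   +5/8,  'convex (parabolic)'),    # (2)
            (+5/8,   +1.0,  'convex (elliptic)')]     # (1)
    for lo, hi, name in bins:
        if lo <= S <= hi:
            return name, S, C
    return 'unclassified', S, C
```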

[Koenderink and Doorn 1992] defines constant shape index values in order to define the shape types. These values are given in Table 3 above. However, their original classification does not differentiate the hyperbolic region into three different types (the yellow-orange-red region, i.e. saddle valley, hyperbola and saddle ridge). For this reason, we define another constant shape index value (3/16) for this purpose in the table above. The curvedness (C) values are used to decide whether a region is planar or not: for planar regions the C value is very close to zero (i.e. below the zero-threshold C_zero). In order to better comprehend the classification of the S and C values, it is a good idea to observe them on the (κ1, κ2) plane. The regions are coloured (and numbered) correspondingly to Table 3 (and the previous figures).

4.5 Transform Invariance of Curvatures

The curvature values obtained from a surface are independent of any 3D rotation. In other words, when the same surface is rotated in 3D space, the principal curvature values stay the same. This is a very important property: when recognizing objects, local surfaces of that object may be captured in an arbitrary orientation, and provided that the principal curvature values stay the same, the features obtained from the principal curvatures are said to be orientation invariant. Thus the values H, K, S and C are independent of translation and rotation in 3D space.

Unfortunately, when it comes to scaling, things get complicated. Since scaling changes the finite difference between digital samples, the discrete partial derivatives of the surface also change for a scaled version of the same surface. Given that the principal curvatures are calculated from the Hessian, i.e. the second-order partial derivatives of the surface, κ1 and κ2 are directly dependent on scaling and vary proportionally when the surface is scaled in 3D space. In addition, since 3D scans are discrete surfaces, the sampling rate also directly affects the principal curvature values. Obviously, while the principal curvature values are affected, every other type of curvature is affected as well. For this reason the H, K and C values are not scale or resolution invariant. However, the S values, due to their mathematical design, are independent of scale and resolution. The effect of scaling and sampling (resolution) is profoundly analysed in the next subsection.

52 4.5.1 Effect of Scale and Sampling on Cuvatue Scale/Resolution Ratio In this subsection the scale and esolution invaiance of the fou cuvatue types (H, K, S, C) ae pactically tested. lthough thei scale/esolution independency o dependency is theoetically o mathematically poven, the empiical tests ae obseved fo a bette undestanding of the concept of scale/esolution atio, which will be intoduced within this subsection. Shape Index is theoetically (mathematically) poven and designed to be scale invaiant. In ode to test the scale invaiance of S pactically, the simplest test would be having diffeent scaled vesions of a digitized suface (a Monge Patch) and compaing thei S maps. Imagine we have two vesions of the suface, one of which is 10 times bigge than the othe in x, y and z coodinates (10 times scaled vesion). Howeve thei esolutions ae exactly the same: 100x100 (Figue 16). a) b) c) Figue 16. a) Two scaled vesion of the same shape. Bigge one (1) is 10 times lage than the othe. (Smalle one (2) is baely seen on the lowe left of the bigge one). b) Two S maps ae identical (in both S maps a gey scale level is mapped to a distinct eal value. So having same intensities on each SI map poves that they have the same eal S values) c) C (cuvedness) map intensities (i.e. the C values) ae diffeent. Fo the small shape, the cuvedness is baely detected. Fo H and K maps, the effect would be simila to C. It is seen that both shapes have identical S maps but diffeent C maps. ll gey level intensities ae calibated such that, a gey level indicates a distinct eal cuvatue value. Thus, if two maps ae of the same gey values, then the eal values coesponding to those maps ae exactly the same just like in Figue 16.b. In ode to see the esolution (sampling) invaiance of the shape index S, we make anothe test using thee diffeent sampled vesions of the same shape fom the pevious figue: one digitized with 25x25 esolution, othe with 50x50 and the oiginal suface with 100x

53 Note that the shapes have identical size, which means they occupy the same size in 3D space. a) b) c) Figue 17. a) 25x25 sampled suface. b) 50x50 sampled suface. c) On the left S map fo the 25x25 sampled suface is seen. In the middle S map fo 50x50 sampled suface is seen. On the ight we have the S map fo the oiginal 100x100 sampled suface. s the suface esolution gets lowe and lowe, the S map esolution also changes but the S values (the gey level intensities) do not get affected. In Figue 17.c we have the S map fo the thee sufaces. s the suface esolution gets lowe and lowe, the S map esolution also changes but the S values (the gey level intensities) do not get affected, which means shape index is independent of esolution even though the pincipal cuvatues values ae not. Needless to say that if the same test was caied out fo H, K and C maps, the values would have been affected and thus H, K and C ae not esolution invaiant. Howeve thee s a way to get the same pincipal cuvatue values, thus the same H, K and C values fo diffeent sampled o scaled vesions of the same shape. It can be mathematically poven that if the scale/esolution atio is held constant fo diffeent scaled and sampled vesions of the same shape, the pincipal cuvatue values also stay the same. Thus, H, K and C may also become scale and esolution invaiant. In ode to pove this idea pactically, we pepae ou final test on thee diffeent vesions of the same shape. In this test the scale and the esolution is modified coespondingly. When the scale is halved, 34

54 then the esolution is also exactly halved. So the scale/esolution atio stays constant fo each suface. a) b) Figue 18. a) The oiginal suface (1) with 100x100 esolution; Suface 2 with 50x50 esolution scaled by ½; Suface 3 with 25x25 esolution scaled by ¼; b) H maps fo each suface is depicted. Intensities (e.g. eal H values) do not change fo diffeent scaled H maps, since scale/esolution atio is constant. This test shows that if we keep the scale/esolution atio constant fo a suface patch, the pincipal cuvatue values becomes independent of scale and esolution and so the H, K and C values. Thus thesholds we use fo H, K and C in any ecognition system also become univesal if we keep the scale/esolution atio constant. Howeve this atio is affected by many things such as: the settings and type of the digitize (scanne) and even the distance of the object fom the digitize. This atio must be set to a constant value befoe any theshold is used. Fo this pupose in this thesis any 3D scan is e-sampled into a constant scale/sampling atio of 0.5mm/sample in each u and v diections. Futhemoe by settings the scale/esolution values constant and by making H, K, and C values independent of scale and esolution, the advantage of S as a natual scale invaiant values vanishes. Consequently a pope compaison of HK and SC spaces can be made since both spaces become independent of scale and esolution. The next section makes an intoductoy compaison of the two cuvatue spaces and intoduces a coupled vesion of the two. 35
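Before the thresholding rule of the next subsection, the resampling step mentioned above can be sketched as follows: the depth grid is interpolated so that the sample spacing becomes 0.5 mm/sample along both u and v. The original spacing values are assumed to be known from the scanner calibration, and the handling of invalid points (Chapter 3) is omitted for brevity.

```python
from scipy.ndimage import zoom

def resample_to_ratio(depth, spacing_u, spacing_v, target=0.5):
    """Resample a depth image so that the sample spacing becomes `target` mm/sample.

    depth     : 2D depth array (heights in mm)
    spacing_u : original spacing along rows, in mm/sample
    spacing_v : original spacing along columns, in mm/sample
    """
    factors = (spacing_u / target, spacing_v / target)   # >1 upsamples, <1 downsamples
    return zoom(depth, factors, order=1)                 # bilinear interpolation
```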

4.5.2 Rule for Thresholding

As shown in the previous subsection, a constant scale/resolution ratio makes the H, K and C thresholds universal. However, these universal threshold values depend on the chosen fixed scale/resolution ratio. In our studies the fixed scale/resolution ratio is 0.5 mm/sample, and the threshold values for H and C are both 0.03, with the threshold for K taken as (H_zero)². If a different scale/resolution ratio is to be used, then in order to obtain the same classification results the thresholds should be chosen according to the formulas below. Let the new scale/sampling ratio be ratio_ss; then:

H_{zero} = C_{zero} = 0.03 \cdot \dfrac{0.5}{ratio_{ss}}, \qquad K_{zero} = (H_{zero})^2   (9)

4.6 Comparison of HK and SC Curvature Spaces

There has been an ongoing debate on the advantages and disadvantages of using HK (Mean-Gaussian) or SC (shape index-curvedness) curvatures for object recognition applications. [Cantzler and Fisher 2001] compare the HK and SC curvature descriptions in terms of classification, impact of thresholds and impact of noise levels. They conclude that the SC approach has some advantages at low thresholds, in complex scenes and in dealing with noise. However, they calculate the curvatures at the lowest scale, i.e. the given resolution; scale-spaces of the surfaces or of the curvatures are not defined. Another comparative study was carried out in [Li and Hancock 2004], where HK and SC histograms are created using curvature values obtained from the shading in 2D images. The comparison results show that SC histograms are slightly more successful in terms of classification. Yet again, the tested resolution is the pixel resolution of the 2D image, and the effect of sampling is ignored.

When calculating the H, K and C values, the scale/resolution ratio is highly influential. However, due to its scale invariant nature, the shape index (S) is independent of the resolution or the scale. Thus it is no wonder that SC methods give better results than HK when the comparison is carried out at an uncontrolled scale/resolution level. As explained in the previous subsection, in order to make the H, K and C values scale invariant, the scale/resolution ratio must be set to a constant value for the whole database. In addition, a scale space of the surface should be constructed so that all features obtained using the H, K, S and C values also carry the scale level information. Obtaining the absolute scales of these features is a crucial ability for 3D object representation, and the

56 only way of achieving such infomation is constucting the scale-spaces of the sufaces and the scale-space of the cuvatue values. Thee have been diffeent attempts at constucting a scale-space of the suface and defining scale-invaiant featues howeve to ou knowledge, thee have been no study on compaison of HK and SC classification capabilities using scale-spaces of the suface in the database. In this thesis, ou main motivation and contibution is to make both theoetical (mathematical) and empiical compaisons of HK (Mean/Gaussian) and SC (Shape Index/Cuvedness) cuvatue desciptions in a scale-space of the tageted suface. By doing this we also aim at compaing the scale paamete obtained by using HK and SC, on a local suface element. Fo this pupose we calculate the scale spaces of the given sufaces and the cuvatues. The details of this pat ae given in the next chapte. Then we make empiical compaison on the classification esults of scale invaiant HK and SC fo the Stuttgat database. This compaison is given in the esults chapte. Befoe concluding this chapte, a mathematical compaison of HK and SC spaces is given below. Figue 19. The diffeence in egion desciption of HK and SC spaces ove the (κ 1, κ 2 ) plane is depicted. Each numbe designates a egion which is descibed in Table 4. The shape index thesholds ae ±5/8, ±3/8, ±3/16, as it is fo Figue 4. The othe thesholds ae selected such that K zeo = H 2 zeo and C zeo = H zeo Mathematical Compaison of HK and SC Cuvatue Spaces In this chapte mathematical analysis of HK and SC cuvatue spaces ae given ove the (κ 1, κ 2 ) plane. Eight fundamental types of egions ae indicated fo both of the methods. Since both HK and SC values ae obtained fom (κ 1, κ 2 ) values by using diffeent 37

mathematical operators, the eight fundamental regions have slight differences for the two curvature spaces. Actually, these slight differences play the decisive role in their success for object classification. In Figure 19, the differences in region definition over the (κ1, κ2) plane are seen. Each region is indicated by a number, which is listed in Table 4.

Table 4. The definition of the regions seen in Figure 19.

Reg. No. | HK Definition     | SC Definition      | Mathematical Definition
1        | Plane             | Convex Parabolic   | |H| < H_zero, |K| < K_zero; S ∈ [−5/8, −3/8], C > C_zero
2        | Plane             | Convex Elliptic    | |H| < H_zero, |K| < K_zero; S ∈ [−1, −5/8], C > C_zero
3        | Convex Parabolic  | Convex Elliptic    | H > +H_zero, |K| < K_zero; S ∈ [−1, −5/8], C > C_zero
4        | Convex Elliptic   | Convex Parabolic   | H > +H_zero, K > +K_zero; S ∈ [−5/8, −3/8], C > C_zero
5        | Plane             | Saddle Valley      | |H| < H_zero, |K| < K_zero; S ∈ [−3/8, −3/16], C > C_zero
6        | Saddle Valley     | Convex Parabolic   | H > +H_zero, K < −K_zero; S ∈ [−5/8, −3/8], C > C_zero
7        | Hyperbolic        | Saddle Valley      | |H| < H_zero, K < −K_zero; S ∈ [−3/8, −3/16], C > C_zero
8        | Hyperbolic        | Convex Parabolic   | |H| < H_zero, K < −K_zero; S ∈ [−5/8, −3/8], C > C_zero
9        | Plane             | Hyperbolic         | |H| < H_zero, |K| < K_zero; S ∈ [−3/8, +3/8], C > C_zero
10       | Plane             | Concave Parabolic  | |H| < H_zero, |K| < K_zero; S ∈ [+3/8, +5/8], C > C_zero
11       | Plane             | Concave Elliptic   | |H| < H_zero, |K| < K_zero; S ∈ [+5/8, +1], C > C_zero
12       | Concave Parabolic | Concave Elliptic   | H < −H_zero, |K| < K_zero; S ∈ [+5/8, +1], C > C_zero
13       | Concave Elliptic  | Concave Parabolic  | H < −H_zero, K > +K_zero; S ∈ [+3/8, +5/8], C > C_zero
14       | Plane             | Saddle Ridge       | |H| < H_zero, |K| < K_zero; S ∈ [+3/16, +3/8], C > C_zero
15       | Saddle Ridge      | Concave Parabolic  | H < −H_zero, K < −K_zero; S ∈ [+3/8, +5/8], C > C_zero
16       | Hyperbolic        | Saddle Ridge       | |H| < H_zero, K < −K_zero; S ∈ [+3/16, +3/8], C > C_zero
17       | Hyperbolic        | Concave Parabolic  | |H| < H_zero, K < −K_zero; S ∈ [+3/8, +5/8], C > C_zero

In order to make the two spaces similar to each other, the zero-thresholds are arranged so that the threshold lines and curves of the HK and SC spaces are tangent to each other whenever possible. Similar to the approach of [Cantzler and Fisher 2001], H_zero and C_zero are taken to be equal, so that the line H = H_zero is tangent to the circle C = C_zero. In addition, K_zero = H_zero², so that the hyperbola K = K_zero is tangent to the circle C = C_zero. Although the regions defined by the two curvature spaces become very similar after these arrangements, there are still fundamental differences in the region definitions.

58 The plana egion definition fo SC space is a cicle cented on the (κ 1, κ 2 ) plane oigin and thus it is symmetic. Howeve the plana egion definition fo HK space is diffeent and not symmetic accoding to κ 1 = κ 2 line, although it is symmetic accoding to κ 1 =- κ 2 line. Fo this eason fo the plana egion defined by HK space, is defined much diffeently by the SC space (egions 1, 2, 5, 9, 10, 11, 14). Diffeent definitions of elliptic and paabolic egions fo the two cuvatue spaces also get divese when the pincipal cuvatue values get lage. Thee ae egions defined as paabolic fo one cuvatue space and elliptic fo the othe cuvatue space (egions 4, 6, 13, 15). The paabolic egion fo HK cuvatue space get naowe fo lage pincipal cuvatue values, while it gets wide fo the SC cuvatue space. Thee ae also othe egions (egions 3, 7, 8, 12, 16, 17) whee the definitions ae ambiguous HK&SC Coupled Cuvatue Space On the othe hand thee ae egions whee the egion definitions ae the same fo the both cuvatue spaces. Fo this eason in Figue 19, the egions fo which both HK and SC cuvatue spaces definitions ae the same ae seen. We name this combined space of HK and SC cuvatues, the HK&SC coupled cuvatue space. The mathematical definitions ae given in Table 5. This new cuvatue space do not cove all the egions ove the (κ 1, κ 2 ) plane. The egions which ae ambiguous, in othe wods which ae classified diffeent in HK and SC spaces, ae excluded fom the classification. So if a local suface patch is classified as one of the egions in Table 5, then they ae classified same in both HK and SC cuvatue spaces. It is vey difficult to decide which classification is best by just making a theoetical compaison among the cuvatue spaces. It is difficult to conclude on the best method without expeimentation. One egion may be classified as plana by HK cuvatue space, convex paabolic as SC cuvatue space, and may not be classified by the HK&SC coupled cuvatue space (egion 1 in Figue 20). Howeve the effect on classification success can only be undestood by expeimental study. In this thesis we compae the fist two cuvatue spaces fo object ecognition puposes. In addition we do this using the scale-space of the suface and cuvatues. The next chapte gives the details on constucting these scale-spaces. 39

Figure 20. The HK&SC coupled space classification on the (κ1, κ2) plane. The colours and numbers correspond to Table 5.

Table 5. HK&SC coupled space classification. Colours are consistent with the previous figures.

Convex (Elliptic) (1)    : H < −H_zero, K > +K_zero;   S ∈ [+5/8, +1],     C > C_zero
Convex (Parabolic) (2)   : H < −H_zero, |K| < K_zero;  S ∈ [+3/8, +5/8],   C > C_zero
Saddle Ridge (3)         : H < −H_zero, K < −K_zero;   S ∈ [+3/16, +3/8],  C > C_zero
Planar (5)               : |H| < H_zero, |K| < K_zero; C < C_zero
Hyperbola (6)            : |H| < H_zero, K < −K_zero;  S ∈ [−3/16, +3/16], C > C_zero
Concave (Elliptic) (7)   : H > +H_zero, K > +K_zero;   S ∈ [−1, −5/8],     C > C_zero
Concave (Parabolic) (8)  : H > +H_zero, |K| < K_zero;  S ∈ [−5/8, −3/8],   C > C_zero
Saddle Valley (9)        : H > +H_zero, K < −K_zero;   S ∈ [−3/8, −3/16],  C > C_zero
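As a small illustration of the coupled decision, the sketch below returns one of the eight labels of Table 5 only when both the HK conditions and the SC conditions of the same row hold, and reports the point as unclassified (ambiguous) otherwise. The default thresholds are the example values of Section 4.5.2, and the saddle-valley shape index interval follows the three-way split discussed earlier; all names are illustrative.

```python
def coupled_classify(H, K, S, C, h0=0.03, k0=0.0009, c0=0.03):
    """Return one of the eight Table 5 labels when both the HK and the SC
    conditions hold for the same region, otherwise None (ambiguous point)."""
    rules = [
        ('convex (elliptic)',   H < -h0 and K > +k0,         C > c0 and  5/8 <= S <= 1),
        ('convex (parabolic)',  H < -h0 and abs(K) < k0,     C > c0 and  3/8 <= S <= 5/8),
        ('saddle ridge',        H < -h0 and K < -k0,         C > c0 and  3/16 <= S <= 3/8),
        ('planar',              abs(H) < h0 and abs(K) < k0, C < c0),
        ('hyperbola',           abs(H) < h0 and K < -k0,     C > c0 and -3/16 <= S <= 3/16),
        ('concave (elliptic)',  H > +h0 and K > +k0,         C > c0 and -1 <= S <= -5/8),
        ('concave (parabolic)', H > +h0 and abs(K) < k0,     C > c0 and -5/8 <= S <= -3/8),
        ('saddle valley',       H > +h0 and K < -k0,         C > c0 and -3/8 <= S <= -3/16),
    ]
    for name, hk_ok, sc_ok in rules:
        if hk_ok and sc_ok:
            return name
    return None
```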

CHAPTER 5

SCALE-SPACE OF CURVATURES

The scale-space representation is the formal theory of handling an n-dimensional signal at different scales, such that a one-parameter family of smoothed versions of that signal is constructed. In the next subsection, the use of scale-space theory in the field of computer vision is briefly summarized.

5.1 Scale-Space Theory in Computer Vision

Typically, an object contains structures at many different scales, so a true representation may only be constructed using the information obtained from the different scales of the object. In this section the formal definition of scale in computer vision is given. For further reading, the reader should refer to Tony Lindeberg's renowned book on scale-space theory in computer vision [Lindeberg 1994].

Computer vision deals with deriving meaningful or useful information from 2D, 2.5D or 3D images. The digital equipment used to obtain such images provides this information as discrete signals with certain sampling rates (resolutions). However, the useful information within the image resides independently of this sampling rate. Scale-space theory provides some directions to overcome this dependency by constructing different scale levels of the image. In this manner, scale levels are constructed as different stages of sampling of the original image, so that an operator can deal with the different useful features residing in different scale levels. The most common method to obtain higher scales of an image is pyramiding. A pyramid representation of a signal is a set of successively smoothed and sub-sampled representations of the original signal, organized in such a way that the number of pixels

61 deceases with a constant facto fom one laye to anothe [Lindebeg 1994]. The next section summaizes the most common pyamiding method: Gaussian pyamiding. 5.2 Gaussian Pyamiding Simila to othe low-pass pyamiding appoaches, Gaussian pyamiding is a type of multiscale signal epesentation, in which a signal o an image is subject to epeated Gaussian smoothing; so that a linea scale space of that signal is constucted. In [But and delson 1983] Gaussian pyamids of 2D images ae constucted by educing an image into its half esolution (Figue 21). Fo this pupose Reduce and Expand opeations ae defined such that at each Reduce opeation the data is smoothed and down sampled into half, and similaly at each Expand opeation the data is up-sampled into double esolution by Gaussian smoothing [But and delson 1983]. a) b) Figue 21. a) Gaussian pyamiding ove a 1D signal, (highe scale levels towads the bottom). b) Gaussian Pyamiding on a 2D image. In each step, the image is both downsampled and smoothed (highe scale levels towads the ight) [But and delson 1983]. The main motivation behind educing a signal is to obtain down-sampled vesion of that signal, so that the size of an applied opeato becomes independent of the esolution of that signal. Fo example, if a 3x3 kenel is convolved ove the image, only featues fitting inside 3x3 egion could be detected. But if the kenel is applied to all educed levels, lage featues can also detected using the same opeato. 42
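A minimal sketch of the Reduce and Expand operations in the spirit of [Burt and Adelson 1983] is given below, using SciPy's Gaussian filter. The kernel width, the number of levels and the use of simple decimation and linear interpolation are illustrative choices; the treatment of invalid points discussed later in this chapter is not included here.

```python
from scipy.ndimage import gaussian_filter, zoom

def reduce_level(depth, sigma=1.0):
    """One Reduce step: Gaussian smoothing followed by dropping every other sample."""
    smoothed = gaussian_filter(depth, sigma)
    return smoothed[::2, ::2]

def expand_level(depth, sigma=1.0):
    """One Expand step: double the sampling and smooth the interpolated result."""
    upsampled = zoom(depth, 2, order=1)
    return gaussian_filter(upsampled, sigma)

def gaussian_pyramid(depth, levels=5, sigma=1.0):
    """Build a list of successively reduced versions of a depth image."""
    pyramid = [depth]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1], sigma))
    return pyramid
```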

62 5.2.1 Gaussian Pyamiding ove 3D Suface Scans In this thesis, Gaussian pyamiding is applied to 3D suface scans and 3D cuvatues in ode to obtain scale-space epesentations of 3D objects. Howeve it is not a staightfowad pocedue to apply these opeations to 3D, since thee is an ambiguity in the definitions of scale and esolution fo 3D scans. Befoe undestanding the scale and esolution (o the sampling ate) of a 3D scan, it is beneficial to examine thei meanings in 2D. Fo 2D images, the notion of scale is the same as esolution. In othe wods, the size of the signal is calculated by its sampling ate because a metic size within an image is not defined. Fo this eason, when a scale-space of a 2D signal is constucted, at each scale level the image is solely sub-sampled. On othe hand, the sampled points cay metic infomation fo 3D scans. In othe wods independent of the sampling ate of the 3D scan, the metic distances between points on the signal ae absolute. Fo this eason, two scans of the same object taken fom diffeent distances, will sample the object in diffeent esolutions; howeve the object suface within the 3D scan will cay the same absolute metic size. This little detail may seem insignificant; howeve it plays a pofound ole in constuction of scale-space fo 3D scans. Gaussian kenel and its deivatives ae singled out as the only possible smoothing kenels [Lindebeg 1994]. Pyamid appoach fo multi-esolution is usually chosen because in Gaussian pyamiding, image size deceases exponentially with the scale level and hence also the amount of computation equied to pocess the data. But it also has some dawbacks such as a coase quantization along the scale diection which makes it algoithmically had to elate image stuctues acoss scales [Lindebeg 1994]. Figue 22. Gaussian Pyamiding on 3D scans is seen. In each scale level, the sampling ate is halved, but the metic distances ae peseved. 43

63 s seen Figue 22, a Gaussian Pyamid of a 3D scan is depicted. t each scale level the sampling ate is halved as expected. But the metic distances within the image stay unaffected Invalid Points in 3D Suface Pyamids s it was discussed thooughly in Chapte 2, thee ae invalid points on 3D scans, which ae eithe backgound o simply invalid since they could not been acquied popely by the scanne. Fo most 3D data pocessing methods, these invalid points ae simply ignoed and calculations ae caied out using only the valid points. Howeve if a pyamid of a 3D ange image, which contains a goup of invalid points, is to be constucted; these invalid points should be handled popely. Figue 23. The scale levels of a depth image constucted by Gaussian pyamiding ae seen. The black egions ae the invalid points. ound the valid point boundaies the smoothing filte avoids blending with invalid points so that the shap boundaies ae peseved. When constucting a pyamid fo a 2D intensity image as in Figue 21.b, the Gaussian opeato is convolved thoughout the image with no exceptions. Howeve in ode to convolve a Gaussian filte ove a depth image, including invalid points; it is possible that one might expeience difficulties in bounday egions whee valid and invalid points ae next to each othe. In these occasions, the segments of the Gaussian filte, which coesponds to invalid points, should be omitted. This way the tue shape of the object 44

64 may be peseved in highe scales. In Figue 23, the scale space of a suface is constucted using Gaussian pyamiding. ound the valid point boundaies the smoothing filte avoids blending with invalid points. This way, shap boundaies of an object can be peseved. simila appoach was caied out by [Lo and Siebeta 2009] in ode to avoid false key points which would be esulted as a consequence of applying a standad Gaussian mask ove the shap boundaies on the ange image. Instead they applied a Gaussian-tapeed segmentation mask in ode to isolate the aea of inteest while avoiding shap boundaies. 5.3 Scale Space of Cuvatues In this section, the scale levels of a depth signal ae analyzed. In the pevious figue, scale levels of a ange image wee given. In Figue 24, the same ange image is seen fom a diffeent angle. It is clealy seen that, as we go up the scale axis, the smalle elements vanish and only the lage elements eside. This is the basic motivation behind constucting a scale space. Figue 24. The smalle elements vanish as we go up the scale axis. The nomal diections of each scale level ae shown in Figue 25. In this figue, ed values coespond to x components, geen values coespond to y components and blue components coespond to z components of the suface nomals. It is also seen fom the nomal scale-space that smalle elements vanish in highe scales. The thin edge which is denoted by 1 in Figue 25, is visible in lowe scales but vanishes in highe scales, since this element is elative small. 45

65 Figue 25. The scale-space of the suface nomals. Red values coespond to x components, geen values coespond to y components and blue components coespond to z components of the suface nomals. The benefit of constucting a scale space of a ange image is moe clealy seen, if the cuvatues obtained fom diffeent scale levels ae analyzed. The next subsections emphasize on constucting diffeent cuvatue scale-spaces Cuvatues in Scale-Space s explained in Chapte 3, the cuvatues ae calculated using suface gadients. Thus they give basic infomation on suface behaviou. In ode to calculate suface gadients analytically, explicit suface functions may be used. Howeve in eal wold applications, the 3D sufaces ae digitized into sampled points and thee s no global explicit function of the suface. Hence, the suface gadients ae calculated within a neighbouhood of sampled points. Fo this eason the calculated cuvatues ae local appoximations, which ae valid at a cetain scale. Theefoe the cuvatues ae calculated fo each scale level, and the cuvatue scale-spaces ae constucted. In this section diffeent cuvatue values obtained fom diffeent scale levels ae depicted, so that the geneal idea of cuvatue scale-space can be popely compehended. Fist, in Figue 26, the scale space of mean cuvatue values ae seen. The concave egions, which have positive mean cuvatue values, ae painted in ed, whee convex egions with 46

66 negative mean cuvatue values ae painted in blue. Fo both egions the magnitude of the cuvatue is demonstated by colou intensity. Figue 26. The scale-space of mean cuvatue values. The concave egions which have positive mean cuvatue values, ae painted in ed, whee convex egions with negative mean cuvatue values ae painted in blue. Figue 27. The scale-space of Gaussian cuvatue values. The paabolic egions with positive Gaussian cuvatue values ae painted in ed colou, whee the hypebolic egions with negative Gaussian cuvatue values ae painted in blue colou. 47

67 s seen fom the figue, the convex egion denoted by numbe 1 is designated as a peak in highe scales since this element is elative lage fo the given esolution of the 3D scan. Similaly if we examine the Gaussian cuvatue values, again the lage featues ae obseved in highe scales. In Figue 27, the Gaussian cuvatue values obtained fom diffeent scales of the suface ae depicted. The paabolic egions with positive Gaussian cuvatue values ae painted in ed, whee the hypebolic egions with negative Gaussian cuvatue values ae painted in blue. Fo both egions the magnitude of the cuvatue is demonstated by colou intensity. Moeove, when the shape index values obtained fom diffeent scale levels of the suface ae examined, the affect of scale-space is clealy seen. Since shape index value is capable of classifying the suface into the fundamental suface types (except planes), the scalespace of the shape index values clealy demonstate the scale coodinates of the suface featues.. In Figue 28, the shape index values obtained fom diffeent scales of the suface ae depicted. The shape index values and colou is mapped accoding to Table 2 in Chapte 3. Figue 28. The scale-space of shape index values. The shape index values and colou is mapped accoding to Table 2 in Chapte 3. s seen fom the figue, the convex egion denoted by numbe 1 is designated as a peak in highe scales and is not existent in lowe scales. 48

68 Finally if we examine the cuvedness values obtained fom diffeent scale levels of the suface, we clealy extact the plana egions in the scale-space. Cuvedness values can be used to detect plana egions, since egions having sufficiently small cuvedness values ae defined as planes. In Figue 29, the gay level intensities designates the cuvedness values, whee zeo cuvedness value coesponds to black. Figue 29. The scale-space of cuvedness values. The gay level intensities designate the cuvedness values, whee zeo cuvedness value coesponds to black. The egion denoted by numbes 1, 2, 3 and 4 in diffeent scale levels coesponds to a plana egion on the suface and it is designated as plane in the fist fou scale levels in the cuvedness scale space. Howeve in the fifth cuvedness scale-space level is it not designated as plane, since in this scale, the egion is designated as pit. In this chapte the concept scale-space is defined fo 3D sufaces. Moeove, it is clealy shown that lage featues can only be extacted using the scale-space of 3D sufaces and cuvatues. In the next chapte, the details of featue extaction fom the scale-space of 3D sufaces and cuvatues ae pesented. 49

CHAPTER 6

FEATURE EXTRACTION AND OBJECT REPRESENTATION

In this chapter, the proposed method for extracting scale and orientation invariant features using scale-spaces of 3D surfaces and curvatures is explained. There are previous methods which focus on extracting features from the scale-space of different types of signals [Lowe 2004]. For this reason, certain aspects of the proposed method, such as scale-space localization, are presented in comparison with these previous scale-space techniques.

6.1 Construction of the UVS Volume

As stated in the previous chapter, estimating surface feature labels at the given resolution of the original surface restricts us to finding features only at the lowest scale (the given resolution). In order to detect other curvatures at higher scales, a scale-space representation must be introduced. For this purpose, the Gaussian pyramid of the input surface may be generated. It should be remembered that Gaussian pyramiding does not change the absolute size of the model surface; only the resolution of the surface halves as we go up the scale space. In other words, the absolute metric information is kept. Subsequently, the curvature values for each scale are calculated and the H, K, S and C maps for each pyramid level are obtained. Towards the higher levels of a scale-space representation, the smaller surface elements vanish and the bigger elements reside. For example, in Figure 30, the eye pits and the chin peak vanish at the third pyramid level, but the nose peak and the nose saddle still reside.

Figure 30. Three levels from the HK pyramid of a face model surface.

Subsequently, the higher levels of each curvature pyramid are up-sampled to the original size using Expand (the inverse of Reduce), as explained in Chapter 5. After this expansion, any label on the s-th level of the pyramid widens 2^s times in resolution, so all levels of the curvature pyramid have the same resolution. An example of such an equalized curvature scale-space was introduced in the previous chapter. Putting each level of the expanded pyramid on top of the others, we obtain a 3D volume which we call the UVS space, where u and v are the surface dimensions and s is the scale dimension. Afterwards, using the different curvature UVS volumes (all of which have the same resolution), the voxels are classified as explained in Chapter 4. For instance, using the H and K values, HK classification is carried out and for each voxel a feature label (or type) is obtained (Figure 31).

Figure 31. The HK UVS volume after morphological operations, printed above the surface patch. The features are indicated with their surface shapes as colours (blue for peaks, cyan for pits, red for saddle ridges, etc.).
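The construction just described can be sketched as follows. The helper callables curvature_maps and classify stand for the curvature computation and thresholded classification of Chapter 4, and reduce_level refers to the pyramid sketch given earlier; all of them, as well as the use of nearest-neighbour label replication for the Expand step, are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def build_uvs_volume(depth, levels, curvature_maps, classify):
    """Stack per-level classification maps into a single (u, v, s) label volume.

    depth          : the original range image (scale level s = 0)
    levels         : number of pyramid levels
    curvature_maps : callable returning the curvature maps (e.g. H and K) of one level
    classify       : callable turning those maps into an integer label image
    """
    rows, cols = depth.shape
    uvs = np.zeros((rows, cols, levels), dtype=np.int16)
    surface = depth
    for s in range(levels):
        labels = classify(*curvature_maps(surface))     # label map at scale level s
        if s > 0:                                        # Expand the labels back to the
            labels = np.repeat(labels, 2 ** s, axis=0)   # original resolution so that all
            labels = np.repeat(labels, 2 ** s, axis=1)   # levels share the same (u, v) size
        uvs[:, :, s] = labels[:rows, :cols]
        surface = reduce_level(surface)                  # next level of the Gaussian pyramid
    return uvs
```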

For each classification type (HK or SC) a different classified UVS volume is obtained. We refer to them as the HK UVS volume and the SC UVS volume. The process of UVS volume construction is depicted in Figure 32.

Figure 32. UVS Volume Construction (block diagram: 3D Range Scan → 3D Gaussian Pyramiding → 3D Surface Pyramid → calculate H, K, S and C for all scales → HK & SC Pyramids → expand the HK and SC spaces back to the original resolution → HK & SC UVS Volumes → find connected components).

Within the UVS volume, each voxel is classified according to its curvature values. As stated in Chapter 4, the voxels are classified by thresholding. Unsurprisingly, some curvature values are much higher than others, so the amount by which they exceed the thresholds also differs. For this reason, a weight value is assigned to each voxel using the amount by which the curvature values exceed the thresholds. The location and scale of a feature in the UVS volume are estimated using these weights. For the HK UVS volume, the weights are taken as the 2-norm of the absolute differences of the curvature values from the applied threshold values:

w_{i,j} = \sqrt{ \left( H_{i,j} - H_{zero} \right)^2 + \left( K_{i,j} - K_{zero} \right)^2 }   (10)

In Figure 33, the weights are demonstrated within the HK UVS volume. The bigger the weight, the lighter the colour becomes. For example, the weights of the eye pit feature can be observed from its colour, which changes from light cyan at the smaller scales to dark cyan at the larger scales.

Figure 33. The weights demonstrated in the HK UVS volume. The bigger the weight, the lighter the colour becomes.

For the SC UVS volume, the weights are simply taken as the curvedness (C) values. These weights could very well be calculated in different manners; in the next subsections we thoroughly discuss the reasons why we have selected Equation (10), since it profoundly affects scale-space localization.

6.2 Extraction of Features

In order to extract scale invariant features (features with their scale information) from a UVS volume, several steps are followed. First, each voxel of the UVS volume is checked for similarity within its 10 neighbours (8 at the same scale level, and 2 one level up and down along the scale axis). If all neighbours carry the same surface shape label (pit, peak, etc.), the centre voxel keeps its label; otherwise it becomes a blank voxel. After this voxel relabeling, a single opening operation (an erosion followed by a dilation) is applied, using small structuring kernels. Then the connected labels in the UVS space are extracted. Finally, each connected component represents a feature on the surface, which will later become a node of a topology graph. In Figure 34 it is clearly seen that a feature element, for example the handle of the screwdriver, has components at a number of successive scale layers (the orange hyperbolic surface patch).

Figure 34. Labelled layers of the UVS space constructed from HK values, where the scale level increases from left to right. The original surface level is indicated by S=0. Labels are given by colours (peak: blue, saddle ridge: red, convex cylinder: purple, pit: cyan, saddle valley: yellow, concave cylinder: green, hyperbolic: orange, plane: gray).

Inside the classified scale-space, each connected component consists of voxels of the same type and is considered a feature element on the surface. This particular classification type, such as pit, peak or plane, is considered the type of that feature element (t_i). The total number of voxels inside the connected component represents the feature's volume (v_i). When the curvature scale-spaces are calculated, the corresponding scale-space of surface coordinates and scale-space of surface normals are also constructed. Thus, for each connected component, using the corresponding voxels within the scale-space of surface coordinates, the centre of mass of the connected component is calculated and called the positional centre of the feature (x_i). Remember that this centre of mass is calculated by averaging the voxels using the corresponding weights calculated in (10). Similarly, using the scale-space of the surface normals, a weighted average of the normal direction of each connected component can be found (n_i); this vector designates the average orientation of the feature in the 3D world. In addition, since a connected component may have a different number of elements at different scales, a weighted average of the scale value is calculated for each connected component as the actual scale of that feature (s_i).
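A sketch of the connected-component step and of the weighted attribute computation described above is given below, using SciPy's 3D labelling. The per-voxel weights are assumed to follow Equation (10); the 10-neighbour relabeling and the morphological opening are assumed to have been applied to the label volume beforehand; and the size r_i, derived from the component's area at the layer nearest to s_i, is omitted for brevity. All names are illustrative.

```python
import numpy as np
from scipy.ndimage import label

def extract_features(uvs, weights, xyz, normals):
    """Turn connected components of the classified UVS volume into feature elements.

    uvs     : (rows, cols, levels) integer label volume (0 = blank voxel)
    weights : (rows, cols, levels) per-voxel weights, e.g. Eq. (10) for the HK volume
    xyz     : (rows, cols, levels, 3) scale-space of surface coordinates
    normals : (rows, cols, levels, 3) scale-space of surface normals
    """
    features = []
    structure = np.ones((3, 3, 3), dtype=bool)               # 26-connectivity in (u, v, s)
    for t in np.unique(uvs):
        if t == 0:
            continue                                          # skip blank voxels
        comps, n = label(uvs == t, structure=structure)       # connected components of type t
        for c in range(1, n + 1):
            idx = np.nonzero(comps == c)
            w = weights[idx]
            w_sum = np.sum(w)
            if w_sum == 0:
                continue
            features.append({
                'type':   t,                                             # t_i
                'volume': idx[0].size,                                   # v_i (number of voxels)
                'center': np.sum(xyz[idx] * w[:, None], 0) / w_sum,      # x_i (weighted centroid)
                'normal': np.sum(normals[idx] * w[:, None], 0) / w_sum,  # n_i (weighted normal)
                'scale':  np.sum(idx[2] * w) / w_sum,                    # s_i (weighted scale)
            })
    return features
```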

Although a connected component may have various numbers of elements at different scale levels, it has the largest number of elements at the scale which is closest to its actual scale. The area (A_i) of the connected component at this layer is used to calculate the radius (r_i), which also defines the size of that fundamental element:

r_i = \sqrt{ \dfrac{4 A_i}{\pi} }

Finally, for each feature element extracted from the surface, the following attributes are obtained: the type (t_i), the volume (v_i), the positional centre of mass (x_i), the orientation vector (n_i), the scale (s_i) and the size (r_i) (Figure 35).

Figure 35. The ten largest extracted features are shown as squares, where the feature centre (x_i) is given by the square centre, the feature size (radius r_i) is given by the square size and the feature type is given by its colour.

6.3 Scale-Space Localization of the Features

Since we use Gaussian smoothing to construct the scale-space volume, the relation between the smoothing kernel size (σ) and the scale dimension (s) of two scale layers (A and B) of the UVS volume is as follows:

\dfrac{\sigma_A}{\sigma_B} = 2^{\,(S_B - S_A)}   (11)

This relation is verified on synthetic Gaussian surfaces having different scales as follows. Figure 36 shows the UVS spaces for three synthetically generated unit-volume Gaussian

surface models. Each Gaussian surface has half the standard deviation of the one on its right, i.e. σ_A = σ_K, σ_B = 2·σ_K, σ_C = 4·σ_K.

Figure 36. Three unit-volume Gaussian surfaces at different scales (σ_A = σ_K, σ_B = 2·σ_K, σ_C = 4·σ_K) and their respective UVS volumes.

Then, according to (11), the scale values of these Gaussian surfaces must satisfy the following relation:

\dfrac{\sigma_B}{\sigma_A} = 2^{\,(S_A - S_B)} = 2 \;\;\Rightarrow\;\; S_A - S_B = 1   (12)

After computing the weighted averages, we observe that the estimated locations are the centres of the synthetic Gaussians and the scales satisfy the relation given in (12), where S_A = 3.86 and S_B = 2.80; these values are consistent with Equation (12). As stated by [Lindeberg 1994], the scale at which a scale-space blob assumes its maximum normalized grey-level blob volume over scales is likely to be a relevant scale for representing that blob. In order to do so, the scale should be normalized. However, for 3D data, where we actually know the metric measurements, all data is normalized if the sampling is the same for all, and in that case the scale layer at which the blob extends the most can normally be taken as the scale of that blob. Based on this idea, we did not take the maximally extending layer but a weighted average over the layers, so that we can interpolate between the layers, which helps us reach a better resolution along the scale dimension.

The localization of the features in the UVS volume and on the surface is crucial. This localization should be robust to noise and to any type of transformation. For example, in [Lowe 2004] the SIFT descriptor is sought in a scale space of different octaves, where all

local maxima (or minima) are selected as features. Instead of using the weighted averages, a similar approach to [Lowe 2004] could be applied to our method, where a single maximum is found for each connected component in the UVS space. However, as seen in Figure 37, this approach fails under noise or in complex scenes. Imagine we have a simple surface with a single peak and its noisy version. When the centre of the peak feature is sought over the noisy surface, a local maximum value inside a connected component may divert the centre from its original position, even though the surface is smoothed at higher scales (Figure 37.c). Our method localizes the features correctly (Figure 37.b) and is robust to noise.

Figure 37. Localization of the peak feature (a) using both methods (top: weighted average, bottom: single maximum). b) For the ideal surface both methods localize the peak feature correctly. c) For the noisy surface, the feature is still correctly localized using weighted averages (top), but the localization fails under noise when the single-maximum method is used (bottom).

Our approach is tested on the three face scans shown in Figure 38 with their HK UVS volumes. Table 6 lists the estimated UVS coordinates of the features for all three faces. As

can be seen in the table, the scales of the features are nearly the same for all faces. The locations of the features for the frontal face and for the expressional face are very close to each other, as can be expected. Absolute coordinates (x, y, z) are computed from the UV coordinates, and they coincide with the correct locations of the features on the facial surface.

Figure 38. UVS volumes for a. a face rotated towards the left, b. a frontal face and c. an expressional (disgusted) face.

Table 6. Extracted features from the surfaces given in Figure 38 with their locations (U, V, S) in UVS space, for the L.Eye, R.Eye and Nose Saddle features of a. the rotated face, b. the frontal face and c. the expressional face.

6.4 Transform Invariant Topology Construction

In this section, using the feature elements extracted from the UVS space, a global 3D object representation is constructed. A scale and orientation invariant representation is

proposed, where the spatial topology of the object is given as a graph structure which carries the relative information among the features over the 3D surface. The relativity is not only in terms of spatial information, but in terms of orientation and scaling as well. As explained in the previous subsection, for each feature element the following attributes are obtained: the type (t_i), the volume (v_i), the positional centre of mass (x_i), the orientation vector (n_i), the scale (s_i) and the size (r_i). If each feature is referred to as a node in a topology graph, where the nodes carry the feature element's attributes and the links between the nodes carry some relative information, a topological representation may be obtained. In order to make this representation orientation and scale invariant, the links between the nodes must carry relative, or in other words normalized, information. An example of this type of relation could be the length between two nodes normalized by a scale invariant measure specific to that topology. The relative 3D direction between two nodes might also be used. Furthermore, the scale difference between the nodes would carry scale invariant information. These relative link attributes can be listed as follows:

- Normalized distance from node A to B: ‖x_B − x_A‖ / s_A or ‖x_B − x_A‖ / r_A. The distance between two nodes can be normalized using the scale or the size of a base node (A) in the topology. Thus this relation stays invariant under scaling and orientation of the source signal.

- Link vectors or link angles: (x_B − x_A) / ‖x_B − x_A‖. The unit vector from a node (B) to a specific base node (A) in the topology also remains invariant under scale; however, this link vector is variant under orientation. In order to make this information both scale and orientation invariant, the angles between these unit vectors may be used. For an n-node topology there are n−1 unit vectors, and for any two of these unit vectors an angle can be calculated. This angle is invariant to both scale and orientation. For n−1 unit vectors we obtain C(n−1, 2) angles, which is equal to (n−1)(n−2)/2. The angle is calculated as:

\alpha_{BC} = \alpha_{CB} = \cos^{-1}\left( \left[ \dfrac{x_B - x_A}{\lVert x_B - x_A \rVert} \right]^{T} \left[ \dfrac{x_C - x_A}{\lVert x_C - x_A \rVert} \right] \right)

- Normal vector difference: n_B − n_A. The difference vector between the unit normal vector of node B (n_B) and the unit normal vector of the specific base node (n_A) stays invariant to orientation and scale.

Feature scale difference and size ratio: s_B - s_A and r_B / r_A. As explained in the previous chapter, the scale difference between two nodes is invariant to scaling. The ratio of the size of a node (B), which is strongly related to the scale of that feature, to the size of a specific base node (A) stays invariant to scaling as well.

Imagine we have a four-node topology with nodes A, B, C and D, and assume that node A is defined as the base node of the topology. Then the following vector is scale and orientation invariant:

\lambda_i = [\; t_A,\, t_B,\, t_C,\, t_D,\;
\|\mathbf{x}_B-\mathbf{x}_A\|/s_A \ (\text{or } r_A),\; \|\mathbf{x}_C-\mathbf{x}_A\|/s_A,\; \|\mathbf{x}_D-\mathbf{x}_A\|/s_A,\;
\alpha_{ABC},\, \alpha_{ABD},\, \alpha_{ACD},\;
\mathbf{n}_B-\mathbf{n}_A,\, \mathbf{n}_C-\mathbf{n}_A,\, \mathbf{n}_D-\mathbf{n}_A,\;
r_B/r_A \ (\text{or } s_B-s_A),\; r_C/r_A,\; r_D/r_A \;]    (13)

This feature vector λ_i has 16 elements (as scalars or vectors). For an n-node topology, the number of elements in this vector is n + 3(n-1) + (n-1)(n-2)/2. This four-node relation may also be shown on a topological chart, as in Figure 38. Node A is called the base node because the link relations are calculated relative to this node.

Figure 38. Four-node, scale and orientation invariant feature vector shown as a topological chart: nodes A, B, C and D carry their types t, and the links to the base node A carry the relative lengths ||x_i - x_A||, the normal differences n_i - n_A, the size ratios r_i/r_A and the angles α_ABC, α_ABD, α_ACD.

This vector is orientation and scale invariant since all relations are defined relative to node A. However, for some applications, orientation and/or scale invariance may not be desired. For example, if the metric size of the object to be recognized is known, then scale invariance is unnecessary. Similarly, if the orientation of the object relative to the sensor device is fixed, then the orientation invariance capability of a recognition system is redundant. For this reason, orientation and/or scale dependent versions of this vector may be defined.

\lambda_i = [\; t_A,\, t_B,\, t_C,\, t_D,\;
\|\mathbf{x}_B-\mathbf{x}_A\|,\; \|\mathbf{x}_C-\mathbf{x}_A\|,\; \|\mathbf{x}_D-\mathbf{x}_A\|,\;
\alpha_{ABC},\, \alpha_{ABD},\, \alpha_{ACD},\;
\mathbf{n}_B-\mathbf{n}_A,\, \mathbf{n}_C-\mathbf{n}_A,\, \mathbf{n}_D-\mathbf{n}_A,\;
r_B,\; r_C,\; r_D \;]    (14)

Equation 14 is an example of an orientation invariant but not scale invariant feature vector representing a four-node topology, since the link lengths and the feature sizes are not normalized according to the base node. On the other hand, the feature vector in Equation 15 is scale invariant but not orientation invariant, because the feature normal vectors and the link vectors are not normalized according to node A:

\lambda_i = [\; t_A,\, t_B,\, t_C,\, t_D,\;
\|\mathbf{x}_B-\mathbf{x}_A\|/r_A,\; \|\mathbf{x}_C-\mathbf{x}_A\|/r_A,\; \|\mathbf{x}_D-\mathbf{x}_A\|/r_A,\;
(\mathbf{x}_B-\mathbf{x}_A)/\|\mathbf{x}_B-\mathbf{x}_A\|,\; (\mathbf{x}_C-\mathbf{x}_A)/\|\mathbf{x}_C-\mathbf{x}_A\|,\; (\mathbf{x}_D-\mathbf{x}_A)/\|\mathbf{x}_D-\mathbf{x}_A\|,\;
\mathbf{n}_A,\, \mathbf{n}_B,\, \mathbf{n}_C,\, \mathbf{n}_D,\;
r_B/r_A,\; r_C/r_A,\; r_D/r_A \;]    (15)
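To make the relative link attributes concrete, the following is a minimal sketch of how the quantities entering Equations (13)-(15) can be computed for a base node and its companion nodes. It assumes each node is available as a small record with the fields (t, x, n, s, r) introduced above; the function name and the record layout are illustrative, not the thesis implementation.

import numpy as np

def link_attributes(base, others):
    # base, others: dicts with keys "t", "x" (3,), "n" (3,), "s", "r"
    types = [base["t"]] + [o["t"] for o in others]
    units, feats = [], []
    for o in others:
        d = o["x"] - base["x"]
        length = np.linalg.norm(d)
        feats.append(length / base["r"])      # normalized link length (base["s"] could be used instead)
        feats.append(o["r"] / base["r"])      # size ratio (or scale difference o["s"] - base["s"])
        feats.extend(o["n"] - base["n"])      # normal difference vector
        units.append(d / length)              # unit link vector, used only for the angles
    # angles between every pair of unit link vectors: C(n-1, 2) values
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            c = np.clip(np.dot(units[i], units[j]), -1.0, 1.0)
            feats.append(float(np.arccos(c)))
    return types, np.array(feats)

Stacking the returned attributes in the order of Equation (13) yields one topology description; dropping the normalization terms gives the variants of Equations (14) and (15).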

In Figure 39, a simple demonstration of the scale invariance of the proposed method is given. The smaller surface has half the scale of the bigger one. The radii of the peaks and pits extracted from both surfaces are shown in the figure. It is clearly seen that the scale ratio is kept constant: the radii are halved for the smaller surface, which is exactly half of the bigger surface along all axes.

Figure 39. The feature extraction algorithm is scale invariant, since the extracted features numerically depend on the scaling ratio of a surface. The numbers are the radii of the extracted feature elements.

Using Equation (13), (14) or (15), a feature vector may be obtained from n nodes. However, an excessive number of nodes might be (and usually is) extracted from a range image. Therefore, many n-node combinations may be obtained from a range image, and these different n-node feature vectors represent different parts of the 3D surface. The algorithm to obtain a complete set of feature vectors representing the whole 3D surface is given below:

1. Extract all feature elements from the surface:
   Node 1: (t_1), (v_1), (x_1), (n_1), (s_1), (r_1)
   Node 2: (t_2), (v_2), (x_2), (n_2), (s_2), (r_2)
   Node 3: (t_3), (v_3), (x_3), (n_3), (s_3), (r_3)
   Node 4: ...
2. Select the M nodes with the largest radii (r_i) or volumes (v_i).
3. Order these selected M nodes according to the criteria below:
   a. Order them according to their type.
   b. If there is more than one occurrence of a type, order them according to their radius (r_i) or volume (v_i).
4. For the ordered M nodes do the following:
   a. Select a node group size n (2 < n < M).
   b. Obtain all possible combinations (k of them) of node groups of size n among the M ordered nodes: k = C(M, n).
   c. Within each n-node combination, keep the order of the nodes according to step 3.
5. For all k combinations:
   a. Select the first feature as the base node.
   b. Extract the feature vector λ_i using (13), (14) or (15).
   c. Stack these row vectors in a feature matrix Λ = [λ_1^T λ_2^T λ_3^T ... λ_k^T]^T.

This feature matrix Λ will be a scale and/or orientation invariant representation depending on the equation used to calculate the feature vectors, (13), (14) or (15).
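A compact sketch of steps 2-5 above is given below. It reuses the link_attributes() helper from the previous sketch; the parameter defaults (M = 8, n = 4) and the exact sorting keys are illustrative choices rather than values prescribed by the thesis.

from itertools import combinations
import numpy as np

def feature_matrix(nodes, M=8, n=4):
    # step 2: keep the M nodes with the largest radii
    kept = sorted(nodes, key=lambda nd: nd["r"], reverse=True)[:M]
    # step 3: order by type, then by radius within a type
    kept.sort(key=lambda nd: (nd["t"], -nd["r"]))
    rows = []
    # step 4: all k = C(M, n) combinations; itertools keeps the step-3 order
    for combo in combinations(kept, n):
        base, others = combo[0], list(combo[1:])   # step 5a: first node is the base
        _, feats = link_attributes(base, others)   # step 5b: one feature vector per group
        rows.append(feats)
    return np.vstack(rows)                         # step 5c: the feature matrix Lambda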

6.5 Importance of Scale Space Search

In order to show the significance of the scale-space search, the feature extraction results with and without the scale-space search are compared in this subsection. When the scale-space search is omitted, the curvatures are computed only at the given resolution. Even when the scale/sampling ratio is controlled (usually it is not, since this ratio changes whenever the distance between the 3D scanner and the object changes), the types and the sizes of the features are detected incorrectly if only the given resolution is considered. In Figure 40, the extracted features of the original screwdriver object and its scaled version (by 0.8), both with and without the scale-space search, are depicted. The features extracted from the original and the scaled versions of the screwdriver object using only the given resolution are usually mislabelled (Figure 40.a,b). For example, only some planar features are located on the handle of the object, and the actual shape of the handle could not be extracted. Most surface structures are labelled as planes when only the given resolution is considered: since the original resolutions of 3D images are very high, even a point inside a peaky region may be classified as planar, because the neighbouring points are very close to each other. As a result, many small planar regions are detected in large concave areas. Only when a scale-space search is performed can the real types and sizes of the surface features be extracted (Figure 40.c,d).

Second, the feature sizes can only be extracted correctly if the scale-space search is used. Since Figure 40.d is a 0.8 times resized version of Figure 40.c, the radius of a feature found in Figure 40.d is also 0.8 times smaller than the radius of the corresponding feature found in Figure 40.c (the numerical values are given in Table 7). This robustness under scaling is not observed when only the given resolution is used for feature detection. Although most features correspond between Figure 40.a and Figure 40.b, their radii are faulty: the shape in Figure 40.b is smaller than the one in Figure 40.a by a ratio of 0.8, yet the radii of the features numbered 1 and 2 in Figure 40.b are larger than the radii of the corresponding features in Figure 40.a (Table 7). It is therefore clear that a scale-space search is a must in order to extract features together with their size properties.

Figure 40. Ten largest features extracted using a) the original image without scale-space search, b) the scaled image (by 0.8) without scale-space search, c) the original image with scale-space search, d) the scaled image (by 0.8) with scale-space search.

As seen in Figure 41, scaled versions are created for various objects and the features are extracted from each of them. For each object, the original is in the middle, while the 1.2 times and 0.8 times scaled versions lie on its left and right sides, respectively.

For each object, the ten largest extracted features are depicted as squares on the objects. Some features are numbered for better visualization, and the radii of these numbered features are given in Table 7. For all object triples, the majority of the extracted features are detected in all versions. Also, the radius of each feature changes according to the size of the feature, i.e. of the object (Table 7). Of course, some other features appear or disappear among the different sizes, as expected. An artificially created bumpy surface and its scaled version are given in Figure 41.a; only a peak at the center of the surface is extracted as a feature for each version. The radii of these extracted features are given in the first row of Table 7 and are consistent with the sizes of the features and the objects. Similarly, some features are labelled on the original and the scaled versions of the screwdriver (Figure 41.b), pig (Figure 41.c) and duck (Figure 41.d), and the corresponding radius values are also listed in Table 7. All feature sizes are detected consistently with the amount of scaling.

Table 7. The radius values (in mm) of the extracted features in Figure 41

Object / Feature      x 1.2     Original   x 0.8
Figure 41.a
Figure 41.b / 1       17.15
Figure 41.b / 2       32.71
Figure 41.b / 3       23.20     18.08      13.36
Figure 41.b / 4       18.17     15.83      12.41
Figure 41.c / 1       83.91     69.35      57.70
Figure 41.c / 2       54.37     46.33      38.87
Figure 41.d / 1       67.19     51.96      39.89
Figure 41.d / 2      113.86     93.36      77.47
Figure 41.d / 3       31.68     26.44      21.40

Figure 41. Ten largest extracted features from four different object surfaces (a-d). For each object the original (middle), 1.2 times scaled (left) and 0.8 times scaled (right) versions are given. The feature types and sizes are indicated by colour and square size, respectively (peak: blue, saddle ridge: red, convex cylinder: purple, pit: cyan, saddle valley: yellow, concave cylinder: green, hyperbolic: orange, plane: gray).

6.6 Feature Extraction Robustness under Noise

In order to see the effect of noise, noisy versions of the three objects are created with different noise levels (Figure 42), and features are then extracted from these noisy versions. Similar features can generally be extracted from all versions, so noise does not affect the feature extraction procedure drastically. Even when there is 60% noise over the image, features found at the higher scales of the UVS space are extracted clearly. Since Gaussian filters are used to construct each scale-space level, noise vanishes at higher scales, and features large enough to be found at higher scale levels are extracted without error. In Figure 43, the average interest point center localization error is given for different noise levels. If a feature could be extracted from both the noisy version and the original (more than 90% of the features corresponded between the original and the noisy versions), the average error between the original feature centers and the centers obtained from the noisy version is calculated, and this average error is plotted for each noise level (Figure 43.a). In addition, the feature size (radius) calculation error for each noisy version is depicted separately in Figure 43.b. As seen from the figures, even very high noise levels do not noticeably affect interest point localization or radius calculation.
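The reason features at coarse scales survive noise can be illustrated with a few lines of code: the same H and K maps computed at increasing Gaussian smoothing levels keep the signs of large structures while small noisy bumps disappear. The sketch below uses the standard Monge-patch curvature formulas; the sigma values and the use of scipy.ndimage are illustrative choices, not the exact scale-space construction of the thesis pipeline.

import numpy as np
from scipy.ndimage import gaussian_filter

def hk_at_scales(z, sigmas=(1.0, 2.0, 4.0, 8.0)):
    # z: range image as a 2D array of depth values
    out = []
    for s in sigmas:
        zs = gaussian_filter(z, s)                 # one scale-space level
        zy, zx = np.gradient(zs)                   # first derivatives
        zyy, zyx = np.gradient(zy)                 # second derivatives
        zxy, zxx = np.gradient(zx)
        g = 1.0 + zx**2 + zy**2
        K = (zxx * zyy - zxy**2) / g**2            # Gaussian curvature
        H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx) / (2 * g**1.5)
        out.append((s, H, K))                      # signs of H and K give the HK class
    return out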

86 a) b) c) d) Figue 42. Ten lagest extacted featues fom the noisy vesions of the objects: scewdive (top), duck (middle) and pig (bottom). a) %15 noise b) %30 noise c) %45 noise d) %60 noise. a) b) Figue 43. a) Inteest point cente localization eo in mm fo diffeent noise levels. b) Radius calculation eo in mm fo diffeent noise levels. 67

87 6.7 Local Region Desciptions s stated in the pevious subsections, the 3D suface may be globally o locally epesented using scale and oientation featue points. This is a quite obust and efficient way to ecognize categoies of objects. Fo example; two diffeent facial sufaces will both have two eye pits, one nose peak and a nose saddle. Thus, in ode to detect 3D facial sufaces, a simple fou-node topology including these facial featues could be tained. Howeve, so as to ecognize a cetain 3D facial suface among a database of facial sufaces; a global topology might not povide sufficient epesentation. Fo this eason many 3D ecognition appoaches attempt to make global o egional definitions of 3D sufaces. Histogam epesentation is a compact and efficient method to globally o patially define a 3D suface. pat o whole of a 3D suface o a 3D scene may be epesented with diffeent histogam methods such as, depth histogams, nomal histogams, cuvatue histogams o spin images. In this chapte, fou of the well-known histogam epesentations fo 3D sufaces is eviewed. The epesentations will late be used to define local egions aound the extacted featue points. These epesentations ae depth histogams, nomal histogams, shape index histogams and spin images. Thee seveal othe local desciptos in the liteatue, namely splash [Stein and Medioni 1992], cyclic images of adial contous (CIRCON) [Toe-Feeo et.al 2009], point signatues [Chua and Javis 1997], intinsic point signatues [Zhong 2009], 3D point fingepints [Sun et.al. 2001], 3D shape context (3DSC) [Fome et.al. 2004], spheical spin images [Ruiz-Coea 2001] etc. It is beyond the scope of this thesis to decide which one is the best. Thus the selected fou, which ae implemented and tested fo object ecognition, ae biefly explained in the succeeding subsections Depth Histogams This is simplest histogam that can be obtained fom a ange image. Since pixel intensities diectly coespond to depth values (distance fom the senso device), this is simply a gey level histogam of the ange image. This intensity distibution povides valuable cues about the shape of the 3D suface. If nomalized, these histogams stay invaiant unde scale changes. They ae also invaiant unde tanslation and image plane otation. Howeve they can be vey sensitive to the 68

88 peceived depth ange. If thee ae lage and abupt changes in the depth ange, e.g. due to occlusion effects, the whole histogam will be shifted and ecognition might no longe be guaantee. Fo this eason, intensity histogams can only be elied on fo the ecognition of sufaces with sufficient depth ange [Hetzel et. al. 2001]. [Hetzel et. al. 2001] uses depth histogams of the whole ange scans fo object ecognition. lthough this is a scale and image plane otation invaiant manne, it is vey sensitive to occlusions. In Figue 44, a ange image and its %25 occluded vesion ae depicted. The depth histogams obtained fom each image ae shown as well. The effect of occlusion on the depth histogams is clealy seen. a) b) c) Figue 44. a) The oiginal ange image taken fom [Stuttgat Database]. b) %25 occluded vesion of the ange image in (a). c) The depth histogam of the oiginal ange image (blue) and the %25 occluded image (ed). 69

In order to make depth histograms more robust to occlusion, local depth histograms can be calculated. For this purpose, a feature center point and a feature size should be provided for each local histogram obtained from the range image. Using the scale and orientation invariant feature points as histogram centers and their radii as the regions of interest, a local histogram representation that is robust to occlusion may be proposed. In this representation, for each node extracted from the UVS volume, a depth histogram of the region within the radius of the feature center is introduced as a new feature element:

Node_i: (t_i), (v_i), (x_i), (n_i), (s_i), (r_i), [Hist_Depth,i]   (16)

In Figure 45.b, the local depth histogram obtained from the feature in Figure 44.a is depicted.

Figure 45. a) The nose is a feature element of peak type; its radius is the radius of the blue circle around the feature. b) The local depth histogram of the region within the radius of the feature center.
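A minimal sketch of the local depth histogram in Equation (16) is given below. It assumes the range image is a 2D array of depth values and that the feature radius is converted to pixels through a known pixel size; the bin count and the normalization of depths inside the patch are illustrative choices.

import numpy as np

def local_depth_histogram(z, center_rc, radius_mm, mm_per_pixel, bins=32):
    rows, cols = np.indices(z.shape)
    r0, c0 = center_rc
    mask = (rows - r0)**2 + (cols - c0)**2 <= (radius_mm / mm_per_pixel)**2
    vals = z[mask]
    # normalize depths inside the patch so the histogram is comparable across scans
    vals = (vals - vals.min()) / (np.ptp(vals) + 1e-9)
    hist, _ = np.histogram(vals, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()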

6.7.2 Normal Histograms

As explained in Chapter 2, surface normals are calculated from the first order derivatives of the surface. A unit normal vector carries two independent variables, and two representations are commonly used for it. The first simply discards the z-component of the normal vector and represents it as a pair (x, y); this corresponds to a projection of the orientation hemisphere onto the unit circle. The second possibility is a representation as a pair of angles (θ, φ) in sphere coordinates. The angles can be calculated as follows:

\varphi = \arctan\left( \frac{\sqrt{n_y^2 + n_z^2}}{n_x} \right), \qquad \theta = \arctan\left( \frac{n_z}{n_y} \right)    (17)

[Hetzel et al. 2001] use normal histograms of whole range scans for object recognition, in the form of a 2D histogram of (θ, φ) values. Although this representation is invariant to scale and image-plane rotation, it is very sensitive to occlusion. In Figure 46, a range image and its 25% occluded version are depicted, together with the normal histograms obtained from each image. The effect of occlusion on the normal histograms is clearly seen.

Figure 46. a) The original range image taken from [Stuttgart Database]. b) The 2D normal histogram of the original range image. c) 25% occluded version of the range image in (a). d) The 2D normal histogram of the 25% occluded image.
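The 2D histogram of the two angles can be sketched as follows, assuming the normals are estimated from the range image gradients and made unit length. arctan2 is used instead of arctan for numerical robustness, and the 8x8 bin layout is an illustrative choice.

import numpy as np

def normal_histogram(z, bins=8):
    zy, zx = np.gradient(z)
    n = np.stack([-zx, -zy, np.ones_like(z)], axis=-1)      # un-normalized normals
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    nx, ny, nz = n[..., 0].ravel(), n[..., 1].ravel(), n[..., 2].ravel()
    phi = np.arctan2(np.sqrt(ny**2 + nz**2), nx)             # angles of Eq. (17)
    theta = np.arctan2(nz, ny)
    hist, _, _ = np.histogram2d(theta, phi, bins=bins)
    return hist / hist.sum()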

In order to make normal histograms more robust to occlusion, local normal histograms can be calculated. For this purpose, similar to the local depth histograms, a feature center point and a feature size should be provided for each local histogram obtained from the range image. Using the scale and orientation invariant feature points as histogram centers and their radii as the regions of interest, a local histogram representation that is robust to occlusion may be proposed. In this representation, for each node extracted from the UVS volume, a normal histogram of the region within the radius of the feature center is introduced as a new feature element:

Node_i: (t_i), (v_i), (x_i), (n_i), (s_i), (r_i), [Hist_Normal,i]   (18)

In Figure 47.b, the local normal histogram obtained from the feature in Figure 46.a is depicted.

Figure 47. a) The nose is a feature element of peak type; its radius is the radius of the blue circle around the feature. b) The local normal histogram of the region within the radius of the feature center.

6.7.3 Curvature Histograms

Curvature histograms simply carry the curvature distribution information of the input 3D surface. The curvature value might be κ1, κ2, H, K, C, S or another value derived from the first or second derivatives of the surface. [Hetzel et al. 2001] use shape index (S) histograms of whole range scans for object recognition, and show experimentally that shape index histograms represent range images better than depth or normal histograms. Furthermore, they are advantageous compared to other curvature histograms, since a single shape index value can classify a local region (or pixel) into all fundamental types except planes. No other single curvature definition is as discriminative for classification.
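Since the shape index is used heavily in the comparisons that follow, a short sketch of how it can be obtained from H and K is given below. The definition S = (2/π)·arctan((κ1 + κ2)/(κ1 − κ2)) with κ1 ≥ κ2 is one common convention; sign and scaling conventions vary in the literature, so the code should be read as illustrative rather than as the thesis definition.

import numpy as np

def shape_index(H, K):
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))
    k1, k2 = H + disc, H - disc                     # principal curvatures, k1 >= k2
    S = (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)
    return S                                        # planar points (k1 == k2 == 0) map to 0

def shape_index_histogram(S, mask=None, bins=16):
    vals = S[mask] if mask is not None else S.ravel()
    hist, _ = np.histogram(vals, bins=bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)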

Shape index histograms are scale and orientation invariant by nature, and the representation is even robust to some degree of off-plane rotation. However, if the histogram belongs to a whole range scan of the object, sensitivity to occlusion is inherited. In Figure 48.a and b, a range image and its 25% occluded version are depicted, together with the shape index histograms obtained from each image. The dramatic effect of occlusion on the shape index histograms is clearly seen.

Figure 48. a) The original range image taken from [Stuttgart Database]. b) 25% occluded version of the range image in (a). c) The shape index histogram of the original range image (blue) and the shape index histogram of the 25% occluded image (red).

In a similar manner to the depth and normal histograms above, shape index histograms may be made more robust to occlusion by calculating local histograms. For this purpose, a feature center point and a feature size should be provided for each local histogram obtained from the range image. Using the scale and orientation invariant feature points as histogram centers and their radii as the regions of interest, a local histogram representation that is robust to occlusion is again obtained.

In this representation, for each node extracted from the UVS volume, a shape index histogram of the region within the radius of the feature center is introduced as a new feature element:

Node_i: (t_i), (v_i), (x_i), (n_i), (s_i), (r_i), [Hist_ShapeIndex,i]   (19)

In Figure 49.b, the local shape index histogram obtained from the feature in Figure 48.a is depicted.

Figure 49. a) The nose is a feature element of peak type; its radius is the radius of the blue circle around the feature. b) The local shape index histogram of the region within the radius of the feature center.

6.7.4 Spin Images

The spin image is another mapping of nearby points on a surface from 3D to 2D [Johnson and Hebert 1999]. The spin image for a point p is found by recording, for every nearby point x, its distance from the surface normal line through p (α, alpha) and its distance from x to p along the normal n (β, beta). Corresponding points from different views have similar spin images, so the spin image is an orientation invariant representation (Figure 50).

Figure 50. The spin images of point p from different views are shown; the spin images are similar [Johnson and Hebert 1999].
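A basic spin image can be sketched in a few lines for an oriented point (p, n) and a set of nearby surface points. The support radius and bin counts are illustrative parameters, and the normalization step that makes the image scale and resolution invariant (discussed below) is omitted here.

import numpy as np

def spin_image(points, p, n, support=50.0, bins=16):
    # points: (N, 3) array of nearby surface points; n: unit normal at p
    d = points - p
    beta = d @ n                                             # signed distance along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta**2, 0.0))
    keep = alpha <= support
    img, _, _ = np.histogram2d(alpha[keep], beta[keep],
                               bins=bins,
                               range=[[0.0, support], [-support, support]])
    return img / max(img.sum(), 1)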

Since spin images are computed from the distances to neighbouring points, they have no notion of scale invariance. They are, on the other hand, largely resolution invariant, although this is affected by quantization noise when the resolution difference is extreme. With minor modifications, however, it is possible to construct resolution and scale invariant spin images. If a surface patch is normalized to unit area beforehand, the calculated spin image stays independent of scaling. More importantly, when the patch area is normalized, the distances α and β are bounded within fixed limits, which is crucial for defining α_max and β_max; the 2D resolution of the spin image is defined according to α_max and β_max. Very similarly, if the normalized surface patch is re-sampled to a fixed resolution, the spin image becomes entirely invariant under resolution changes. In addition, the scale/sampling ratio is controlled in this way, which is extremely important for our feature point extraction algorithm.

Up to now, the theory behind scale and orientation invariant feature extraction using the scale space of curvatures, and the definition of local regions using different histogram types, have been introduced. Beginning with the next chapter, the experiments and comparisons carried out to evaluate the proposed methods are given.

95 CHPTER 7 EXPERIMENTL WORK In this chapte, all the expeimental wok pefomed using the poposed featue extaction method and object epesentation method is pesented. The expeiments ae fomed of mainly thee pats. The fist pat focuses on the 3D face detection poblem. The second pat, which is consideably iche than the othes, concentates on object epesentation and ecognition. This pat includes cuvatue classification compaisons, object ecognition on vaious databases and object egistation. Finally the thid pat pesents an application on a diffeent discipline, whee suface cuvatues of DEMs ae used to detect landslide egions. In all expeiments, esults ae compaed with methods fom the liteatue. 7.1 Face Detection and Pose Estimation In ou fist expeiment, we make ou fist attempt to test the featue extaction algoithm poposed in this thesis. The featues ae extacted fom facial scans in ode to detect the location and the oientation (pose) of the face using the HK UVS classification. The featue elements used to categoize a face ae two inne eye pits, a nose peak and a nose saddle (nose bidge). These points ae chosen because they ae enough to estimate the pose of the face so that the oientation of the face can be detected. lso, these points ae easily detected by ou algoithm and ae the ones which ae not seveely affected by facial gestues. These fou elements ae late used to constuct gaph epesentation using the equation (15) fom the pevious chapte. In ode to test ou epesentation fo face localization, Bospous 3D face database [Bospous Database 2007] is used. Bospous 3D face database includes moe than 20 diffeent poses and facial expessions of 78 subjects and that is why it is vey suitable fo 76

96 testing the poposed method against vaious tansfomations. The neutal and fontal poses of the subjects ae used fo taining. 17 diffeent poses and expessions ae used fo testing. In ode to test ou method on facial pose estimation, 35 atificially otated vesions of five diffeent facial scans fom the FRGC v1.a database [FRGCv1a] ae ceated. The poposed method is compaed with fou diffeent techniques. Only the featue and face detection pats of the algoithms ae consideed fo the compaed fou algoithms. The compaed algoithms ae Colombo, Cusano and Schettini s 3D Face Detection Using Cuvatue nalysis [Colombo et. al. 2006], Lu and Jain s utomatic Featue Extaction fo Multiview 3D Face Recognition [Lu and Jain 2006], Lu and Jain s Multimodal Facial Featue Extaction fo utomatic 3D Face Recognition [Lu and Jain 2005] and Chang et al. s Multiple Nose Region Matching fo 3D Face Recognition unde Vaying Facial Expession [Chang et al 2006]. In the next section, we commence by defining the featue vecto used to epesent the facial topology Tansfom Invaiant Fou-Node Face Topology In ode to define a epesentation fo a facial suface, salient and meaningful featues should be designated. The significant featues on human face ae the nose, eyes, chin etc. Thus, any featue set globally defining a face must encompass some of them. Fo this pupose, we have chosen a fou node gaph fo ou facial topology epesentation, whee the nose peak (N), the left eye pit (E L ), the ight eye pit (E R ) and the nose saddle (S) fom the nodes. These featues ae chosen because they ae the most stable featues against facial expessions. We indicate the nose peak (N) as the base node (since it is the single peak inside the topology). ccodingly, using equation (20) fom the pevious chapte, a scale and oientation invaiant featue vecto is defined as: 77

\lambda_{FACE} = [\; t_N,\, t_{E_L},\, t_{E_R},\, t_S,\;
\|\mathbf{x}_{E_L}-\mathbf{x}_N\|/s_N \ (\text{or } r_N),\; \|\mathbf{x}_{E_R}-\mathbf{x}_N\|/s_N,\; \|\mathbf{x}_S-\mathbf{x}_N\|/s_N,\;
\alpha_{N E_L E_R},\, \alpha_{N E_L S},\, \alpha_{N E_R S},\;
\mathbf{n}_{E_L}-\mathbf{n}_N,\, \mathbf{n}_{E_R}-\mathbf{n}_N,\, \mathbf{n}_S-\mathbf{n}_N,\;
r_{E_L}/r_N \ (\text{or } s_{E_L}-s_N),\; r_{E_R}/r_N,\; r_S/r_N \;]    (20)

For any facial feature set λ_FACE, the types (t_N, t_EL, t_ER, t_S) are predetermined as peak, pit, pit and saddle ridge, so they are redundant in the feature vector. In addition, for a human face the normal orientations of most of the features are frontal, so not much information is lost if the normal differences are excluded from the feature vector. Consequently, the feature vector can be simplified to:

\lambda_{FACE} = [\; \|\mathbf{x}_{E_L}-\mathbf{x}_N\|/s_N,\; \|\mathbf{x}_{E_R}-\mathbf{x}_N\|/s_N,\; \|\mathbf{x}_S-\mathbf{x}_N\|/s_N,\;
\alpha_{N E_L E_R},\, \alpha_{N E_L S},\, \alpha_{N E_R S},\;
r_{E_L}/r_N,\; r_{E_R}/r_N,\; r_S/r_N \;]    (21)

7.1.2 Face Detection on Bosphorus Database

[Bosphorus Database 2007] is a 3D facial database including 21 different poses of 78 different subjects. In this section, the 3D facial detection experiments on this database are presented.

7.1.2.1 Learning the Four-Node Topology

In this study, the four-node face topology given in (21) is modelled by a single Gaussian with a diagonal covariance matrix. In the training phase, the feature vectors extracted from the frontal and neutral poses of the Bosphorus 3D face database are used; the other poses and expressions of the database are used in testing. The complete workflow of the training phase is depicted in Figure 51.

Figure 51. The training phase. (Workflow: training set of 78 subjects → post-processing → UVS calculation and 3D feature extraction → training vectors (78x9) → mean and variance calculation → normalized training vectors → Gaussian modelling with µ (1x9) and σ² (1x9); manually marked fiducial points over the training set → difference calculation → mean difference vector (1x9).)

In the training phase, the data is first processed in order to fill the gaps and remove the noise. Then the features from the 78 neutral and frontal faces are obtained. Among these fundamental elements, one of the one-peak, one-saddle, two-pit combinations corresponds to the true combination of the nose, the nose saddle and the two eyes. Using the true combinations of the neutral facial scans, we extract 78 scale and orientation invariant feature vectors using (21). These 78 feature vectors are used to train a Gaussian with a diagonal covariance matrix, where the data is first normalized to zero mean and unit variance.

In the database, facial components such as the nose tip and the inner eye corners were also marked manually. These manual markings are compared with the detected points during testing in order to check the performance of the algorithms. (Since the nose saddle was not marked in the database, we do not check the detection performance for the nose saddle.) The facial components were marked on the 2D texture images of the 3D scans, because localizing them on 2D images is easy for humans. However, they may not be the actual centers of the pits and peaks. For example, the nose tip was marked as the nose in the database, whereas the center of the nose peak does not occur at the nose tip but somewhere else on the nose surface.

Our system locates the actual centers of the surface peaks and pits. Since the difference between the marked point and the calculated center has a characteristic pattern (for example, the nose peak center computed by our method and the marked nose tip differ by a specific vector, Figure 52), we simply calculate the mean of these difference vectors over all training facial scans (Table 8). This mean is then used as an additive constant on the results obtained from our method during testing.

Figure 52. a) The marked nose tip. b) The calculated nose peak center. c) The marked inner right eye pit. d) The calculated inner right eye pit center. e) The marked inner left eye pit. f) The calculated inner left eye pit center.

Table 8. Mean of the differences (µ, in mm) between the marked and calculated features in the training set, along the X, Y and Z axes, for the nose tip, the left inner eye pit and the right inner eye pit.

7.1.2.2 Testing the Four-Node Topology

The complete workflow for the testing phase is given in Figure 53. First, post-processing is applied: the noise is cleaned, the false holes are filled, and the scans are re-sampled so that the scale/sampling ratio is fixed to 0.5 mm/sample. Then the HK UVS space is constructed and the features over the facial topology are located for each scan.

Among these features, for all combinations of one peak, two pits and one saddle ridge, four-node topology vectors are obtained as face candidates. These vectors are normalized by the mean and variance values obtained during training, and the vector with the highest probability is selected as the best candidate. Finally, the training mean and variance are used to un-normalize the best candidate vector in order to obtain the actual locations. Also, the mean difference vector (Table 8) calculated in the training phase is added to the computed locations of the nose peak and the eye pits, so that comparisons with the manual markings can be made. The resulting x-y-z coordinates are taken as the coordinates of the nose tip, the inner eye pits and the nose saddle, but we evaluate the results only for the nose tip and the inner eye pits.

Figure 53. The testing phase. (Workflow: test set of 1323 scans → post-processing → UVS calculation and 3D feature extraction → test vectors (1323x9xn, for n candidate combinations per scan) → normalization with the training µ (1x9) and σ² (1x9) → Gaussian testing → vector with the highest probability (1323x9) → un-normalization and addition of the mean difference vector (1x9) → calculated x-y-z coordinates for the nose tip and the inner eye pits (1323x9).)
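The training and testing flows of Figures 51 and 53 reduce to a few lines once the 9-element vectors of Equation (21) are available. The sketch below models the training vectors with a diagonal Gaussian and scores every candidate combination by its log-likelihood; the function names and the small variance floor are illustrative.

import numpy as np

def train_gaussian(train_vectors):                  # shape (78, 9)
    mu = train_vectors.mean(axis=0)
    var = train_vectors.var(axis=0) + 1e-9          # diagonal covariance
    return mu, var

def log_likelihood(v, mu, var):
    return -0.5 * np.sum((v - mu)**2 / var + np.log(2 * np.pi * var))

def best_candidate(candidates, mu, var):            # shape (n_combinations, 9)
    scores = [log_likelihood(c, mu, var) for c in candidates]
    return candidates[int(np.argmax(scores))], max(scores)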

7.1.3 Experimental Work on 3D Face Detection

In this section, the experimental results on the Bosphorus database are presented. The experiments consist of testing the model on 1323 scans, which include poses different from the training poses. Within the database, anchor points such as the nose tip and the eye pits were marked for the test scans as well. Using these markings as ground truth, we evaluate our algorithm in terms of detection accuracy (percentage) and anchor point localization error (millimetres). In addition, we have implemented four other 3D face detection and facial feature point localization methods from the literature, tested them on the same database, and compared our success rates with theirs. Below, we first explain the test set; the other implemented methods are then summarized and the comparative results are tabulated and visualized.

Table 9. Tested poses and the number of facial points (nose tip, inner left eye pit, inner right eye pit) included for each pose: Smile, Disgust, Mouth Open, Eyebrows Up, With Glasses, Hand on one Eye, Hand on Mouth, Look Right 30º, Look Right 45º, Look Right 60º, Look Right 75º, Look Up 15º, Look Up 30º, Look Down 15º, Look Down 30º, Look Right & Up 45º, Look Right & Down 45º.

102 Test Set mong many diffeent poses in the Bospous database, some poses such as smile, disgust, mouth opened, eyebows up, eye glasses won, hand on one eye, and hand on mouth ae selected to be tested. In addition to these, otated scans such as look 30º ight, 45º ight, 60º ight, 75º ight, 15º up, 30º up, 15º down, 30º down, 45º ight and up and 45º ight and down ae also included. These vaious poses include oientation changes (e.g. look 45º ight), shape defomations (e.g. disgust face) and occlusions (e.g. hand on one eye). Thus, the model is tested fo all these conditions and vaiations. The tested poses ae listed in Table 9. s seen fom the table, fo some poses (e.g. look ight 75 º) vey few numbe of test data (e.g. the inne ight eye pit) exist. The eason is that when the face is otated ight fo moe than 45º, the inne eye pit may disappea and become impossible to be maked Post-Pocessing on 3D Facial Scans Bospous 3D face database [Bospous Database 2007] is acquied by Inspeck Mega Captuo II [Inspeck-web]. Fotunately, duing the constuction of the data-base, almost all existing poblems such as holes and spikes wee cleaned manually fom the models using scanne s built in 3D scan cleaning softwae. Howeve, 3D models in the database still have consideable amount of holes aound dak hai, eye-bows, and eyelids. By holes we mean 3D points labelled as invalid. In addition, some spiky outlies ae also esident nea some extemely specula egions like eye pupils. In all models, backgound is also labelled as invalid by default. To begin with, the connected components of points which wee labelled invalid ae detected. mong these connected holes, we select the ones having aeas smalle than a theshold (100 in this case) assuming that the backgound will be the biggest connected component. Using the valid coodinates of the points neighbouing these selected holes we extapolate the values of these invalid points and thus fill the holes (Figue 54). s seen in Figue 54.b, this egula gid also includes some invalid points. If the scanne output is to be used to constuct a polygonal mesh, the points with invalid labels can be automatically discaded by methods like Delaunay tiangulation which is used to constuct polygonal data. Howeve in this study, it is cucial fo us that the egula gid fom of the 3D point cloud is peseved due to Gaussian Pyamiding Reduce and/o Expand opeations. Fo this eason, we have chosen not to discad these invalid points but to fill them by assigning the minimal valid z coodinate value to the invalid z coodinate. Then 83

again, for invalid x and y coordinates, we linearly extrapolate their values using the valid x and y coordinates that reside on the same row and column, respectively. As a result, we obtain a regular grid of valid 3D points, as seen in Figure 54.c. After filling the holes, the 3D surface needs to be smoothed with a Gaussian-like operator, because reliable curvature estimation on 3D data is very sensitive to quantization and sensor noise [Flynn and Jain 1989]. We therefore apply a 5-by-5 Gaussian operator on the grid to smooth noise and spiky regions.

Figure 54. a) Texture for the 3D scan. b) Black areas denote the background; red areas denote the invalid points selected to be extrapolated; blue areas denote the neighbouring valid points used for extrapolation; white areas denote valid points. c) Cleaned, cropped and re-sampled point cloud.

Finally, we resample the smoothed 3D point cloud into an M-by-N regular grid of 3D points such that the average scale/sampling ratio is 0.5 mm per point. Since each face has a different metric size, each image is re-sampled to a different resolution that satisfies the average 0.5 mm scale/sampling ratio. It should be remembered that resampling the data does not change the metric distances: since the 3D scanner output has absolute 3D coordinates, the relative distances maintain their original values even after resampling.
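A simplified sketch of this post-processing chain is given below. The hole filling here interpolates along image columns only, which is a simplification of the row-and-column extrapolation described above; the smoothing sigma, the use of scipy.ndimage and the linear resampling are illustrative choices.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(z, valid, spacing_mm, target_mm=0.5):
    z = z.copy()
    z[~valid] = np.nan
    for c in range(z.shape[1]):                      # fill holes along each column
        col = z[:, c]
        bad = np.isnan(col)
        if bad.any() and (~bad).any():
            col[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(~bad), col[~bad])
    z = np.nan_to_num(z, nan=np.nanmin(z))           # remaining invalid points -> minimum z
    z = gaussian_filter(z, sigma=1.0)                # small Gaussian smoothing (5x5-like)
    return zoom(z, spacing_mm / target_mm, order=1)  # re-sample to ~0.5 mm/sample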

104 Compaed Methods We evaluate ou esults in compaison with fou othe methods we have implemented. These fou methods ae: M1) Xiaoguang and Jain s Multimodal Facial Featue Extaction fo utomatic 3D Face Recognition [Lu and Jain 2005], M2) Xiaoguang and Jain s utomatic Featue Extaction fo Multiview 3D Face Recognition [Lu and Jain 2006], M3) Chang et al. s Multiple Nose Region Matching fo 3D Face Recognition unde Vaying Facial Expession [Chang et. al. 2006], and M4) Colombo, Cusano and Schettini s 3D Face Detection Using Cuvatue nalysis [Colombo et.al. 2005]. M1 assumes that the scan is fontal and each ow of the depth image is seached fo the maximum z value. Fo each column, the numbe of maximum z value positions is counted and the column with the maximum numbe is selected as the midline. Then, nose tip candidates ae assumed to be located on the midline and a vetical Z pofile analysis is done on this line. Since nose bidge pesents a contiguous incease in z values, nose tip is defined as the last point of one of the thee non-deceasing continuous seies ove the midline. fte detecting thee-candidate nose tips, a hoizontal z pofile analysis is done fo these thee points. Sum of the diffeences between candidate points and thei neighbous ae calculated and the point with maximum diffeence is selected as the nose tip. fte detecting the nose tip, possible locations of the two inne eye pit points ae calculated statistically. In seach egion, shape index and cone analysis is done. Needless to say, the method is not oientation invaiant. If the subject looks upwads duing scanning, fo example, the chin may be detected as the nose. lso, the method depends on lots of heuistics which may not be satisfied all the time. M2 attempts to find the nose tip as the fist step also. In ode to find the nose tip, 3D data is otated aound Y- axis (axis pependicula to the hoizontal plane) fom 90º to 90º with 2º intevals. Fo each otated vesion, the point with maximum z value is selected as the nose tip candidate. The best thee candidates ae selected to be analyzed by PC. Howeve, since we obtain high success in locating the coect nose position, PC is not applied but best candidate is selected by sum of diffeences between candidate points and thei neighbous as descibed in M1. fte finding the nose tip, possible locations of the two inne eye pit points ae calculated statistically similaly as explained fo M1. Howeve, diffeent fom M1, seach egions fo pit egions vay among diffeent poses. The method is patially oientation invaiant howeve it has the potential tendency to detect spiky egions as the nose and the method is still fagile unde otation of face aound X and Z axes. 85

105 M3 attempts to find the inne eye pits fist. They use H and K values. The nose tip is expected to be a peak and the inne eye points ae expected to be pit-like egions. Fist, small pit egions ae emoved. Then, pais of pit egions with small diffeences in both Y and Z diections ae selected as candidate eye pit egions. The nose tip is found next. Ove the line dawn pependicula to the midpoint of the line fomed by the two pit egions, the point with maximum z value is selected as the nose tip. Unsupisingly this method is not scale and esolution invaiant and vey bittle unde spiky noise. lso, the method is not obust to in plane otations. In all these methods, fist of all, the location fo a type of ancho point is estimated independently and then othe ancho points ae located based on the fist detected one. Thus, the spatial stuctue of ancho points is not consideed efficiently. In contast to this, M4 attempts to find possible ancho point tiangles (i.e. nose tip, inne left eye, inne ight eye). The method uses H and K values to find the ancho point candidates. nose tip is expected to be a peak and the inne eye points ae expected to be pit-like egions. fte finding all candidate points, impossible tiangles ae emoved in ode to decease computational complexity. In doing this, distances between ancho points ae thesholded by some values defined based on the taining data. 50 face tiangles ae manually labelled fo 50 taining faces and the aea inside the tiangles ae analyzed by PC in ode to lean a face model. Fo test images, the tiangula aea is extacted fo each candidate ancho point tiangle and analyzed by the same pinciple components. This method is simila to ou method in the sense that the featue combinations ae seached; howeve it is vaiant unde scale and esolution. lso, ou method estimates the face location based on only the featues extacted fom a scale space and thei spatial layout but M4 applies PC analysis to the pat of the face defined by the candidate ancho points. Since this is done fo each candidate tiangle it is vey time consuming Compaative Results Fo evaluation puposes, we have pepaed two tables including the esults fom ou method and the fou othe methods fom the liteatue. The fist table depicts the pecentage accuacies of the methods in locating the ancho points. If the detected point is in the 2 cm adius of its manually maked position, then the detection is consideed to be coect (Table 10). 2 cm is vey high but we intended to accept as many detection esults as possible in the fist step, because the localization accuacy is investigated late among these detections. 17 poses and 3 landmak points fo each pose (3x17=51 esults in total) 86

106 ae tabulated fo each method in the table. The best detection ates ae undelined and bolded. We see that ou method has the best ating fo 35 out of all 51 esults. M4 has the best ate fo 15 out of 51 esults, which ae mostly fo the detection of the nose tip. M1 and M2 each have the best esults in 2 cases. M3 has not given any best esult among all poses and landmaks. M4 and ou method ae highly obust against facial expessions (smile, disgust, mouth open, eyebows up) and pesent ove %97 success in ancho point detection. Fo nose detection M4 is slightly bette, 1.5 % moe than ous. But fo eye detection, ou method is much bette (by moe than 30%) than M4 fo vaious facial expessions. Methods M1, M2 and M3 ae not obust against facial expessions and show less pefomance than othes. gain M4 and ou method achieve well fo in plane otations up to 600. ll of the methods fail fo otations geate than 600. Fo otations aound Y-axis, M3 has nealy no coect detection, wheeas M1 and M2 have seveely deceasing success as the amount of otation inceases. Fo otations aound X axis, M1 is bette than M2 and M3. Fo complex otations aound X and Y axis, ou method pefoms the best whee othes show no pefomance at all. Fo all kinds and amounts of otations, ou method pefoms much bette fo eye detection than all of the othe methods. Fo occlusion scenaios, ou method and M4 pefom the best. We can also say that M4 is bette fo nose detection but ou algoithm is much bette fo eye detection. In the second table (Table 11), the detection accuacies in millimetes ae examined. Fo all five methods, only the successfully detected scans ae consideed and the eo is computed as the absolute distance between the maked points and the calculated points. The means and the standad deviations of the eo ae compaed in Table 6 and the lowest ones ae bolded and undelined. Since only the successfully detected landmaks ae consideed even though the detection ates wee not vey successful fo a method, the localization esults may seem successful. Thus, to evaluate a method s success bette we should conside the methods which have both the lowest eo mean in Table 6 and the best detection success in Table 5. These methods ae also indicated in Table 6 with white font colou ove dak gey backgound. Ou method has both the lowest eo mean and the best detection ate in 31 out of 51 cases whee method M4 has such a success only fo 5 of the cases. The othe methods could not achieve this success fo any pose. When we compae pecision of the algoithms, i.e. the exact locations of the ancho points, although M4 pefoms slightly (1 mm on the aveage) bette than ou algoithm fo nose localization, ou algoithm pefoms much bette (4 mm on the aveage) fo eye 87

107 localization. The eason fo this is that M4 aims to find the nose tip and the manual makings wee done fo the nose tip also. Howeve, ou method aims to find the cente of the nose peak which is diffeent fom the nose tip. lthough we tied to compensate the diffeence between the nose peak cente and the nose tip by adding a mean diffeence vecto obtained duing taining (Table 8) to ou esults, due to diffeent nose shapes this usually causes a little coection. Fo eye pits, the poblem is not as sevee as the nose case because ou algoithm aims pit centes and manual makings wee done fo pit centes also. Thus, ou algoithm s success becomes moe obvious fo eye pits. Now, we will compae the algoithms pose by pose. When we check the algoithms fo neutal and fontal poses, M1 and M2 have acceptable ates since they have some assumptions which ae valid only fo fontal neutal poses. M3 has the wost pefomance since an eo in detecting the pit egion misleads the detection of nose peak. M4 has the best ate fo nose detection but thee is a noticeable decease in the eye pit detection. This is caused because of the otation applied to each candidate tiangle since pits on both sides of the mouth may also be selected as eye pits. Ou method pefoms the best fo both nose and eye detection fo fontal poses. Fo mouth open pose, success ates of M1 and M3 incease and success ate of M2 deceases when compaed to fontal and neutal pose. M4 pefoms the best fo nose but the wost fo the eye pits and the eason is the same as explained above fo the neutal pose. Ou method pefoms the best fo both nose and eye detection. Fo disgust pose, M3 and M4 ae seveely affected especially in eye detection since eye pits ae seveely defomed in this pose. Ou algoithm pefoms the best among all fo especially eye detection. Fo eyebows up pose, ou method pefoms the best and M2 is the least affected one among the othes. Fo smile pose, success ates of all methods decease and M4 is the most affected one especially fo the eye pits. gain, ou method pefoms the best among all in the aveage. Fo yaw otations only ou method pesents a pefomance fo otations moe than 450 and all othes fail. Fo vaious yaw otations, M3 is the most seveely affected one and ou method pefoms the best especially fo eyes. Fo look upwads and downwads poses, all the methods except ous ae seveely affected. Fo look bottom ight and uppe ight poses, although ou method pefoms the best, all of the methods have deceased success ates. Fo hand on one eye pose, all of the methods ae affected. But M4 pefoms the best fo nose detection and ou algoithm pefoms the best fo eye detection. 88

108 Table 10. Detection pecentage of the nose and the inne eye pits fo vaious poses ae listed fo all methods. The fist column fom left (ous) in each main column is the poposed method. Then M1, M2, M3 and M4 ae given in the columns fom left to ight espectively. Best esults fo each pose ae undelined and bolded. % Nose Left Eye Right Eye Ous M1 M2 M3 M4 Ous M1 M2 M3 M4 Ous M1 M2 M3 M4 Mouth Open 99, ,8 80, ,3 87,9 75, ,2 84,8 75,8 Disgust 99,3 89,7 92,6 58, ,7 91,2 92,6 69,1 63,2 98,7 91,2 92,6 77,9 64,7 Eyebows Up ,7 92,8 68, ,0 79,7 92,8 63,8 72,5 100,0 81,2 92,8 72,5 71 Smile 97,1 82,6 78,3 65,2 98,6 99,3 82,6 78,3 72,5 62,3 99,3 85,5 79,7 78,3 68,1 Look Right 30º 99, ,4 4, ,3 81,4 81,4 54,3 64,3 99,3 75,7 80 8,6 68,6 Look Right 45º 94,3 48,6 72,9 7, ,3 65,7 72,9 41,4 72,9 94,3 55,2 67,2 7,5 71,6 Look Right 60º 82,9 7,4 54,4 0 60,9 86,5 10,1 55,1 24,6 63,8 83,5 9,3 39,5 2,3 44,2 Look Right 75º 28,3 0 32,2 0 2,9 34,5 0 42,6 14,7 39,7 60, Look Up 30º 96,8 78,3 43,5 39,1 97,1 95,9 76,8 44, ,1 95,9 76,8 44,9 75,4 76,8 Look Up 15º Look Down 15º Look Down 30º Look Right & Down 45º Look Right & Up 45º Hand on one Eye 99,7 95,7 82,9 61, ,6 88,6 82,9 71,4 68,6 99, , ,3 98,6 81,4 52,9 75, ,1 52,9 88,6 74, ,1 51,4 81, ,2 59,7 17,9 40,3 98,5 98,1 77,6 14,9 92,5 76,1 98,1 76,1 16,4 88,1 80,6 49,7 14,3 22,2 4,7 35,7 44,9 24,3 28,6 11,4 41,4 64,7 20,1 26,7 10,3 53,3 74,6 16,2 51,5 0 22,9 77, ,6 32,9 42,9 77,6 20,1 44,1 4,4 32,1 72,0 53,6 23,2 57,9 95,7 71,3 59,4 26,1 59,4 65,2 76,4 60, ,3 75 Hand on mouth 96,6 59,7 14,9 16, ,8 62,7 14,9 74,6 82,1 96,6 62,7 14,9 59,7 79,1 With Glasses 88,8 89,7 30, ,3 92,5 31,3 67,2 83,6 88, ,9 59,7 79,1 89

109 Table 11. Detection eos ae compaed fo all methods. The mean and the standad deviation of the eo (the absolute distance between the maked points and the calculated points) ae given. The values ae in millimetes. The mean is witten ove the standad deviation. µ (mm) σ (mm) Nose Left Eye Right Eye Ous M1 M2 M3 M4 Ous M1 M2 M3 M4 Ous M1 M2 M3 M4 Mouth Open 3,01 2,57 3,96 2,54 5,35 2,40 4,27 2,94 3,05 1,71 3,07 1,72 6,08 2,95 6,91 3,99 8,51 4,09 7,57 4,86 2,61 1,42 6,13 3,23 5,55 3,17 11,30 3,84 5,98 4,55 Disgust 4,50 3,20 6,09 3,51 6,20 3,01 7,38 4,88 4,34 2,67 3,16 1,76 5,94 4,49 5,69 4,08 9,41 5,00 7,60 5,15 3,11,154 5,56 3,72 5,12 2,97 11,16 4,29 6,71 4,24 Eyebo ws Up 3,05 1,94 3,80 2,80 5,29 2,92 4,51 2,87 3,21 2,16 2,79 1,50 6,36 3,10 6,38 3,17 9,35 4,54 7,01 4,42 2,72 1,72 6,17 3,02 5,91 2,96 11,36 4,07 7,40 5,16 Smile 4,02 2,60 5,65 3,85 5,09 2,47 6,58 4,25 3,90 2,12 3,12 1,78 6,21 3,92 6,37 3,86 8,74 4,41 6,60 3,99 3,19 1,68 5,84 3,00 5,52 2,98 10,36 3,93 6,88 4,84 Look Right 30º 4,93 2,77 7,39 3,97 4,61 2,99 9,58 1,05 4,65 2,88 2,77 1,65 7,51 3,95 7,66 3,89 14,51 3,48 7,91 4,22 3,65 2,53 5,84 3,94 5,25 2,93 13,58 4,70 7,69 5,52 Look Right 45º 7,75 2,49 10,93 4,87 5,66 3,71 15,10 3,85 7,78 3,86 3,40 2,17 7,85 4,04 6,86 3,27 12,61 4,38 8,69 4,62 5,54 4,18 7,68 4,13 7,91 4,92 13,34 4,22 7,64 5,34 Look Right 60º 9,90 2,97 10,44 3,81 8,45 3,85-9,81 3,98 3,94 1,72 6,94 5,15 6,71 3,29 12,93 5,84 9,85 5,95 7,47 5,21 11,19 5,07 7,78 3,62 19,20-7,04 4,95 Look Right 75º 12,48 3,14-13,73 4,46-13,7 6 7,69 4,21 2,23-8,23 5,19 11,42 6,63 9,28 5,31 2,89 0, Look Up 30º 4,27 2,20 3,48 2,71 6,06 2,93 4,53 2,77 3,44 2,41 3,65 1,67 8,75 4,97 5,57 2,72 9,29 3,99 6,55 4,07 3,82 1,94 8,49 4,14 5,14 3,27 11,24 4,16 7,12 4,69 Look Up 15º 3,32 1,80 3,48 3,00 4,74 2,24 4,84 3,21 3,07 2,12 2,99 1,67 6,30 2,92 5,28 2,87 9,33 4,18 7,80 4,60 3,01 1,71 6,27 3,52 5,35 3,22 11,21 3,43 6,71 4,44 Look Down 15º 4,13 2,42 6,17 3,90 5,72 3,00 5,46 3,24 4,10 2,42 2,98 1,52 6,19 3,80 7,03 4,17 7,29 4,02 6,29 4,42 2,97 1,43 6,03 4,10 5,95 3,33 9,20 3,96 5,34 3,84 Look Down 30º 4,58 3,10 7,46 5,32 5,83 2,59 6,04 3,75 4,61 2,48 3,42 1,52 4,67 3,71 4,64 3,92 6,87 3,40 4,74 3,24 3,32 1,31 5,42 3,47 5,47 3,70 7,51 3,50 4,47 3,23 Look Right & Down 45º 8,63 3,08 11,46 4,77 8,87 4,00 14,14 4,80 8,78 4,11 4,64 1,71 8,24 4,93 7,24 4,27 10,10 4,84 6,62 4,36 5,94 3,61 10,09 4,22 7,38 4,05 7,35 4,51 7,51 5,15 Look Right & Up 45º 11,59 3,61 13,53 4,23 9,22 4,41-11,3 5 4,28 4,13 2,27 9,99 4,36 7,37 4,44 12,47 4,79 9,32 5,26 7,58 4,84 11,7 6,57 8,84 6,69 11,37-5,47 3,18 Hand on one Eye 2,89 1,85 5,29 3,85 6,01 2,20 5,40 3,75 3,82 2,22 2,65 1,50 7,39 4,02 7,42 5,35 9,90 4,96 6,62 4,33 5,83 3,78 5,98 3,42 4,48 2,02 10,06 5,11 8,06 4,72 Hand on mouth 3,72 2,58 4,76 3,35 5,52 2,68 7,69 5,59 3,98 2,36 2,93 1,82 6,05 3,46 7,70 3,73 10,48 4,84 6,68 4,05 2,82 1,37 5,24 3,30 5,76 4,26 12,65 4,50 6,17 4,11 With Glasses 2,90 1,96 4,33 2,75 5,14 2,58 4,49 2,66 3,03 1,70 3,47 2,49 7,16 4,01 6,41 3,04 10,63 4,82 7,49 3,90 3,92 2,79 7,38 4,31 7,82 3,67 11,94 4,38 7,61 4,13 90

Comparison in terms of Complexity

When an algorithm is implemented, computational load is the other important criterion that should always be considered besides its success. Computational load can be measured in terms of complexity, so we examine the complexities of the five algorithms in Big-O notation. Our method, M1, M2 and M3 each have the same complexity of O(MN), where M and N are the dimensions of the 3D data grid. M4's complexity is O(MNmn²), where m is the number of possible nose peak candidates and n is the number of possible eye pit candidates. It is easily concluded from these values that M4 requires much more computation than the other algorithms, while our algorithm requires the same amount of computation as M1, M2 and M3 although its success is much better.

Problematic Cases

Even though the results of our method are clearly better than those of the other methods, we can still point out some problematic cases. Since the proposed method is scale and transform invariant, success rates higher than 98% were achieved for frontal poses and poses rotated less than 45º, and 100% was achieved for the majority of them. However, when the face is rotated by more than 45º, the nose tip and the eye pits may become invisible. When an element is occluded because of rotation or other reasons (like a hand over one eye), the method may fail to find the correct one-peak, two-pit, one-saddle combination. For these over-rotated faces, the nose usually loses its fundamental property of being a peak and is not selected in the best combination. Similarly, when one eye is occluded by a hand or by glasses in front of it, the eye pit vanishes and the correct combination is again lost. For these reasons, the success rates for these types of poses are relatively lower than for other poses, but they are still higher than those of the compared methods. Such cases are shown in Figure 55. In Figure 55.e, two results are depicted for faces rotated 45º right and down: for the upper figure the face is successfully registered, whereas for the lower figure the method fails since no right eye pit could be detected because of occlusion. Similarly, in Figure 55.d two results are depicted for faces rotated 75º right: for the upper figure the eye pits, the nose peak and the nose saddle are detected and the method is successful, but for the lower figure the inner right eye pit is invisible and is not detected, causing the GMM to select a bad combination. For frontal poses and poses rotated less than 45º, where success rates over 98% were achieved, there are very rare occasions where false combinations are selected. Some examples of these occasions are given in Figure 55.a for the pose with glasses.

111 optical scannes ae vey sensitive to specula sufaces and lenses. The eye glasses may cause false spikes o holes ove the suface, which cannot be easily ovecome by post pocessing. Fo the lowe figue the glasses cause a false peak ove the ight eye and a false combination is selected. In Figue 55.b. whee the mouth open faces ae scanned, a false combination is selected fo the lowe figue. The eye pits ae found coectly but, somehow the uppe lip is selected instead of the nose tip. ctually this is the single false detection case fo this pose. Yet again fo Figue 55.c. the ight eye is occluded with the ight hand. Fo the uppe figue the inne ight eye pit is still visible and the method succeeds, howeve fo the lowe figue the pit is completely occluded, thus causing the method to fail. Figue 55. successful and a poblematic case fo the poses a) with glasses. b) Mouth open c) Hand on one eye d) Look ight 75º e) Look ight and down 45º D Facial Pose Estimation s anothe attempt to evaluate the quality and capability of the extacted featues, a pose estimation algoithm is poposed in this section. Using the 3D featues extacted fom the HK UVS volume; poses of otated faces ae estimated. So as to make the expeiments numeically compaable, vitually otated vesions of 3D fontal facial models fom the FRGC ve1.a ae used. 92

112 Constuction of tificially Rotated 3D Facial models In ode to make contolled tests on pose estimation vitually otated vesions of FRGC v1.a fontal facial scans ae ceated using a softwae constucted in ou laboatoy, METU-CVIS. The softwae is a 3D vitual scanne pogam whee, diffeent scenes using diffeent object fomats such as ange images o 3D object models, can be ceated and expoted in ange scan fomat. The softwae is ceated by Nesli Bozkut, as a pat of he M.Sc. thesis [Bozkut 2008]. Fo this pupose, 35 suface models ae atificially ceated by applying 7 diffeent space tansfomations to 5 oiginal (fontal) sufaces fom the FRGC v1.a database. These tansfomations ae otation aound Y-axis by +10º, +20º,-10º,-20º, otation aound X-axis by +15º,-15º and otation aound Z-axis by +45º. set of atificially ceated ange scans ae given in Figue 56. Figue 56. tificially ceated facial scans. Leftmost scan is the oiginal scan fom FRGC database. Othe scans ae ceated by applying the following tansfomations espectively fom left to ight: otation aound Y-axis by +10º,+20º,-10º,-20º, otation aound X-axis by +15º,-15º and otation aound Z-axis by +45º. The fou-node facial topology given in the pevious subsection is sought in the oiginal and atificially ceated models by using the featue vecto λ FCE given in (21). gain, all fou-node, peak-pit-pit-saddle combinations ae extacted and the best candidate (the one with the highest pobability distibution within the GMM) is selected as the actual facial fou-node topology. When the facial nodes (nose peak, eye pits and nose bidge) ae detected on the oiginal and the otated scans, the 3D locations of these nodes ae known. Using this data, the amount of otation of a otated 3D facial scan might be calculated with espect to the oiginal fontal 3D facial scan. So as to do this, tansfomation matix fom extacted nodes of the otated scan to the extacted nodes of the oiginal fontal scan must be calculated. In othe wods, this tansfomation matix gives the elations between 3D 93

reference frames of the rotated and the original 3D scans. If we define the rotated reference frame C_R and the original reference frame C_O, the transformation matrix can be calculated as:

[T]_{4\times 4}\, C_R = C_O \;\;\Rightarrow\;\; [T]_{4\times 4} = C_O\, C_R^{-1}    (22)

Figure 57 shows the estimation results: the range images, the HK UVS volume depictions and the estimated angles for the rotated scans are given. It is seen from the figure that for rotation around the Y-axis the estimates are poor, since the localization of the eye pits becomes weak as they get occluded behind the nose saddle. For rotation around the X-axis better estimates are obtained, since the eye pits and the nose peak do not get occluded. Similarly, the estimates for rotation around the Z-axis are good, as no occlusion occurs.

Figure 57. Virtually created facial scans and their estimated pose angles (original frontal scans and versions rotated around the Y, X and Z axes; recovered estimates include +8.84º, +8.43º and +8.62º around Y and +46.14º around Z).
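One standard way to realize Equation (22) numerically is to estimate the rotation between the two sets of corresponding node centers (nose peak, eye pits, nose saddle) with the SVD-based Kabsch procedure and then read the Euler angles from the resulting matrix. The sketch below follows that route; it illustrates the computation and is not necessarily the exact implementation used in the thesis.

import numpy as np

def estimate_rotation(rotated_pts, original_pts):     # both shaped (4, 3)
    a = rotated_pts - rotated_pts.mean(axis=0)
    b = original_pts - original_pts.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    R = vt.T @ np.diag([1.0, 1.0, d]) @ u.T           # maps the rotated frame onto the original
    rx = np.degrees(np.arctan2(R[2, 1], R[2, 2]))     # rotation about X
    ry = np.degrees(np.arcsin(-R[2, 0]))              # rotation about Y
    rz = np.degrees(np.arctan2(R[1, 0], R[0, 0]))     # rotation about Z
    return R, (rx, ry, rz)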

Using this 3D topological element, the locations of the facial features are detected and the facial pose is found. The detection method is compared with four representative methods from the literature. These methods were selected because some of them are only applicable to faces, whereas others are general object detection methods applied to faces. The methods are trained using 78 neutral and frontal scans and tested using 1323 scans of 17 different poses with various rotations, expressions and occlusions. The results show that the proposed method is more successful in terms of detection percentage, localization accuracy and computational efficiency compared to the other methods. The main reasons that the proposed method is more powerful in detection and localization are the scale and transform invariance in detecting the anchor points and the same invariance in constructing the spatial graph structure. The method easily overcomes false positive results caused by spiky noise, rotation and scale. Since the proposed method requires no prior compulsory poses (such as a frontal pose or a side view), the detection rates are high for any kind of pose. As long as the nose-eyes-nose bridge quadruple is detected, the face is detected for many poses with near 100% success. For frontal poses with different expressions such as mouth opened, smile, disgust, eyebrows up and hand on mouth, near 100% success is achieved. For poses with rotation less than 45º, the success rates are still over 97%. For occluded poses, such as with glasses or a hand on one eye, the results are acceptable and better than those of the compared methods. For the proposed method, the detection accuracy in millimetres is also better than that of the existing methods. A mean error and variance of a few millimetres are achieved for rotations less than 45º. Considering that the database was manually marked, a mean error of a few millimetres can be regarded as absolute success, because the human marking error is also not negligible. The facial detection is brittle for cases where one of the anchor points is not visible because of rotation or occlusion. If an eye pit is occluded by a hand or because of a high degree of rotation, its detection fails and the correct quadruple is not located. The pose estimation is carried out on artificially rotated versions of five FRGC v1.a facial scans. The rotations around the x, y and z axes are estimated with around ±2º error in x and y, and ±4º error in z. The proposed method is extremely powerful for pose estimation since it is invariant to any type of transformation, including rotation, scaling and re-sampling.

The method is also robust to noise, since the features are found within a scale-space where extreme amounts of smoothing are performed.

3D Object Recognition

In this chapter, we finally arrive at the core subject of this dissertation: 3D object recognition. Using the proposed feature extraction method and 3D topology model, objects are classified and/or categorized. We commence by defining the n-node topology derived from the definitions in Chapter 5. Then the method used in this study to classify objects, namely geometric hashing, is given in detail. Tests carried out on the Stuttgart database [Stuttgart Database] are covered, and discussions and a performance comparison to other methods are also given in this chapter.

Transform Invariant n-node Topology

The method to extract transform invariant features and construct a transform invariant topology model is thoroughly covered in Chapter 5. In this subsection we simply give the details of the topological model derived from the proposed model. The general model for an n-node topology can be formulated as:

λ_i = [ t_A, t_B, t_C, t_D, ..., t_N,
        |x_A - x_N|, |x_B - x_N|, ..., |x_M - x_N|,
        α_ANB, α_ANC, α_AND, ..., α_LNM,
        n_A - n_N, n_B - n_N, ..., n_M - n_N,
        r_A, r_B, r_C, ..., r_N ]                                  (23)

In this vector there are n type values (first row), n-1 link lengths, C(n,2) angles, n-1 normal difference vectors and n radii, giving (n^2+8n-5)/2 values in total.
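The following Python sketch shows one plausible reading of Eq. (23), assuming each extracted feature is available with its type, centre, normal and radius; the Node class, the field names and the choice of evaluating the angles at the base node are illustrative assumptions, not the thesis code. Under the scale invariant form (24) given next, the lengths and radii would additionally be divided by the base node's radius.

    import numpy as np
    from dataclasses import dataclass
    from itertools import combinations

    @dataclass
    class Node:            # one extracted scale-invariant feature
        t: int             # curvature type label (peak, pit, saddle ridge, ...)
        x: np.ndarray      # 3D position of the feature centre
        n: np.ndarray      # unit surface normal at the centre
        r: float           # metric radius (scale) of the feature

    def angle_at(base, a, b):
        """Angle in degrees between the links base->a and base->b (the alpha terms)."""
        u, v = a.x - base.x, b.x - base.x
        c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

    def topology_vector(nodes):
        """Feature vector of Eq. (23); the last node in `nodes` plays the role of N."""
        *others, base = nodes
        types   = [m.t for m in nodes]
        lengths = [float(np.linalg.norm(m.x - base.x)) for m in others]
        angles  = [angle_at(base, a, b) for a, b in combinations(others, 2)]
        normals = [v for m in others for v in (m.n - base.n)]
        radii   = [m.r for m in nodes]
        return types + lengths + angles + normals + radii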

On the other hand, in its scale invariant version the link lengths and radii are normalized according to the radius of the base node. The base node is selected as the node of a certain type (e.g. peak) with the largest radius. This vector is defined as:

λ_i = [ t_A, t_B, t_C, t_D, ..., t_N,
        |x_A - x_N|/r_N, |x_B - x_N|/r_N, ..., |x_M - x_N|/r_N,
        α_ANB, α_ANC, α_AND, ..., α_LNM,
        n_A - n_N, n_B - n_N, ..., n_M - n_N,
        r_A/r_N, r_B/r_N, ..., r_M/r_N ]                           (24)

This vector is used to represent a range image and, consequently, to classify the object within a database. It is impossible to know which element of this vector carries better classification quality unless proper experiments are completed. In addition, the nodes, or in other words the extracted features, can be obtained using either the HK or the SC curvature classification. Thus, choosing the appropriate feature classification method is another important issue for object recognition. In the next subsections, all of these issues are handled with certain test procedures. However, before that, the next subsection gives the details of the topology classification method, which basically decides which topology belongs to which object or which category of objects.

Feature vectors similar to (23) and (24) can be constructed by using SIFT if all range images are rendered as gray-scale images, where the depth value designates the gray-level intensity. By using the SIFT feature attributes, namely the pixel location (x_i : {u_i, v_i}), the scale (σ_i), the orientation (θ_i) and the descriptor (d_i), a feature vector similar to (24) can be created:

λ_A = [ |x_B - x_A|/σ_A, |x_C - x_A|/σ_A, ..., |x_N - x_A|/σ_A;
        α_B-A-C, α_B-A-D, ..., α_M-A-N;
        θ_B - θ_A, θ_C - θ_A, ..., θ_N - θ_A;
        σ_B/σ_A, σ_C/σ_A, ..., σ_N/σ_A;
        d_A, d_B, ..., d_N ]                                       (25)

In Equation (25), |x_i - x_j|/σ_j designates the normalized distance, α_i-j-k designates the link angles, θ_i - θ_j designates the orientation difference, σ_i/σ_j designates the size ratio, and d_i designates the SIFT descriptor.
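A minimal sketch of the SIFT-based analogue in Eq. (25), assuming the keypoints are given as (u, v, scale, orientation) tuples with their descriptors and that the first keypoint serves as the base node; this reading of the equation and the helper names are assumptions made for illustration.

    import numpy as np
    from itertools import combinations

    def sift_graph_vector(keypoints, descriptors):
        """Eq. (25)-style vector; keypoints are (u, v, scale, orientation_deg) tuples."""
        (bu, bv, bs, bo), rest = keypoints[0], keypoints[1:]

        def link_angle(p, q):
            """Angle at the base node between the links to keypoints p and q."""
            a = np.array([p[0] - bu, p[1] - bv])
            b = np.array([q[0] - bu, q[1] - bv])
            c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

        dists  = [np.hypot(u - bu, v - bv) / bs for u, v, s, o in rest]  # |x_i - x_A| / sigma_A
        angles = [link_angle(p, q) for p, q in combinations(rest, 2)]    # alpha_{i-A-k}
        orient = [(o - bo) % 360.0 for u, v, s, o in rest]               # theta_i - theta_A
        sizes  = [s / bs for u, v, s, o in rest]                         # sigma_i / sigma_A
        descs  = [x for d in descriptors for x in d]                     # d_A, d_B, ..., d_N
        return dists + angles + orient + sizes + descs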

7.2.2 Learning the n-node Topology: Geometric Hashing

The feature vector used in this study can be considered a partial topology of a general topology, where each part contains N elements out of a total of W (W > N) elements on a range image. This feature vector resembles a partial graph belonging to a global graph. For each range image, the U largest features (W > U > N) in terms of radius are selected, and all N-tuple combinations of these U features (C(U,N) feature vectors in total) are used to construct the feature vectors defining that range image (Table 12).

Table 12. The C(U,N) feature vectors obtained from the U largest features, using the N-node topology. For the sake of simplicity the last node is chosen as the base node.

Feature 1:       [ t_A, t_B, t_C, t_D, ..., t_N, |x_A - x_N|, |x_B - x_N|, ..., |x_M - x_N|, α_ANB, α_ANC, α_AND, ..., α_LNM, n_A - n_N, n_B - n_N, ..., n_M - n_N, r_A, r_B, r_C, ..., r_N ]
Feature 2:       [ t_A, t_B, t_C, t_D, ..., t_O, |x_A - x_O|, |x_B - x_O|, ..., |x_N - x_O|, α_AOB, α_BOC, α_COD, α_DOE, ..., α_MON, n_A - n_O, n_B - n_O, ..., n_N - n_O, r_A, r_B, r_C, ..., r_O ]
...
Feature C(U,N):  [ ..., t_S, t_T, ..., t_U, |x_S - x_U|, |x_T - x_U|, ..., α_PUS, ..., α_SUT, ..., n_S - n_U, n_T - n_U, ..., r_S, r_T, ..., r_U ]

In this study, these feature vectors are used in a geometric hashing method for object recognition. The main reason for using geometric hashing is that it allows partial matching of small topologies within a general topology, so that the recognition process becomes robust to noise, rotation and even occlusion. In our hashing method, indexing is done by the types of the n-tuples, and each entry includes the feature vector of the triplet besides the code of the pose and the object. In other words, for a range image the C(U,N)-long set of feature vectors is constructed. At the pre-processing stage of hashing, the feature vectors are calculated for each range image in the training set of an object category (for example, range images of the bunny taken from various angles). Thus, for object i with Num_T_i training range images in the database, Num_T_i sets of feature vectors are obtained. This database construction stage can be computed offline. This operation is completed for the various range image training sets belonging to the different objects.
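A minimal sketch of this offline stage, assuming each extracted feature of a training range image is available as a (type, radius, vector) triple; the values U = 10 and N = 3 anticipate the settings used later in the experiments, and the table layout shown in the comments is an illustrative assumption.

    from itertools import combinations

    def hash_entries(features, U=10, N=3):
        """Offline stage: keep the U largest features of one training range image and
        emit one hash entry per N-tuple; `features` is a list of (type, radius, vector)."""
        largest = sorted(features, key=lambda f: f[1], reverse=True)[:U]
        for combo in combinations(largest, N):         # C(U, N) tuples per range image
            key = tuple(f[0] for f in combo)           # index by the ordered type labels
            yield key, [f[2] for f in combo]           # the stored topology vectors

    # Building the table for the training sets of all objects (computed offline):
    # table = {}
    # for object_id, training_images in enumerate(training_sets):
    #     for pose_id, feats in enumerate(training_images):
    #         for key, vecs in hash_entries(feats):
    #             table.setdefault(key, []).append((object_id, pose_id, vecs))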

Consequently, at the recognition stage, the features of the test model (i.e. the model to be recognized) are extracted and the related hash table indexes are obtained. By comparing the hash table entries using a similarity measure between the feature vectors, matching features are found. Corresponding to the indexes in the training sets, the matched model's vote is incremented by one for that particular training set. Finally, the database model which receives the greatest number of votes is taken as the match of the test object.

Figure 58. Demonstration of geometric hashing: all the topologies constructed from a range image are compared to the topology vectors in the database. The database model which receives the greatest number of votes is taken as the match of the test object.

Figure 58 demonstrates the geometric hashing procedure. For a range image (bunny, Figure 58.a) the features are extracted. The largest U (6 in this case) features are selected. Then all N-tuple (N = 3) combinations are found and the feature vectors are constructed (C(6,3) = 20 feature vectors in total). These feature vectors are then matched against each range image in all of the training sets in the database. For each correct match, the vote of that training set is incremented, and the object is recognized by the final voting.
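The voting stage of the same scheme could look like the following sketch; the is_similar test stands for the similarity measure defined in the next subsection, and the entry layout follows the sketch given above rather than the thesis implementation.

    from collections import Counter

    def recognize(test_entries, table, is_similar):
        """Online stage: every similar hash entry votes for the training model it came
        from; the database model with the most votes is taken as the match."""
        votes = Counter()
        for key, test_vecs in test_entries:            # the C(U, N) entries of the test scan
            for object_id, pose_id, train_vecs in table.get(key, []):
                if is_similar(test_vecs, train_vecs):
                    votes[object_id] += 1
        return votes.most_common(1)[0][0] if votes else None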

Similarity Measure

The similarity measure defines a distance between two feature vectors. For two features to be compared, they must contain feature points of the same types; it is not meaningful to compare two pits and a saddle with two peaks and a saddle. For any group of feature points with the same type combination, the definition of a similarity measure is proper. For comparable feature vectors, if the distance between every corresponding feature element, namely the difference between the normal directions, the difference between the angles of each triple, the difference between two link-length ratios, etc., is below a given threshold, then the two vectors are referred to as similar. The threshold values that indicate the similarity between two feature vectors are found through extensive experiments on the data. The fine-tuned threshold values are given in the next subsection.

HK-SC Curvature Space Classification Comparison and Threshold Fine Tuning

Since both the HK and SC spaces classify surface patches into similar types, their classification capabilities are comparable. For this reason, there is an ongoing debate on the advantages and disadvantages of using mean & Gaussian (HK) or shape index & curvedness (SC) curvature spaces for object recognition applications. In [Cantzler and Fisher 2001], the HK and SC curvature descriptions are compared in terms of classification, impact of thresholds and impact of noise levels, and it is concluded that the SC approach has some advantages at low thresholds, in complex scenes and in dealing with noise. However, in that study the curvatures are calculated only at the lowest scale, i.e. the given resolution. Scale-spaces of the surfaces or the curvatures are not defined. Another comparative study was carried out in [Li and Hancock 2004], where curvature values obtained from the shading in 2D images are used and HK and SC histograms are created. The comparison results show that SC histograms are slightly more successful in terms of classification. Yet again, the tested resolution is the pixel resolution of the 2D image and the effect of sampling is ignored. When calculating the H, K, C and S values, the scale/resolution ratio is highly influential. However, due to its scale invariant nature, the shape index (S) is independent of the resolution or the scale. Thus it is no wonder that SC methods give better results than HK methods when the comparison is carried out at an uncontrolled scale/resolution level.
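A minimal sketch of this threshold test, assuming each triplet feature vector is stored as a dict with its type pattern, link angles and link lengths; the field names are illustrative, while the 10-degree and 20-mm values are the fine-tuned thresholds reported in the experiments that follow.

    import numpy as np

    TH_ANG  = 10.0   # degrees, the fine-tuned angle threshold
    TH_DIST = 20.0   # millimetres, the fine-tuned distance threshold

    def is_similar(a, b):
        """Threshold test between two triplet feature vectors."""
        if a["types"] != b["types"]:
            return False                                 # type patterns must be identical
        if np.max(np.abs(np.subtract(a["angles"], b["angles"]))) > TH_ANG:
            return False
        norm_a = np.linalg.norm(a["lengths"])            # (d_ij^2 + d_jk^2 + d_ki^2)^(1/2)
        norm_b = np.linalg.norm(b["lengths"])
        return abs(norm_a - norm_b) <= TH_DIST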

In order to make the H, K and C values scale invariant as well, the scale/resolution ratio of the scan must be set to a constant value for the whole database. In addition to this, a scale space of the surface should be constructed so that features which also carry the scale level information can be obtained by using the H, K, S and C values. Obtaining the absolute scales of these features is crucial for 3D object representation, and the only way of achieving such information is constructing the scale-spaces of the surfaces in terms of curvature values. There have been different attempts at constructing a scale-space of a surface and defining scale-invariant features [Akagündüz and Ulusoy 2007]. However, there has been no study which uses a scale-space approach to compare the classification capabilities of HK and SC. For this experiment, our main motivation and contribution is to make both mathematical and experimental comparisons of the HK and SC curvature descriptions in their scale-spaces. For this purpose we calculate the scale spaces of the curvatures of the given surfaces for each method separately. Then we extract scale and orientation invariant features from each space. By this we mean that features are extracted with their scale information both in the SC scale space and in the HK scale space. Finally, we use these features in an object recognition task and compare the performances of the methods, so as to decide which curvature space is better in terms of feature quality and object recognition. In addition, in order to optimize the parameters and examine the effect of the parameters on each method, several tests are run using different parameter values. The mathematical comparison of the HK and SC spaces is given in Chapter 3. Furthermore, the proposed feature extraction and topology construction method is given in Chapter 4 and Chapter 5. The recognition method used to test the feature vectors was generalized in the previous subsections. In this subsection we first give the details of the recognition method and then proceed with the experiments.

To begin with, for each range image, the ten largest features in terms of volume are selected and all triple combinations of these 10 features (C(10,3) = 120 feature vectors in total) are used to construct the feature vectors defining that range image. As explained previously, at the pre-processing stage of hashing, invariant features are extracted and saved to the hash table for each pose and each training model range image. Consequently, at the recognition stage, the features of the test model (i.e. the model to be recognized) are extracted and the related hash table indexes are obtained. By comparing the hash table entries corresponding to these indexes using the similarity measure given before, the matched model's vote is incremented by one. Finally, the database model which receives the greatest number of votes is taken as the match of the test object.

In this study, the Stuttgart database [Stuttgart Database] is used, where there are 258 scans of each of 42 objects. As defined on the web page of the database, 66 poses of each object are used for training and the rest of the poses are used for testing. In this study, two groups of experiments are performed. The first group of experiments is carried out in order to decide on the best threshold values and features. When comparing two feature vectors, we first check whether the types of the three features are identical. If so, we check whether the absolute difference between the angles (α_i,j,k) is at most th_ang degrees. Then, similarly, we check whether the absolute difference between the norms of the distances ((d_{i,j}^2 + d_{j,k}^2 + d_{k,i}^2)^{1/2}) of the compared feature vectors is at most th_dist mm. A series of experiments is held on both the HK and SC methods in order to decide on the values of these parameters for each method. The second group of tests is carried out in order to compare HK and SC in object recognition against the number of objects in the database. For the sake of simplicity and speed, the first group of tests is carried out using only 5 objects, but the object recognition tests are run for various numbers of objects.

Experiments for Thresholds

It is an important and difficult task to decide on the similarity thresholds. There are two similarity measures defined for our feature vector: the angle similarity and the distance similarity. The angle similarity is the absolute difference between the angle attributes of the two feature vectors. The threshold value for the absolute angle difference is set to 3, 5, 10 and 20 degrees, and the recognition performances are observed separately for the HK and SC methods (Figure 59), while the distance threshold is fixed to 20 mm. The recognition performance is defined as the percentage of correctly recognized poses among all test poses. The HK method is robust to the angle thresholds. Both methods perform best when the angle threshold is 10 degrees. Similarly, the distance threshold is set to 5, 10, 20 and 30 mm, while the angle threshold is fixed to 10 degrees, and the performances of both methods are given in Figure 60. The HK method is robust to different distance thresholds, but both methods perform best when the distance threshold is 20 mm.

Figure 59. Angle threshold tests for both methods.

Figure 60. Length threshold tests for both methods.

Experiments for Types

Both methods classify regions into eight fundamental types, namely peak (1), convex cylinder (2), saddle ridge (3), plane (5), hyperbola (6), pit (7), concave cylinder (8) and saddle valley (9). Some of these types are more useful, while some are even unreliable. Different groups of types are included in the feature vector and tested for each method (Figure 61). The HK method is robust against different type combinations, but the best results for both are obtained when only the types 1, 3, 5, 7 and 9 are included in the feature vector.

Figure 61. Type tests for both methods.

Experiments for Feature Numbers

In our method, a number of features with the largest volumes are selected for each range image, and triple combinations among these features are found in order to construct the feature vectors. Taking fewer features would speed up the process; however, good quality features may be omitted. On the other hand, taking more features will slow down the process and some poor quality features may be included. Thus we search for the optimum number of features by testing different numbers such as 6, 10, 14, 16 and 18 (Figure 62). The HK method is stable against different numbers of features, but the SC method is severely affected by the feature number. When the ten largest features are selected from each range image, both methods perform best.

Figure 62. Feature number tests for both methods.

Experiments for Database Size

After deciding on the threshold values and the feature vector content, experiments are run in which the database size is increased from 5 objects to all 42 objects. In this way the robustness of each method is tested under a varying number of objects in the database (Figure 63). The performance of the SC method decreases linearly with the size of the database, whereas the performance of the HK method decreases logarithmically.

Figure 63. Database size tests for both methods.

When compared within a scale-space approach, the HK method outperforms the SC method in all tests. We think that the main reason behind this is the definition of planar regions in the HK method: this definition handles many ambiguous regions more consistently than SC's does. In addition, the SC method is less robust to feature type changes and database size. As the number of selected features for a range image increases, the feature quality degrades faster for the SC method. All results confirm that the HK method is more successful.

3D Object Recognition Tests

In this subsection, the object recognition capability of the proposed 3D features is tested. As mentioned before, multi-scale features are extracted from sampled surfaces and then used to construct a scale and orientation invariant topological representation of object categories. Features are located over the surface with their metric size, irrespective of the surface resolution. The Stuttgart Range Image database [Stuttgart Database] is used for testing the proposed features in object category recognition. Originally, this database contains 42 objects with 258 different virtual scans of each (10,836 scans in total). However, the database does not include scale-varying or occluded scans. In this study, for the sake of testing the scale invariance and occlusion robustness limits of the proposed method, ¼-scaled, 25%-occluded, and both ¼-scaled and 25%-occluded versions of these scans are virtually created and tested. In order to classify the object categories, geometric hashing [Lamdan and Wolfson 1988] is used. In addition, the results are compared to SIFT [Lowe 2004], for which each original, scaled or occluded range image is rendered as a gray-scale image.

For the sake of observing the benefits of using multi-scale features, the 3D features are extracted both with and without the scale-space search. First, the features are extracted using the conventional method, in which the surface is classified using the H and K curvatures obtained at the given resolution. This method will be referred to as the single-scale feature extraction method (SSFE). Then, the features are extracted using the proposed multi-scale feature extraction method, where the scale-space of curvatures is used. This method will be referred to as the multi-scale feature extraction method (MSFE). The feature vectors are obtained as in (24) and hashing is applied as explained above for both methods (SSFE and MSFE) separately. In order to see the benefits of using feature point descriptors, spin images around the feature centres are also extracted and appended to the feature vector as in (25), for both SSFE and MSFE separately.
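How such test scans could be produced is sketched below for a point-cloud representation. The ¼ scaling by halving each axis matches the description in the experiments, while the quadrant-based occlusion is only an illustrative assumption about how a randomly selected quarter of the area might be removed.

    import numpy as np

    def quarter_scale(points):
        """Scale a range scan to 1/4 of its surface area by halving each axis."""
        return points * 0.5

    def occlude_quarter(points, rng=None):
        """Remove the points falling into a randomly chosen quadrant of the x-y bounding
        box, occluding roughly 25% of the scanned area."""
        rng = rng or np.random.default_rng()
        lo, hi = points[:, :2].min(axis=0), points[:, :2].max(axis=0)
        mid = (lo + hi) / 2.0
        qx, qy = rng.integers(0, 2, size=2)              # which quadrant to cut away
        in_x = points[:, 0] > mid[0] if qx else points[:, 0] <= mid[0]
        in_y = points[:, 1] > mid[1] if qy else points[:, 1] <= mid[1]
        return points[~(in_x & in_y)]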

We will refer to these methods as SSFE+spin and MSFE+spin. As explained in the previous section, the effective regions of the spin images are designated by the radii of the features. Finally, the SIFT features are obtained from the gray-scale images produced by rendering the range image surfaces. Then (25) is used to construct the feature vectors from the SIFT features. This method will be referred to as SIFT. All methods (i.e. SSFE, SSFE+spin, MSFE, MSFE+spin, SIFT) are tested for object recognition. Four groups of experiments are carried out to test for orientation invariance, scale invariance and robustness to occlusions. 8 objects from the Stuttgart database (auto, bunny, chicken, ente, hub, eage, rocker, screwdriver) are used. For each model, 66 original range scans are used for training. 192 original, 192 scaled, 192 occluded and 192 both scaled and occluded range scans are used for testing. For the eight objects, a total of 6672 range scans are used in these experiments.

Recognition under Rotation

The database originally includes 192 testing images scanned from angles different from those of the 66 training images. Thus, experimenting on the original training and testing models provides results on the recognition performance for rotated models. Previously, [Li and Guskov 2007] and [Hetzel et al. 2001] carried out these experiments on the Stuttgart database and achieved recognition rates over 98% and 93%, respectively. As seen from Table 13 and Figure 64, SSFE and SSFE+spin achieve similar performance. MSFE and MSFE+spin are very close; however, the performance of SIFT is dramatically behind the other methods.

Recognition under Scaling

In the second part of the experiments, scaled versions of the range images are used for testing. Each test image is scaled to ¼ of its area by scaling each axis by ½. As seen from Table 13 and Figure 65, MSFE and MSFE+spin perform much better than SSFE and SSFE+spin, since they use the scale space of curvatures, i.e., they are scale invariant. Although SIFT is scale invariant as well, its performance is far behind, showing that SIFT is not descriptive for gray-scale rendered range images.

Figure 64. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin and SIFT, from left to right respectively. The original testing images are used.

Figure 65. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin and SIFT, from left to right respectively. Scaled versions of the testing images are used.

Figure 66. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin and SIFT, from left to right respectively. Occluded versions of the testing images are used.

Figure 67. Confusion matrices of the methods: SSFE, SSFE+spin, MSFE, MSFE+spin and SIFT, from left to right respectively. Scaled and occluded versions of the testing images are used.

Table 13. Average recognition rates for all methods in all experiments.

                               SSFE      SSFE+spin   MSFE      MSFE+spin   SIFT
Under rotation                 99.37%    99.76%      94.71%    96.02%      75.91%
Under scaling                  54.55%    59.50%      83.58%    84.39%      62.79%
Under occlusion                98.06%    98.64%      79.07%    85.32%      62.31%
Under scaling and occlusion    45.30%    49.85%      63.71%    65.55%      51.70%

Recognition under Occlusion

The third part of the experiments analyzes the methods' robustness to occlusion, where a 25%-occluded version of each testing image is used. For each testing range image, a randomly selected ¼ of the area of the range scan is occluded. The results (Table 13 and Figure 66) show that SSFE and SSFE+spin perform remarkably well, since the test set does not vary in scale and feature-based methods are robust to occlusions. The performance of MSFE+spin shows that, for occluded images, adding the spin image descriptor to the feature vector improves the recognition performance considerably. The performance of SIFT, on the other hand, is still behind the other methods.

Recognition under both Scaling and Occlusion

The final stage of the experiments investigates the recognition performance of the methods under both scaling and occlusion. Each test image is first scaled to ¼ of its surface area and then occluded over ¼ of its area. Since testing under both scaling and occlusion is quite challenging, all methods are affected significantly. Still, having the advantage of being both scale invariant and local feature based (thus robust to occlusions), MSFE and MSFE+spin perform better than the other methods. SSFE's and SSFE+spin's performances are below 50%, while SIFT's performance is close to theirs (Table 13 and Figure 67).

3D Object Registration Experiments

Surface registration is an intermediate but crucial step within the computer vision workflow. The goal of registration is to find the Euclidean motion between a set of range images of a given object taken from different positions, in order to represent them all with respect to a reference frame. Registration can in general be divided into two stages: coarse registration and fine registration [Salvi et al. 2006]. In coarse registration, the main goal is to compute an initial estimate of the rigid motion between two clouds of 3D points using correspondences between the two surfaces. In fine registration, the goal is to obtain the most accurate solution possible. Needless to say, the latter usually uses the output of the former as an initial estimate, so as to represent all range image points with respect to a common reference system. It then refines the transformation matrix by minimizing the distance between the temporal correspondences, known as closest points.

For a wide literature survey on the registration of range images, the reader may refer to [Salvi et al. 2006]. In this thesis we perform coarse registration using the proposed scale invariant features. The homogeneous transformation, which includes 3D rotation, translation and scaling between two range images, is estimated. However, different from previous approaches, single features are not matched as correspondences; instead, the triplets that were used to recognize object categories are used.

Triplet Correspondences

In the previous sections, n-node topologies of scale invariant features were constructed. Using these n-node topologies, transform invariant object recognition was performed and analyzed. The topology set included n = 3 nodes (features), since the experiments showed this to be empirically optimal. Up to now, only the best match, in other words the quantity of all matches, between a set of training range images and a test image was taken into consideration, since the problem was to recognize object categories. However, for registration the quality of the matched features is also important. Only a general consensus among all matched triplets would give the true transformation and prove that the matched triplets are actually true features that represent the objects. For this purpose, using the triplets extracted in the previous subsection, a coarse registration is carried out using the random sample consensus (RANSAC) method.

Random Sample Consensus (RANSAC) on Triplets

RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method to estimate the parameters of a mathematical model from a set of observed data which contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, this probability increasing as more iterations are allowed. The algorithm was first published by [Fischler and Bolles 1981]. A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, and "outliers", which are data that do not fit the model. In addition to this, the data can be subject to noise. The outliers can come, for example, from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of the data. RANSAC also assumes that, given a (usually small) set of

inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data. The RANSAC algorithm is often used in computer vision, e.g., to simultaneously solve the correspondence problem and estimate the fundamental matrix related to a pair of stereo cameras. In this thesis, very similarly to solving the correspondence problem in stereo images, the method is used to estimate the homogeneous transformation between two range images. Instead of candidate point matches, candidate triplet matches are used. As explained in the previous section, the triplets are matched by some well-defined similarity measures. In order to eliminate the false matches and obtain the transformation from only the true triplet matches, RANSAC is run. RANSAC also requires a similarity measure to test the homogeneous transformation between two triplet matches. However, for RANSAC, different from the similarity measures used to find the candidate matches, only the spatial information is used. In other words, at any iteration of RANSAC, given a candidate transformation, the absolute Euclidean difference between the first triplet and the transformed second triplet is used. The output of RANSAC is a homogeneous transformation, which defines the transformation between any two corresponding points on the registered range images, such that:

P_S = s R_{3x3} P_D + s T_{3x1},  equivalently  [P_S; 1] = [ s R_{3x3}  s T_{3x1} ; 0_{1x3}  1 ] [P_D; 1]        (26)

where P_S = [P_x, P_y, P_z]^T and P_D are corresponding points on the two registered range images, R_{3x3} is the rotation matrix, T_{3x1} is the translation vector and s is the scale factor.
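A minimal sketch of RANSAC over candidate triplet matches, assuming each match is a pair of 3x3 arrays holding the node centres of the two triplets; the Umeyama-style similarity fit, the iteration count and the inlier tolerance are illustrative choices, not the parameters of the thesis.

    import numpy as np

    def similarity_transform(src, dst):
        """Least-squares s, R, t with dst_i ~= s * R @ src_i + t (Umeyama-style fit)."""
        n = src.shape[0]
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        var_s = ((src - mu_s) ** 2).sum() / n
        cov = (dst - mu_d).T @ (src - mu_s) / n
        U, d, Vt = np.linalg.svd(cov)
        sign = np.ones(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:      # avoid a reflection
            sign[-1] = -1.0
        R = U @ np.diag(sign) @ Vt
        s = (d * sign).sum() / var_s
        return s, R, mu_d - s * R @ mu_s

    def ransac_register(matches, iterations=500, tol=15.0, rng=None):
        """matches: candidate (triplet_in_image_1, triplet_in_image_2) pairs, each a 3x3
        array of node centres. Returns the transform with the largest triplet consensus."""
        rng = rng or np.random.default_rng(0)
        best, best_support = None, -1
        for _ in range(iterations):
            dst, src = matches[rng.integers(len(matches))]  # a single match defines a model
            s, R, t = similarity_transform(src, dst)
            support = sum(np.linalg.norm(s * (b @ R.T) + t - a) < tol   # triplet error, tol in mm (illustrative)
                          for a, b in matches)
            if support > best_support:
                best, best_support = (s, R, t), support
        return best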

3D Object Registration Results

In this subsection, some experiments on registration using the proposed features are presented. The first experiment is the simplest case, where there is only an in-plane rotation between two artificial surfaces: there is a 135º in-plane rotation between the surfaces (Figure 68). The estimated RANSAC result closely matches the ideal transformation matrix, proving a quite successful coarse registration. In Figure 69, the matched features can also be seen. Colours, as usual, designate the different feature types.

Figure 68. Two artificial surfaces with a Gaussian peak and a Gaussian pit, which are 135º in-plane rotated (rotation around the z-axis) versions of each other.

Figure 69. Matched features from the registered artificial range images.

The same experiment was also performed on range images from the Stuttgart database. Since these images are not captured with controlled rotation, we have chosen two images with rotation around a single axis. However, the exact amount of rotation is unknown (Figure 70).

Figure 70. Two range images of the screwdriver from the Stuttgart database. There is a rotation around the y-axis.

The RANSAC result again agrees closely with the ideal transformation matrix, so the results on the Stuttgart database also prove a quite successful coarse registration. Assuming αº of rotation around the y-axis, the ideal result has the form

    [  cos(α)   0   sin(α) ]
    [    0      1     0    ]    together with a translation T_z.
    [ -sin(α)   0   cos(α) ]

In Figure 71, the matched features can also be seen. Colours, as usual, designate the different feature types.

Figure 71. Matched features from the registered range images from the Stuttgart database.

Since the extracted features are scale invariant (and so are the triplets), it is possible to register scaled versions of range images and to calculate the scaling ratio between two objects. For this purpose, registration is performed using the 25% scaled versions of the Stuttgart database objects (versus the originals).

The RANSAC result again matches the ideal transformation matrix, and the results on the scaled range images demonstrate successful scale invariant coarse registration. In Figure 72, the matched features can also be seen. Colours, as usual, designate the different feature types.

Figure 72. Matched features from the registered range images from the Stuttgart database. The surfaces are scaled versions of the same range image.

Conclusions on 3D Object Recognition and Registration

In order to test the scale invariance and occlusion robustness limits of the proposed method, ¼-scaled, 25%-occluded and both ¼-scaled and 25%-occluded versions of the Stuttgart scans are virtually created and tested. In order to classify the object categories, geometric hashing is used. So as to show the importance of the scale-space search of surface curvatures, single-scale features are also extracted, in addition to the multi-scale features, and tested using the same classification method. Moreover, the results are also compared with SIFT, for which each original, scaled or occluded range image is rendered as a gray-scale image. If there is no scaling, single-scale feature based methods (SSFE or SSFE+spin) are quite powerful for recognition on range images. The results prove that for scale-varying databases, multi-scale features perform much better. The proposed method (MSFE+spin) performs best for experiments 2 and 4, where scaled range images are tested. Under occlusion,

feature-based recognition is hardly affected. On the other hand, SIFT shows poor performance: although known to be quite potent for textured images, SIFT proves the opposite for gray-scale rendered range images. The results also show that using surface descriptors, such as spin images, may positively affect the performance of the extracted features. In experiment three, where occluded range images are tested, the performance of MSFE+spin is considerably higher than that of MSFE. When the range image is occluded, there are not enough quality features to represent the image. However, when the local surfaces are described with spin images over accurately defined effective regions, the performance improves. This is not the case for SSFE and SSFE+spin, because the effective size extracted with single-scale features is not accurate; thus, the surface description with spin images is faulty in the single-scale case.

7.3 Delineation of Slope Units Based on Scale and Resolution Invariant 3D Curvature Extraction

Landslides, one of the major geo-hazards threatening settlements, occur on the slopes of the terrain. In order to predict landslide occurrences, previously occurring landslides are related to several landslide susceptibility parameters and landslide susceptibility maps are obtained. Preparation of a landslide susceptibility map requires selection of an appropriate slope mapping unit, which can be regular grids (pixels), slope units, unique condition areas or some morphological features. Among these mapping units, the use of slope units has several advantages over the others. As landslides occur on slopes, grid (pixel)-based approaches force the analysis to consider areas which are not prone to landslides and hence decrease the performance of susceptibility mapping algorithms. Using slope units as the landslide susceptibility mapping unit overcomes this problem. The slope unit is a mapping unit used to subdivide space into regions based on certain hydrological criteria. Physically, a slope unit can be considered the left or right side of a sub-basin of a watershed; therefore, a slope unit can be identified by the intersection of a ridge line and a valley line. Usually, partitioning a region into sub-basins or slope units is virtually impossible, and such a partitioning requires high computational effort. Hence, in this study, a scale and resolution invariant, fully automatic method using the scale space of 3D curvature values is proposed for creating slope units to be used in landslide susceptibility mapping. The whole process, as explained in the following subsections, is automatic.

The performance of the proposed method is tested by comparing it with a conventional method of obtaining slope units. At this point we would like to thank Dr. Arzu Erener and Dr. H. Şebnem B. Düzgün from the Geodetic and Geographic Information Technologies Department at Middle East Technical University for their cooperation in this study. They helped us build the conventional method which is compared to our proposed landslide detection technique.

Slope Unit Generation by the Conventional Method

A GIS-based hydrologic analysis and modelling tool, Arc Hydro [Maidment 2002], is employed to draw the dividing lines for identifying the slope units. In this approach, the outline of the watershed polygon is obtained from the digital elevation model (DEM) by using the hydraulic model tool, where the watershed boundary is the watershed divide or ridge line. The low-elevation areas in the DEM which are surrounded by higher terrain and disrupt the flow of water along the path are filled in order to form watershed boundaries. The flow direction is calculated by examining the eight neighbours of a cell according to the eight-direction pour point method [Multi-Watershed]. Then the associated flow accumulation grid is computed, which contains, for each cell in the input grid, the accumulated number of upstream cells. In addition, a grid representing a stream network is created by querying the flow accumulation for cell values above a certain threshold. This threshold is defined either as a number of cells or as a drainage area in square kilometres. In general, the recommended size for the stream threshold definition is 1% of the maximum flow accumulation. Small threshold values result in a dense stream network with a large number of delineated catchments. Then the DEM and the flow accumulation are used to determine the backward flow direction, identifying the cells that drain through a given outlet. In this backward tracing, the cells with homogeneous flow accumulation values are classified as watershed boundaries, which are later converted to polygons (Figure 73.a). After obtaining the watershed boundaries, the next step is overlaying the watershed boundary with the drainage lines (Figure 73.b). There is a need to divide the watershed polygon by the drainage line to obtain two slope units. Hence the drainage line is obtained by using the reverse of the DEM, which is obtained by turning the high DEM values into low values, and vice versa. This process transforms the original drainage line into a watershed divide [Xie et al. 2004]. The combination of the watershed deduced from the DEM and the watershed deduced from the reverse DEM gives the slope units (Figure 73.d). As can be seen from this procedure, the conventional method for obtaining slope units requires the use of GIS tools and operations with several stages.
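The eight-direction pour point step above is normally performed inside the GIS tool; the sketch below, in Python with NumPy, only illustrates the flow-direction and flow-accumulation idea, including the 1%-of-maximum stream threshold mentioned in the text. It is not the Arc Hydro implementation, and the helper names are illustrative.

    import numpy as np

    # Offsets of the eight neighbours (D8), clockwise from north-west.
    D8 = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

    def d8_flow_direction(dem):
        """Index (0-7) of the steepest-descent neighbour for every cell, -1 for sinks."""
        rows, cols = dem.shape
        direction = np.full((rows, cols), -1, dtype=int)
        for r in range(rows):
            for c in range(cols):
                best_drop, best_k = 0.0, -1
                for k, (dr, dc) in enumerate(D8):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        drop = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                        if drop > best_drop:
                            best_drop, best_k = drop, k
                direction[r, c] = best_k
        return direction

    def d8_flow_accumulation(dem, direction):
        """Accumulated number of cells draining through each cell (highest cells first)."""
        rows, cols = dem.shape
        acc = np.ones((rows, cols))                     # each cell counts itself
        for idx in np.argsort(dem, axis=None)[::-1]:
            r, c = divmod(idx, cols)
            k = direction[r, c]
            if k >= 0:
                dr, dc = D8[k]
                acc[r + dr, c + dc] += acc[r, c]
        return acc

    dem = np.random.rand(64, 64) * 100.0                # toy DEM for illustration
    acc = d8_flow_accumulation(dem, d8_flow_direction(dem))
    streams = acc > 0.01 * acc.max()                    # 1% of the maximum flow accumulation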

Figure 73. a) Watershed boundaries determined using the DEM. b) Drainage line overlaid with the watershed of the region. c) Watershed boundaries determined using the reverse DEM (shown with red lines) overlaid with the watershed boundaries, which are present on both the left and right sides of a sub-basin. d) Slope units of the region obtained in 3D.

The Proposed Method

In this study, a scale and resolution invariant, fully automatic method based on 3D curvature values is proposed for slope unit generation. The method considers the digital elevation model (DEM) of the terrain as a regular grid of 3D points lying over a u-by-v mesh (u being the latitude and v the longitude), namely a depth map of the surface. On this 3D regular grid surface, first the drainage lines and ridge lines are segmented according to their mean (H) curvature values and normal directions (n) in the scale space of the surface grid. Then these lines are connected in order to define the polygons of the slope units. All the steps within the method are automatic and sequential. The method is detailed below step by step.

Re-sampling

Since the 3D coordinates are absolute in DEMs, the size information is always preserved on the surface. However, the 3D curvature values computed over the surface are highly affected by the sampling (i.e. resolution) of the surface. Thus, a scale and resolution invariant method is required in order to compute curvature values for the points over the surface. For this reason, the grid surface is first re-sampled according to its scale (e.g. m metres / n points).

In this study, the sample data has a latitude and longitude scale-resolution ratio of 20 m per sample point. Thus, if the proposed method is applied to a DEM surface with a different sampling rate (resolution), its scale-resolution ratio should first be converted to 20 m per point.

Scale Space of Curvatures

Surface curvature is simply the change in surface gradients. Both Mount Everest and a small hill may have the same curvature values at different scales, but their curvature values at the same scale level are different. In order to examine a given surface at different scale levels, a scale pyramid is constructed. In this study, Burt and Adelson's Gaussian pyramid method [Burt and Adelson 1983] is applied in order to extract a scale space of the surface. In this scale space, differently sized elements become visible at different scales. For example, Mount Everest will be visible at a much higher scale than the small hill, which will be visible at a lower one. The drainage lines and the ridge lines over DEMs are similarly sized structures, and they reside at a certain layer of the scale space. Smaller or bigger structures having similar shapes (that is, shapes having similar curvature values but different sizes) vanish at the scale level where the drainage and ridge lines exist, and vice versa. For this reason, the scale space was searched and it was found that the fourth level of the Gaussian pyramid (obtained by applying four successive Reduce [Burt and Adelson 1983] and four successive Expand [Burt and Adelson 1983] operations to the surface) includes the drainage and ridge lines. The original surface patch and its 4th level scale surface are shown in Figure 74.a and Figure 74.b.

Ridge and Drainage Line Extraction

The proposed method uses the value H and the vector n to determine whether an area is a ridge line, a drainage line, or neither of them. H and n are calculated at the fourth scale level for all points on the reduced surface. Then candidate ridge and drainage lines are obtained by thresholding as follows:

drainage line   if (H > t_H) and (cos(n_z) > 0.7)
ridge line      if (H < -t_H) and (cos(n_z) > 0.7)                (27)
none            otherwise

Here n_z is the z component of the surface normal, assuming that the z direction is always the altitude (elevation) in the DEM, and t_H is the curvature threshold. The selected regions are depicted in Figure 74.c.
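A minimal Python/NumPy/SciPy sketch of the pipeline described above: the Burt-Adelson Reduce/Expand steps used to reach the 4th pyramid level, and the mean curvature H and normal component n_z that feed the thresholding in (27). The 5-tap generating kernel is the standard one; the curvature sign convention, the reading of cos(n_z) as the z component of the unit normal, and the threshold value are assumptions made for illustration.

    import numpy as np
    from scipy.ndimage import convolve

    w = np.array([0.05, 0.25, 0.4, 0.25, 0.05])   # Burt-Adelson generating kernel
    K = np.outer(w, w)

    def reduce_step(z):
        """One Reduce: blur with the generating kernel, then subsample by two."""
        return convolve(z, K, mode="nearest")[::2, ::2]

    def expand_step(z, shape):
        """One Expand: upsample by two (zero insertion), blur, renormalise."""
        up = np.zeros(shape)
        up[::2, ::2] = z
        return convolve(up, 4.0 * K, mode="nearest")

    def fourth_level_surface(dem):
        """Surface smoothed to the 4th pyramid level (4 x Reduce, then 4 x Expand back)."""
        shapes, z = [], dem.astype(float)
        for _ in range(4):
            shapes.append(z.shape)
            z = reduce_step(z)
        for shape in reversed(shapes):
            z = expand_step(z, shape)
        return z

    def mean_curvature_and_nz(z, spacing=20.0):
        """Mean curvature H and z component of the unit normal on a 20 m grid.
        With this Monge-patch formula H is positive in valleys, matching the drainage
        test of (27); the exact sign convention of the thesis is an assumption here."""
        zy, zx = np.gradient(z, spacing)
        zyy, zyx = np.gradient(zy, spacing)
        zxy, zxx = np.gradient(zx, spacing)
        w2 = 1.0 + zx ** 2 + zy ** 2
        H = ((1 + zx ** 2) * zyy - 2 * zx * zy * zxy + (1 + zy ** 2) * zxx) / (2 * w2 ** 1.5)
        return H, 1.0 / np.sqrt(w2)

    # Candidate masks, with t_h standing in for the curvature threshold of (27):
    # H, nz = mean_curvature_and_nz(fourth_level_surface(dem))
    # drainage = (H > t_h) & (nz > 0.7)
    # ridge    = (H < -t_h) & (nz > 0.7)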

In order to obtain the actual ridge and drainage lines, thinning is applied over the candidate regions (Figure 74.c), and the actual drainage and ridge lines are obtained (Figure 74.d). However, some outlier lines also result from small regions. In order to eliminate them, the length of each connected line is calculated and short ones (shorter than 50 points in this case) are deleted (Figure 74.e).

Figure 74. a) Original surface patch. b) 4th scale level of the surface patch. c) Candidate ridge and drainage lines shown in black. d) Ridge and drainage lines after thinning. e) Final ridge and drainage lines after the elimination of outliers. f) Detected line tips and corners shown in pink; an example interest region for a tip is shown by a triangle. g) Line tips connected to available corners and tips shown in green. h) Segmented landslide regions shown in different colours.

Obtaining Polygons of Slope Units

In order to find slope unit candidates, the closed regions are extracted from these lines using a very simple approach. First, the corners and tips on each line are found by applying kernels over the binary image. Then, for each tip, a tip direction is determined. Finally, a tip is connected to the closest corner which resides inside a region around the tip direction vector (i.e. ±30º around the direction vector). If no corner exists in this region, the tip is connected to the closest tip existing in the region. If no tip exists in the region either, no connection is made (Figure 74.f and Figure 74.g).
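A rough sketch of this tip-to-corner connection rule, assuming the thinned ridge/drainage lines are given as a boolean NumPy image; the degree-based tip/corner detection and the cone test are one possible reading of the kernel-based step, not the thesis code.

    import numpy as np
    from scipy.ndimage import convolve

    def line_pixel_degree(binary):
        """Number of 8-connected line neighbours for every line pixel of a thinned image."""
        k = np.ones((3, 3)); k[1, 1] = 0
        return convolve(binary.astype(int), k, mode="constant") * binary

    def tip_connections(binary, max_dist=100, cone_deg=30.0):
        """For every line tip, pick the closest corner (or, failing that, tip) inside a
        +-30 degree cone around the tip direction; returns a list of (tip, target) pairs."""
        deg = line_pixel_degree(binary)
        tips, corners = np.argwhere(deg == 1), np.argwhere(deg >= 3)
        cone_cos, links = np.cos(np.radians(cone_deg)), []
        for tip in tips:
            r, c = tip
            if not (0 < r < binary.shape[0] - 1 and 0 < c < binary.shape[1] - 1):
                continue                                  # skip tips on the image border
            nb = np.argwhere(binary[r-1:r+2, c-1:c+2]) + tip - 1
            nb = nb[np.any(nb != tip, axis=1)]
            if len(nb) == 0:
                continue
            direction = (tip - nb[0]) / (np.linalg.norm(tip - nb[0]) + 1e-9)
            for targets in (corners, tips):               # corners are preferred over tips
                vecs = targets - tip
                dists = np.linalg.norm(vecs, axis=1)
                ok = (dists > 0) & (dists < max_dist) & \
                     ((vecs @ direction) / np.maximum(dists, 1e-9) > cone_cos)
                if ok.any():
                    links.append((tuple(tip), tuple(targets[ok][np.argmin(dists[ok])])))
                    break
        return links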

After the line tips are connected to line corners and line tips, the slope unit regions surrounded by the drainage and ridge lines are determined. Finally, by finding the connected components and their contours, the candidate regions are segmented and labelled, as shown in Figure 74.h.

Results and Comparison

The conventional and proposed methods are overlaid on a small test area of approximately 25 km² to represent the differing and matching regions of the generated slope units. Figure 75 presents the matching and differing regions. The matching regions occupy 1.2 km² (Table 14). The proposed method produces a larger area of detailed slope units compared to the conventional method (Figure 75). The 3D visualization of the slope units from both methods is illustrated in Figure 76. The proposed method is tested using MATLAB on an Intel Centrino Duo computer running Windows XP. It takes less than a minute to make all the calculations, including re-sampling, obtaining the 4th scale, curvature calculations, thresholding, morphological operations and region labelling, for a surface patch of 512x256 points covering nearly 50 km². On the other hand, in the conventional method the process of DEM filling, flow direction, flow accumulation, stream network determination and watershed boundary determination requires the use of different GIS tools, which may take a longer time depending on the resolution of the DEM and the area of the study region. Considering that further performance optimization of the code is plausible and that the complexity of the algorithm is O(n²) in Big-O notation, the method is clearly far faster than the conventional method.

Figure 75. The overlay of the conventional and proposed method outputs. Legend: black line: conventional slope unit; pink line: proposed method slope unit; yellow colour: similar regions; blue colour: generalized units of the conventional method or detailed units of the proposed method; green colour: detailed units of the conventional method or generalized units of the proposed method.

Figure 76. 3D view of the slope units from the proposed and conventional methods. Legend: black line: conventional slope unit; pink line: proposed method slope unit.

Table 14. Similar and different slope unit areas.

Slope Unit                                                              Area (km²)
Matching                                                                1.2
Detailed in conventional method but generalized in proposed method     9.55
Generalized in conventional method but detailed in proposed method
Sum


Spiral Recognition Methodology and Its Application for Recognition of Chinese Bank Checks Spial Recognition Methodology and Its Application fo Recognition of Chinese Bank Checks Hanshen Tang 1, Emmanuel Augustin 2, Ching Y. Suen 1, Olivie Baet 2, Mohamed Cheiet 3 1 Cente fo Patten Recognition

More information

RANDOM IRREGULAR BLOCK-HIERARCHICAL NETWORKS: ALGORITHMS FOR COMPUTATION OF MAIN PROPERTIES

RANDOM IRREGULAR BLOCK-HIERARCHICAL NETWORKS: ALGORITHMS FOR COMPUTATION OF MAIN PROPERTIES RANDOM IRREGULAR BLOCK-HIERARCHICAL NETWORKS: ALGORITHMS FOR COMPUTATION OF MAIN PROPERTIES Svetlana Avetisyan Mikayel Samvelyan* Matun Kaapetyan Yeevan State Univesity Abstact In this pape, the class

More information

COLOR EDGE DETECTION IN RGB USING JOINTLY EUCLIDEAN DISTANCE AND VECTOR ANGLE

COLOR EDGE DETECTION IN RGB USING JOINTLY EUCLIDEAN DISTANCE AND VECTOR ANGLE COLOR EDGE DETECTION IN RGB USING JOINTLY EUCLIDEAN DISTANCE AND VECTOR ANGLE Slawo Wesolkowski Systems Design Engineeing Univesity of Wateloo Wateloo (Ont.), Canada, NL 3G s.wesolkowski@ieee.og Ed Jenigan

More information

vaiation than the fome. Howeve, these methods also beak down as shadowing becomes vey signicant. As we will see, the pesented algoithm based on the il

vaiation than the fome. Howeve, these methods also beak down as shadowing becomes vey signicant. As we will see, the pesented algoithm based on the il IEEE Conf. on Compute Vision and Patten Recognition, 1998. To appea. Illumination Cones fo Recognition Unde Vaiable Lighting: Faces Athinodoos S. Geoghiades David J. Kiegman Pete N. Belhumeu Cente fo Computational

More information

Accurate Diffraction Efficiency Control for Multiplexed Volume Holographic Gratings. Xuliang Han, Gicherl Kim, and Ray T. Chen

Accurate Diffraction Efficiency Control for Multiplexed Volume Holographic Gratings. Xuliang Han, Gicherl Kim, and Ray T. Chen Accuate Diffaction Efficiency Contol fo Multiplexed Volume Hologaphic Gatings Xuliang Han, Gichel Kim, and Ray T. Chen Micoelectonic Reseach Cente Depatment of Electical and Compute Engineeing Univesity

More information

Topological Characteristic of Wireless Network

Topological Characteristic of Wireless Network Topological Chaacteistic of Wieless Netwok Its Application to Node Placement Algoithm Husnu Sane Naman 1 Outline Backgound Motivation Papes and Contibutions Fist Pape Second Pape Thid Pape Futue Woks Refeences

More information

3D inspection system for manufactured machine parts

3D inspection system for manufactured machine parts 3D inspection system fo manufactued machine pats D. Gacía a*, J. M. Sebastián a*, F. M. Sánchez a*, L. M. Jiménez b*, J. M. González a* a Dept. of System Engineeing and Automatic Contol. Polytechnic Univesity

More information

Massachusetts Institute of Technology Department of Mechanical Engineering

Massachusetts Institute of Technology Department of Mechanical Engineering cm cm Poblem Massachusetts Institute of echnolog Depatment of Mechanical Engineeing. Intoduction to obotics Sample Poblems and Solutions fo the Mid-em Exam Figue shows a obotic vehicle having two poweed

More information

17/5/2009. Introduction

17/5/2009. Introduction 7/5/9 Steeo Imaging Intoduction Eample of Human Vision Peception of Depth fom Left and ight eye images Diffeence in elative position of object in left and ight eyes. Depth infomation in the views?? 7/5/9

More information

A Shape-preserving Affine Takagi-Sugeno Model Based on a Piecewise Constant Nonuniform Fuzzification Transform

A Shape-preserving Affine Takagi-Sugeno Model Based on a Piecewise Constant Nonuniform Fuzzification Transform A Shape-peseving Affine Takagi-Sugeno Model Based on a Piecewise Constant Nonunifom Fuzzification Tansfom Felipe Fenández, Julio Gutiéez, Juan Calos Cespo and Gacián Tiviño Dep. Tecnología Fotónica, Facultad

More information

Improvement of First-order Takagi-Sugeno Models Using Local Uniform B-splines 1

Improvement of First-order Takagi-Sugeno Models Using Local Uniform B-splines 1 Impovement of Fist-ode Takagi-Sugeno Models Using Local Unifom B-splines Felipe Fenández, Julio Gutiéez, Gacián Tiviño and Juan Calos Cespo Dep. Tecnología Fotónica, Facultad de Infomática Univesidad Politécnica

More information

Experimental and numerical simulation of the flow over a spillway

Experimental and numerical simulation of the flow over a spillway Euopean Wate 57: 253-260, 2017. 2017 E.W. Publications Expeimental and numeical simulation of the flow ove a spillway A. Seafeim *, L. Avgeis, V. Hissanthou and K. Bellos Depatment of Civil Engineeing,

More information

Transmission Lines Modeling Based on Vector Fitting Algorithm and RLC Active/Passive Filter Design

Transmission Lines Modeling Based on Vector Fitting Algorithm and RLC Active/Passive Filter Design Tansmission Lines Modeling Based on Vecto Fitting Algoithm and RLC Active/Passive Filte Design Ahmed Qasim Tuki a,*, Nashien Fazilah Mailah b, Mohammad Lutfi Othman c, Ahmad H. Saby d Cente fo Advanced

More information

Several algorithms exist to extract edges from point. system. the line is computed using a least squares method.

Several algorithms exist to extract edges from point. system. the line is computed using a least squares method. Fast Mapping using the Log-Hough Tansfomation B. Giesle, R. Gaf, R. Dillmann Institute fo Pocess Contol and Robotics (IPR) Univesity of Kalsuhe D-7618 Kalsuhe, Gemany fgieslejgafjdillmanng@ia.uka.de C.F.R.

More information

Cardiac C-Arm CT. SNR Enhancement by Combining Multiple Retrospectively Motion Corrected FDK-Like Reconstructions

Cardiac C-Arm CT. SNR Enhancement by Combining Multiple Retrospectively Motion Corrected FDK-Like Reconstructions Cadiac C-Am CT SNR Enhancement by Combining Multiple Retospectively Motion Coected FDK-Like Reconstuctions M. Pümme 1, L. Wigstöm 2,3, R. Fahig 2, G. Lauitsch 4, J. Honegge 1 1 Institute of Patten Recognition,

More information

A Recommender System for Online Personalization in the WUM Applications

A Recommender System for Online Personalization in the WUM Applications A Recommende System fo Online Pesonalization in the WUM Applications Mehdad Jalali 1, Nowati Mustapha 2, Ali Mamat 2, Md. Nasi B Sulaiman 2 Abstact foeseeing of use futue movements and intentions based

More information

AUTOMATED LOCATION OF ICE REGIONS IN RADARSAT SAR IMAGERY

AUTOMATED LOCATION OF ICE REGIONS IN RADARSAT SAR IMAGERY AUTOMATED LOCATION OF ICE REGIONS IN RADARSAT SAR IMAGERY Chistophe Waceman (1), William G. Pichel (2), Pablo Clement-Colón (2) (1) Geneal Dynamics Advanced Infomation Systems, P.O. Box 134008 Ann Abo

More information

Visual Servoing from Deep Neural Networks

Visual Servoing from Deep Neural Networks Visual Sevoing fom Deep Neual Netwoks Quentin Bateux 1, Eic Machand 1, Jügen Leitne 2, Fançois Chaumette 3, Pete Coke 2 Abstact We pesent a deep neual netwok-based method to pefom high-pecision, obust

More information

Multi-azimuth Prestack Time Migration for General Anisotropic, Weakly Heterogeneous Media - Field Data Examples

Multi-azimuth Prestack Time Migration for General Anisotropic, Weakly Heterogeneous Media - Field Data Examples Multi-azimuth Pestack Time Migation fo Geneal Anisotopic, Weakly Heteogeneous Media - Field Data Examples S. Beaumont* (EOST/PGS) & W. Söllne (PGS) SUMMARY Multi-azimuth data acquisition has shown benefits

More information

Development and Analysis of a Real-Time Human Motion Tracking System

Development and Analysis of a Real-Time Human Motion Tracking System Development and Analysis of a Real-Time Human Motion Tacking System Jason P. Luck 1,2 Chistian Debunne 1 William Hoff 1 Qiang He 1 Daniel E. Small 2 1 Coloado School of Mines 2 Sandia National Labs Engineeing

More information

A VECTOR PERTURBATION APPROACH TO THE GENERALIZED AIRCRAFT SPARE PARTS GROUPING PROBLEM

A VECTOR PERTURBATION APPROACH TO THE GENERALIZED AIRCRAFT SPARE PARTS GROUPING PROBLEM Accepted fo publication Intenational Jounal of Flexible Automation and Integated Manufactuing. A VECTOR PERTURBATION APPROACH TO THE GENERALIZED AIRCRAFT SPARE PARTS GROUPING PROBLEM Nagiza F. Samatova,

More information

Multiview plus depth video coding with temporal prediction view synthesis

Multiview plus depth video coding with temporal prediction view synthesis 1 Multiview plus depth video coding with tempoal pediction view synthesis Andei I. Puica, Elie G. Moa, Beatice Pesquet-Popescu, Fellow, IEEE, Maco Cagnazzo, Senio Membe, IEEE and Bogdan Ionescu, Senio

More information

4.2. Co-terminal and Related Angles. Investigate

4.2. Co-terminal and Related Angles. Investigate .2 Co-teminal and Related Angles Tigonometic atios can be used to model quantities such as

More information

Voting-Based Grouping and Interpretation of Visual Motion

Voting-Based Grouping and Interpretation of Visual Motion Voting-Based Gouping and Intepetation of Visual Motion Micea Nicolescu Depatment of Compute Science Univesity of Nevada, Reno Reno, NV 89557 micea@cs.un.edu Géad Medioni Integated Media Systems Cente Univesity

More information

Drag Optimization on Rear Box of a Simplified Car Model by Robust Parameter Design

Drag Optimization on Rear Box of a Simplified Car Model by Robust Parameter Design Vol.2, Issue.3, May-June 2012 pp-1253-1259 ISSN: 2249-6645 Dag Optimization on Rea Box of a Simplified Ca Model by Robust Paamete Design Sajjad Beigmoadi 1, Asgha Ramezani 2 *(Automotive Engineeing Depatment,

More information

5 4 THE BERNOULLI EQUATION

5 4 THE BERNOULLI EQUATION 185 CHATER 5 the suounding ai). The fictional wok tem w fiction is often expessed as e loss to epesent the loss (convesion) of mechanical into themal. Fo the idealied case of fictionless motion, the last

More information

CSE 165: 3D User Interaction

CSE 165: 3D User Interaction CSE 165: 3D Use Inteaction Lectue #6: Selection Instucto: Jugen Schulze, Ph.D. 2 Announcements Homewok Assignment #2 Due Fiday, Januay 23 d at 1:00pm 3 4 Selection and Manipulation 5 Why ae Selection and

More information

Detection and tracking of ships using a stereo vision system

Detection and tracking of ships using a stereo vision system Scientific Reseach and Essays Vol. 8(7), pp. 288-303, 18 Febuay, 2013 Available online at http://www.academicjounals.og/sre DOI: 10.5897/SRE12.318 ISSN 1992-2248 2013 Academic Jounals Full Length Reseach

More information

3D Hand Trajectory Segmentation by Curvatures and Hand Orientation for Classification through a Probabilistic Approach

3D Hand Trajectory Segmentation by Curvatures and Hand Orientation for Classification through a Probabilistic Approach 3D Hand Tajectoy Segmentation by Cuvatues and Hand Oientation fo Classification though a Pobabilistic Appoach Diego R. Faia and Joge Dias Abstact In this wok we pesent the segmentation and classification

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. IR Basics. User Task. Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. IR Basics. User Task. Basic IR Processes CS630 Repesenting and Accessing Digital Infomation Infomation Retieval: Basics Thosten Joachims Conell Univesity Infomation Retieval Basics Retieval Models Indexing and Pepocessing Data Stuctues ~ 4 lectues

More information

A Mathematical Implementation of a Global Human Walking Model with Real-Time Kinematic Personification by Boulic, Thalmann and Thalmann.

A Mathematical Implementation of a Global Human Walking Model with Real-Time Kinematic Personification by Boulic, Thalmann and Thalmann. A Mathematical Implementation of a Global Human Walking Model with Real-Time Kinematic Pesonification by Boulic, Thalmann and Thalmann. Mashall Badley National Cente fo Physical Acoustics Univesity of

More information

View Synthesis using Depth Map for 3D Video

View Synthesis using Depth Map for 3D Video View Synthesis using Depth Map fo 3D Video Cheon Lee and Yo-Sung Ho Gwangju Institute of Science and Technology (GIST) 1 Oyong-dong, Buk-gu, Gwangju, 500-712, Republic of Koea E-mail: {leecheon, hoyo}@gist.ac.k

More information

ADDING REALISM TO SOURCE CHARACTERIZATION USING A GENETIC ALGORITHM

ADDING REALISM TO SOURCE CHARACTERIZATION USING A GENETIC ALGORITHM ADDING REALISM TO SOURCE CHARACTERIZATION USING A GENETIC ALGORITHM Luna M. Rodiguez*, Sue Ellen Haupt, and Geoge S. Young Depatment of Meteoology and Applied Reseach Laboatoy The Pennsylvania State Univesity,

More information

Full-Polarimetric Analysis of MERIC Air Targets Data

Full-Polarimetric Analysis of MERIC Air Targets Data ABSTRACT Full-Polaimetic Analysis of MERIC Ai Tagets Data C. Titin-Schnaide, P. Bouad ONERA Chemin de la Hunièe et des Joncheettes 91120 Palaiseau Fance Cecile.Titin-Schnaide@onea.f, Philippe.Bouad@onea.f

More information

Communication vs Distributed Computation: an alternative trade-off curve

Communication vs Distributed Computation: an alternative trade-off curve Communication vs Distibuted Computation: an altenative tade-off cuve Yahya H. Ezzeldin, Mohammed amoose, Chistina Fagouli Univesity of Califonia, Los Angeles, CA 90095, USA, Email: {yahya.ezzeldin, mkamoose,

More information

Satellite Image Analysis

Satellite Image Analysis Satellite Image Analysis Chistian Melsheime Apil 25, 2012 The lab on satellite image analysis deals with a vey typical application, the extaction of land use infomation. Stating point is an image ecoded

More information

A Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components

A Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components A Neual Netwok Model fo Stong and Reteving 2D Images of Rotated 3D Object Using Pncipal Components Tsukasa AMANO, Shuichi KUROGI, Ayako EGUCHI, Takeshi NISHIDA, Yasuhio FUCHIKAWA Depatment of Contol Engineeng,

More information

Performance Optimization in Structured Wireless Sensor Networks

Performance Optimization in Structured Wireless Sensor Networks 5 The Intenational Aab Jounal of Infomation Technology, Vol. 6, o. 5, ovembe 9 Pefomance Optimization in Stuctued Wieless Senso etwoks Amine Moussa and Hoda Maalouf Compute Science Depatment, ote Dame

More information

Effective Missing Data Prediction for Collaborative Filtering

Effective Missing Data Prediction for Collaborative Filtering Effective Missing Data Pediction fo Collaboative Filteing Hao Ma, Iwin King and Michael R. Lyu Dept. of Compute Science and Engineeing The Chinese Univesity of Hong Kong Shatin, N.T., Hong Kong { hma,

More information

HISTOGRAMS are an important statistic reflecting the

HISTOGRAMS are an important statistic reflecting the JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 D 2 HistoSketch: Disciminative and Dynamic Similaity-Peseving Sketching of Steaming Histogams Dingqi Yang, Bin Li, Laua Rettig, and Philippe

More information

Fast quality-guided flood-fill phase unwrapping algorithm for three-dimensional fringe pattern profilometry

Fast quality-guided flood-fill phase unwrapping algorithm for three-dimensional fringe pattern profilometry Univesity of Wollongong Reseach Online Faculty of Infomatics - Papes (Achive) Faculty of Engineeing and Infomation Sciences 2010 Fast quality-guided flood-fill phase unwapping algoithm fo thee-dimensional

More information

Cellular Neural Network Based PTV

Cellular Neural Network Based PTV 3th Int Symp on Applications of Lase Techniques to Fluid Mechanics Lisbon, Potugal, 6-9 June, 006 Cellula Neual Netwok Based PT Kazuo Ohmi, Achyut Sapkota : Depatment of Infomation Systems Engineeing,

More information

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma apreduce Optimizations and Algoithms 2015 Pofesso Sasu Takoma www.cs.helsinki.fi Optimizations Reduce tasks cannot stat befoe the whole map phase is complete Thus single slow machine can slow down the

More information

A NOVEL VOLUME CT WITH X-RAY ON A TROUGH-LIKE SURFACE AND POINT DETECTORS ON CIRCLE-PLUS-ARC CURVE

A NOVEL VOLUME CT WITH X-RAY ON A TROUGH-LIKE SURFACE AND POINT DETECTORS ON CIRCLE-PLUS-ARC CURVE A NOVEL VOLUME CT WITH X-RAY ON A TROUGH-LIKE SURFACE AND POINT DETECTORS ON CIRCLE-PLUS-ARC CURVE H Xu, and TG Zhuang Depatment of Biomedical Engineeing, Shanghai Jiaotong Univesity, Shanghai, P R China

More information

On the Forwarding Area of Contention-Based Geographic Forwarding for Ad Hoc and Sensor Networks

On the Forwarding Area of Contention-Based Geographic Forwarding for Ad Hoc and Sensor Networks On the Fowading Aea of Contention-Based Geogaphic Fowading fo Ad Hoc and Senso Netwoks Dazhi Chen Depatment of EECS Syacuse Univesity Syacuse, NY dchen@sy.edu Jing Deng Depatment of CS Univesity of New

More information

Color Interpolation for Single CCD Color Camera

Color Interpolation for Single CCD Color Camera Colo Intepolation fo Single CCD Colo Camea Yi-Ming Wu, Chiou-Shann Fuh, and Jui-Pin Hsu Depatment of Compute Science and Infomation Engineeing, National Taian Univesit, Taipei, Taian Email: 88036@csie.ntu.edu.t;

More information

3D Reconstruction from 360 x 360 Mosaics 1

3D Reconstruction from 360 x 360 Mosaics 1 CENTER FOR MACHINE PERCEPTION 3D Reconstuction fom 36 x 36 Mosaics CZECH TECHNICAL UNIVERSITY {bakstein, pajdla}@cmp.felk.cvut.cz REPRINT Hynek Bakstein and Tomáš Pajdla, 3D Reconstuction fom 36 x 36 Mosaics,

More information

Numerical studies on Feldkamp-type and Katsevich-type algorithms for cone-beam scanning along nonstandard spirals

Numerical studies on Feldkamp-type and Katsevich-type algorithms for cone-beam scanning along nonstandard spirals Numeical studies on Feldkamp-type and Katsevich-type algoithms fo cone-beam scanning along nonstandad spials Jiehua Zhu a,b, Shiying Zhao a, Hengyong Yu c, Yangbo Ye a, b, Seung Wook Lee d, *Ge Wang a,

More information

AN ANALYSIS OF COORDINATED AND NON-COORDINATED MEDIUM ACCESS CONTROL PROTOCOLS UNDER CHANNEL NOISE

AN ANALYSIS OF COORDINATED AND NON-COORDINATED MEDIUM ACCESS CONTROL PROTOCOLS UNDER CHANNEL NOISE AN ANALYSIS OF COORDINATED AND NON-COORDINATED MEDIUM ACCESS CONTROL PROTOCOLS UNDER CHANNEL NOISE Tolga Numanoglu, Bulent Tavli, and Wendi Heinzelman Depatment of Electical and Compute Engineeing Univesity

More information

Elliptic Generation Systems

Elliptic Generation Systems 4 Elliptic Geneation Systems Stefan P. Spekeijse 4.1 Intoduction 4.1 Intoduction 4.2 Two-Dimensional Gid Geneation Hamonic Maps, Gid Contol Maps, and Poisson Systems Discetization and Solution Method Constuction

More information

Improved Fourier-transform profilometry

Improved Fourier-transform profilometry Impoved Fouie-tansfom pofilomety Xianfu Mao, Wenjing Chen, and Xianyu Su An impoved optical geomety of the pojected-finge pofilomety technique, in which the exit pupil of the pojecting lens and the entance

More information

INFORMATION DISSEMINATION DELAY IN VEHICLE-TO-VEHICLE COMMUNICATION NETWORKS IN A TRAFFIC STREAM

INFORMATION DISSEMINATION DELAY IN VEHICLE-TO-VEHICLE COMMUNICATION NETWORKS IN A TRAFFIC STREAM INFORMATION DISSEMINATION DELAY IN VEHICLE-TO-VEHICLE COMMUNICATION NETWORKS IN A TRAFFIC STREAM LiLi Du Depatment of Civil, Achitectual, and Envionmental Engineeing Illinois Institute of Technology 3300

More information

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives

a Not yet implemented in current version SPARK: Research Kit Pointer Analysis Parameters Soot Pointer analysis. Objectives SPARK: Soot Reseach Kit Ondřej Lhoták Objectives Spak is a modula toolkit fo flow-insensitive may points-to analyses fo Java, which enables expeimentation with: vaious paametes of pointe analyses which

More information

ISyE 4256 Industrial Robotic Applications

ISyE 4256 Industrial Robotic Applications ISyE 456 Industial Robotic Applications Quiz # Oct. 9, 998 Name This is a closed book, closed notes exam. Show wok fo poblem questions ) ( pts) Please cicle one choice fo each item. a) In an application,

More information

Tissue Classification Based on 3D Local Intensity Structures for Volume Rendering

Tissue Classification Based on 3D Local Intensity Structures for Volume Rendering 160 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 6, NO., APRIL-JUNE 000 Tissue Classification Based on 3D Local Intensity Stuctues fo Volume Rendeing Yoshinobu Sato, Membe, IEEE, Cal-Fedik

More information

Gravitational Shift for Beginners

Gravitational Shift for Beginners Gavitational Shift fo Beginnes This pape, which I wote in 26, fomulates the equations fo gavitational shifts fom the elativistic famewok of special elativity. Fist I deive the fomulas fo the gavitational

More information

IP Multicast Simulation in OPNET

IP Multicast Simulation in OPNET IP Multicast Simulation in OPNET Xin Wang, Chien-Ming Yu, Henning Schulzinne Paul A. Stipe Columbia Univesity Reutes Depatment of Compute Science 88 Pakway Dive South New Yok, New Yok Hauppuage, New Yok

More information

Modelling, simulation, and performance analysis of a CAN FD system with SAE benchmark based message set

Modelling, simulation, and performance analysis of a CAN FD system with SAE benchmark based message set Modelling, simulation, and pefomance analysis of a CAN FD system with SAE benchmak based message set Mahmut Tenuh, Panagiotis Oikonomidis, Peiklis Chachalakis, Elias Stipidis Mugla S. K. Univesity, TR;

More information

Free Viewpoint Action Recognition using Motion History Volumes

Free Viewpoint Action Recognition using Motion History Volumes Fee Viewpoint Action Recognition using Motion Histoy Volumes Daniel Weinland 1, Remi Ronfad, Edmond Boye Peception-GRAVIR, INRIA Rhone-Alpes, 38334 Montbonnot Saint Matin, Fance. Abstact Action ecognition

More information