
Title: Study of a View-based 3-D Object Retrieval Method fo
Authors: 張, 永生
Issue Date: 2016-09-26
DOI: 10.14943/doctoral.k12406
Doc URL: http://hdl.handle.net/2115/63368
Type: theses (doctoral)
File Information: Zhang_Yongsheng.pdf
Hokkaido University Collection of Scholarly and Aca

Study of a View-based 3-D Object Retrieval Method for 3-D Object Reconstruction
3 次元再構成のためのビューベース 3-D オブジェクト復元に関する研究

Yongsheng Zhang
Graduate School of Information Science and Technology
Hokkaido University
Sapporo, Hokkaido, Japan

Table of Contents

Abstract
List of Figures
1. Introduction
  1.1 Background
  1.2 Image-based 3-D Modeling (IBM) and View-based 3-D Object Retrieval (VBOR)
  1.3 Thesis Overview
2. Related Work
  2.1 From Images to 3-D Models
    2.1.1 A General IBM Method
    2.1.2 Super-pixel based IBM
  2.2 3-D Object Retrieval
    2.2.1 Model-based Methods
    2.2.2 View-based Methods
3. Multi-Scale Object Retrieval via Learning on Graph
  3.1 Shape Feature Extraction
  3.2 Multi-view Object Distance
  3.3 Multi-scale Object Graph Construction
  3.4 Graph Learning for Object Retrieval
  3.5 Computational Cost
4. Experiment
  4.1 Evaluation Datasets
    4.1.1 National Taiwan University 3-D Model Database (NTU)
    4.1.2 The Eidgenössische Technische Hochschule Zürich Database (ETH)
  4.2 Compared Methods
    4.2.1 Elevation Descriptor (ED)
    4.2.2 Adaptive Views Clustering (AVC)
    4.2.3 Query View Selection Method (QVS)
  4.3 Evaluation Criteria
    4.3.1 The Accuracy of the Nearest Neighbor (NN)
    4.3.2 F-Measure (F)
    4.3.3 Discounted Cumulative Gain (DCG)
    4.3.4 Average Normalized Modified Retrieval Rank (ANMRR)
  4.4 Experimental Results
  4.5 Analysis
    4.5.1 On the Multi-scale Hypergraph Connections
    4.5.2 On Parameter
5. Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Bibliography
Acknowledgements
Research Achievements

Abstract

With rapid advances in computer techniques and the popularity of the camera, a large number of photographs can be obtained. How to obtain a novel 3-D object or 3-D scene from 2-D images is a challenging task. To create a 3-D object, two approaches can be applied. One is to create the 3-D object using 3-D modeling methods. The other is to create the 3-D object by combining or revising existing designs. According to a research report, 20% of designs must be started from scratch, and 80% of designs can be obtained by combining or revising existing 3-D models. Much research on 3-D modeling has been carried out, but modeling highly realistic 3-D models is still a high-cost and laborious task. We have also proposed a super-pixel based method to obtain a 3-D model from an image sequence. With the development of 3-D technology and 3-D applications, people can easily obtain many existing 3-D models. We can use these existing models instead of modeling directly from images. We therefore focus on view-based 3-D object retrieval.

Object retrieval has attracted much research attention in recent years. In object retrieval, how to estimate the relevance among objects is a challenging task. We focus on view-based object retrieval and propose a multi-scale object retrieval algorithm via learning on graph from multimodal data. In our work, shape features are extracted from each view of the objects. The relevance among objects is formulated in a hypergraph structure, where the distance between different views in the feature space is employed to generate the connections in the hypergraph. To achieve better representation performance, we propose a multi-scale hypergraph structure to model object correlations. Learning on the graph is conducted to estimate the optimal relevance among these objects, which is then used for object retrieval.

To evaluate the performance of the proposed method, we conduct experiments on the National Taiwan University dataset and the ETH dataset. We compare our method with the state-of-the-art methods ED, AVC and QVS. To measure 3-D object retrieval performance, the criteria NN, F, DCG and ANMRR are employed in our experiments. Experimental results and comparisons with the state-of-the-art methods demonstrate the effectiveness of the proposed method.

List of Figures

Figure 1.1 The structure of 3-D object rebuild
Figure 1.2 Example views of 3-D objects
Figure 1.3 The framework of the proposed method
Figure 2.1 The baseline of the image-based 3-D model reconstruction algorithms
Figure 2.2 Outline of the super-pixel based 3-D modeling approach
Figure 2.3 An example of resulting super-pixels
Figure 2.4 Patch definition
Figure 2.5 The procedure of back projection, during which 3-D surface patch nodes are acquired
Figure 2.6 Patches expanded on curved surface. Normal vector correction is necessary.
Figure 2.7 The purpose of normal vector correction is to find the angles a and b; it is a nonlinear minimization problem.
Figure 2.8 Sample images from the input dataset
Figure 2.9 Final polygon model simulated from 3-D surface patches using MeshLab software
Figure 2.10 The general framework of view-based 3-D object retrieval
Figure 3.1 Hyperedge generation using different K values
Figure 4.1 3-D object examples in the NTU database
Figure 4.2 Structure of Buckminsterfullerene
Figure 4.3 3-D object examples in the ETH database
Figure 4.4 Experimental results on the NTU dataset
Figure 4.5 Experimental results on the ETH dataset
Figure 4.6 Experimental results with respect to different connection numbers on the NTU dataset
Figure 4.7 Experimental results with respect to different connection numbers on the ETH dataset
Figure 4.8 Experimental results with respect to different values on the NTU dataset
Figure 4.9 Experimental results with respect to different values on the ETH dataset

Chapter 1
1. Introduction

In this chapter, we provide the background for understanding image-based 3-D object reconstruction and 3-D object retrieval. We then briefly define the motivation and goal of our research. The last section of this chapter provides an overview of the rest of this thesis.

1.1 Background

With rapid advances in computer techniques and the popularity of the camera, a large number of photographs are obtained. How to obtain a novel 3-D object or 3-D scene from 2-D images is a challenging task. In the past several decades, 3-D object reconstruction has been investigated extensively. In recent years, graphics hardware and image processing techniques have made remarkable progress, and 3-D techniques have been applied in various fields, such as computer-aided design (CAD), medical diagnosis, online shopping, virtual reality (VR), and entertainment. For example, 3-D navigation of cities and museums has become much more popular.

The long history of 3-D technology can be traced all the way back to the beginnings of photography. Stereoscopic photography, or the technique of creating a "third dimension", was first invented in 1838 by the English scientist Sir Charles Wheatstone. Stereoscopic photography uses a special camera system that records images from two different perspectives. Eyewear is then used to combine these perspectives and create the illusion of depth. In 1965, a team led by Charles Lang from Cambridge University started conducting 3-D CAD modeling research. 3-D modeling then became popular and has been applied to various design work. In 1981, Hideo Kodama of Nagoya Municipal Industrial Research Institute published his account of a functional rapid prototyping system using photopolymers. A solid, printed model was built up in layers, each of which corresponded to a cross-sectional slice of the model. On 31 January 2010, BSkyB became the first broadcaster in the world to show a live sports event in 3-D when Sky Sports screened a football match between Manchester United and Arsenal to a public audience in several selected pubs.

With the development and wide application of 3-D technology, various methods have been proposed that produce large amounts of 3-D data. At the same time, the volume of 3-D data is increasing in both local and online data storage. To create a 3-D object or 3-D scene, two approaches can be applied, as Figure 1.1 shows. One is to create the 3-D object using 3-D modeling methods, which include active methods and passive methods. The other is to create the 3-D object by combining or revising existing designs. According to a report from Gunn [1], 20% of designs must be started from scratch, and 80% of designs can be obtained by combining or revising existing designs. Although many methods have been investigated, 3-D modeling is still a high-cost and laborious task. The combination and revision of existing 3-D models can improve model design performance.

Figure 1.1 The structure of 3-D object rebuild. (Diagram: a 3-D object is obtained either by 3-D modeling directly or by combining or revising existing 3-D objects; the latter relies on 3-D object retrieval, which is either model-based 3DOR or view-based 3DOR.)

1.2 Image-based 3-D Modeling (IBM) and View-based 3-D Object Retrieval (VBOR)

The motivation of this thesis is to explore solutions for obtaining a 3-D object from image views. Image-based modeling methods rely on a set of 2-D images of a scene to generate a 3-D model. In recent years, great progress has been made with applications in the field, and various methodologies have been proposed to deal with related problems. Some of these methods have produced outstanding results, and they can be roughly divided into four categories [2]. The first involves computation of a 3-D volume and extraction of an optimal surface from it [3]. The accuracy of this approach is limited by the resolution of the voxel grid. The second involves the use of voxels, level sets or surface meshes, and iterative evolution of a surface to reduce or minimize a cost function. Although this approach is widely used in medical image 3-D reconstruction, its applicability is limited. The third [4, 5, 6] is based on a set of depth maps. These solutions are flexible, but they require the set of maps to be merged into a 3-D scene under consistency constraints. The fourth involves extracting and matching feature points, followed by the fitting of a surface to the reconstructed points. With this method, wide-baseline stereo matching is applied to an MVS model [7] to recover salient 3-D features, and a visual hull model can be shrunk so that the recovered points lie on the surface. The results are then refined using cost function minimization. Although 3-D model reconstruction has been investigated for decades, it is still a challenging task with a heavy computational load.

In recent years, 3-D object retrieval [12] has attracted much attention from both research and industry. Extensive research [13, 14, 15] has been dedicated to this emerging field [16-19], in either the model-based [20, 21, 22] or the view-based [23-26] direction, depending on how the 3-D objects are represented. Most early methods are model-based, where each object is represented by a virtual 3-D model, such as a triangle mesh. Model-based methods have shown advantages in describing global spatial information. However, one main limitation of such methods arises when model data are lacking. Model-based methods depend heavily on the virtual model, and such model information may not be available in many practical applications.

In contrast to model-based methods, view-based methods [23, 30, 24, 31] have become more useful in recent years. In view-based methods, each object is represented by a set of views from different directions. Such methods are much more flexible than model-based methods, as the model information is not mandatory. View-based methods can also benefit from existing image processing achievements, such as image feature extraction and comparison. Figure 1.2 provides examples of views of objects. Daras et al. [32] reported that view-based methods [33, 34] can be more discriminative than model-based methods, and view-based methods have been investigated extensively over the past decade. In view-based methods, object comparison is generally based on multi-view matching. Comparing two objects via two groups of views is still a challenging task and differs from traditional image comparison.

In this thesis, we focus on view-based object retrieval and propose a multi-scale object retrieval algorithm via learning on graph. Figure 1.3 shows the framework of our proposed method. In our work, shape features are extracted from each view of the objects. The relevance among objects is formulated in a hypergraph structure, where the distance between different views in the feature space is employed to generate the connections in the hypergraph. To achieve better representation performance, we propose a multi-scale hypergraph structure to model object correlations. Learning on the graph is conducted to estimate the optimal relevance among these objects, which is then used for object retrieval. To evaluate the performance of the proposed method, we conduct experiments on the National Taiwan University dataset and the ETH dataset. Experimental results and comparisons with the state-of-the-art methods demonstrate the effectiveness of the proposed method.

Figure 1.2 Example views of 3-D objects

Figure 1.3 The framework of the proposed method.

1.3 Thesis Overview

This thesis is organized as follows. This chapter presents the background, research motivation and structure of this thesis. We describe the history and progress of the development of 3-D technology. We also introduce IBM and 3DOR, as well as the solutions and our contributions for obtaining a 3-D object from image views.

In Chapter 2 we provide a brief overview of previous research that can be considered background for the proposed method. We describe our experiments in image-based modeling, and then introduce recent progress in 3-D object retrieval.

Chapter 3 presents the 3-D object retrieval method that we use in our research. We focus on view-based object retrieval and propose a multi-scale object retrieval algorithm via learning on graph from multimodal data.

In Chapter 4 we introduce the test datasets, the compared methods, the evaluation criteria and the experimental results. Experiments are conducted on two public datasets: the National Taiwan University dataset and the ETH dataset. The methods ED, AVC and QVS are employed for comparison. The criteria NN, F, DCG and ANMRR are employed to evaluate the performance of the different methods.

Chapter 5 presents the concluding remarks and future work.

Chapter 2
2. Related Work

The general approach to obtaining a 3-D model from a 2-D image sequence is 3-D modeling. Much research on 3-D modeling has been carried out, but modeling highly realistic 3-D models is still a high-cost and laborious task. We have also done some work in this area. The technology and its analysis are introduced in Section 2.1. With the development of 3-D technology and 3-D applications, people can easily obtain many existing 3-D models. We can use these existing models instead of modeling directly from images. We therefore focus on view-based 3-D object retrieval. Section 2.2 introduces recent progress on 3-D object retrieval.

2.1 From Images to 3-D Models

2.1.1 A General IBM Method

Methods for acquiring a model of a real object can be roughly divided into two categories: active methods and passive methods. The typical representative of the active approach is the use of a scanner. It can capture the 3-D model of an object accurately, but its cost is very high and it is difficult to acquire enough data to reconstruct models in all applications. The passive method reconstructs the 3-D model from images of the scene or object of interest, which is economical and yields features with color directly.

The baseline of the image-based method is shown in Figure 2.1. Provided with a sequence of images of a scene or an object, it can generate a realistic 3-D model in five steps. First, because an image is just a large collection of pixels with their own intensities, feature matching must be applied. That is to say, to reconstruct the model of an object from images, corresponding features, such as points, must be found in different images, which may be done by comparing intensities over a small local window centered on a point. Second, the motion and the 3-D structure are recovered. If the camera used to capture the image sequence has not been calibrated beforehand, the reconstruction at this stage contains a projective skew, and the camera calibration must be recovered. Third, with the camera parameters obtained in the second step or known a priori, a depth estimate can be obtained for almost every pixel of an image by matching all pixels of the image with pixels in its neighboring images. In this way, all points of the object can be reconstructed. Fourth, the 3-D model is built by fusing the results together. Finally, texture mapping is applied to achieve the final photo-realistic model.

Figure 2.1 The baseline of the image-based 3-D model reconstruction algorithms.

2.1.2 Super-pixel based IBM

Generally, IBM processing is considered high-cost and inefficient for texture-less images. We propose a super-pixel based IBM method to address this problem. The pixels of the reference images are clustered into super-pixels in a preprocessing step before reconstruction. The reconstruction is then based on these super-pixels.

Figure 2.2 gives an outline of the super-pixel based approach. First, an image sequence is input, and camera parameters are derived for the subsequent processing using a general structure and motion method. At the same time, good images are selected as references and split into small segments (super-pixels). Feature points are then extracted for each image, but corresponding matching points are searched only between the reference images and the images related to them. From the salient matching points and camera parameters, 3-D points are derived via triangulation and used to construct salient-basis 3-D surface patches. However, the number of these patches is small. It is necessary to expand the patches to nearby voxels in line with the geometric relationships linking the super-pixels. A photometric discrepancy function is used to remove incorrect 3-D surface patches at the post-expansion stage. Finally, the 3-D patches are used to reconstruct the polygonal surface model via an optimization procedure.

Figure 2.2 Outline of the super-pixel based 3-D modeling approach

Camera calibration is a basic technique in multi-view image processing. Many methods have been developed and applied in related areas. The bundle method [61] is used to carry out the structure and motion analysis. After this step, a sparse set of 3-D points and the camera parameters describing the relationship between the camera and the scene are obtained.

The concept of the super-pixel was proposed by Ren and Malik [58], and was originally defined as a kind of over-segmentation. A super-pixel in this study is defined as a group of adjoining pixels that all have similar properties. For application to 3-D reconstruction, super-pixels have four specific properties:

- Most feature points should be on the boundaries of super-pixels.
- Surfaces without textures should also be split into small clusters (no specific rule is applied for such splitting).
- Brightness and texture differences within a super-pixel are homogeneous.
- The number of super-pixels should be large enough to preserve more features in one image.

In this study, we arrange initial seeds on a lattice grid, where one initial seed is selected in each grid cell at the most significant feature. If there are no significant features in a cell, the center point of the cell is selected. The features are defined using the Harris and SIFT methods. After seed initialization, the next step is to generate the locally optimal paths connecting each seed to its vertical and horizontal neighbors. A shortest-path search algorithm is used to find each path. We define an undirected graph over the region according to the positions and gradients of the pixels. In the undirected graph, each node denotes a pixel and has a weight. The weight is defined as:

$$ w = 1.0 / (1.0 + s \cdot t) \tag{2.1} $$

$$ t = d_x d_x + d_y d_y \tag{2.2} $$

where $d_x$ and $d_y$ are the horizontal and vertical derivatives, and the sensitivity $s$ is a tuning parameter. The Dijkstra algorithm is employed to generate the shortest path, which completes the seed connection. Each seed connection is described by a sequence of several points along it. In our work, no more than 5 Harris or SIFT feature points are selected for an edge. Therefore, every super-pixel is described by a sequence of feature points. Figure 2.3 shows an example of super-pixels. It is apparent that most feature points in this image, such as those on object boundaries, are located on the edges of super-pixels.

Figure 2.3 An example of resulting super-pixels.

With the super-pixel defined, a 3-D surface patch is defined as follows. A 3-D surface patch model $p$ is essentially a local tangent-plane approximation of a surface, with a specific boundary corresponding to a super-pixel in the reference image. The projection of a 3-D surface patch onto the reference image forms a super-pixel in the image. The patch has four properties: position $pos(p)$; unit normal vector $n(p)$; vertices $node(p)$ corresponding to the vertices of the super-pixel polygon; and the reference images $r(p)$ in which $p$ is visible. In contrast to the usual definition of a patch [7], the patch referred to here may be polygonal rather than rectangular (Figure 2.4).
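As an illustration of the seed-connection step (Eqs. 2.1 and 2.2), the following minimal Python sketch runs Dijkstra's algorithm on a 4-connected pixel grid. It is not the implementation used in this study. It assumes the reading $w = 1/(1 + s t)$ with $t = d_x^2 + d_y^2$, central-difference derivatives, and a step cost equal to the weight of the pixel being entered; the names `gradient_energy`, `node_weight` and `shortest_path` are hypothetical.

```python
import heapq


def gradient_energy(img, r, c):
    """Assumed Eq. 2.2: t = dx*dx + dy*dy, central differences clamped at borders."""
    rows, cols = len(img), len(img[0])
    dx = (img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]) / 2.0
    dy = (img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]) / 2.0
    return dx * dx + dy * dy


def node_weight(img, r, c, s=1.0):
    """Assumed Eq. 2.1: w = 1 / (1 + s * t); strong gradients give cheap nodes."""
    return 1.0 / (1.0 + s * gradient_energy(img, r, c))


def shortest_path(img, start, goal, s=1.0):
    """Dijkstra between two seeds given as (row, col) on a 4-connected grid."""
    rows, cols = len(img), len(img[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale queue entry
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                # assumption: the cost of a step is the weight of the entered pixel
                nd = d + node_weight(img, nxt[0], nxt[1], s)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
    # walk back from the goal to recover the pixel sequence
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

Under these assumptions, pixels on an intensity edge are the cheapest to traverse, so a path between two seeds tends to run along object boundaries, which is consistent with the first super-pixel property listed above.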

Figure 2.4 Patch definition. (Diagram labels: 3-D surface patch $p$, normal $n(p)$, position $pos(p)$, vertices $node(p)$.)

A 3-D surface patch corresponds to a 2-D super-pixel. First, as in a normal dense matching method, feature points in each image are detected using the SIFT and Harris operators. In this study, super-pixels are applied to reduce the processing time and increase computing efficiency by limiting the number of features per super-pixel. Only 4 feature points are computed for a super-pixel with a size of 256 pixels in this study. For each feature point $f$ in a super-pixel of the reference image $I_r$, the best matching point $f'$ in the other image is searched, and the 3-D points $c$ associated with the pairs of matching points are triangulated as shown in Eq. 2.3. Multiple 3-D points may be derived because a single super-pixel may contain multiple feature points (at most 4). The mean coordinates of these points are taken as the initial position of the 3-D surface patch corresponding to the super-pixel (see Eq. 2.4).

$$ c = \{ \text{triangulation from } f \text{ and } f' \} \tag{2.3} $$

$$ pos(p) = \bar{c} \tag{2.4} $$

The direction vector of the patch is defined as a unit vector from the position toward the camera center $O(I_r)$ of the reference image:

$$ n(p) = \overrightarrow{pos(p)\,O(I_r)} \,/\, \left| \overrightarrow{pos(p)\,O(I_r)} \right| \tag{2.5} $$

A point and a normal direction clearly define a plane. The nodes of a 3-D surface patch are defined as the intersection points of the patch plane with the back-projected nodes of the super-pixel polygon (see Figure 2.5). The reference images $r(p)$ are first initialized from the outcome of camera calibration, and are then optimized by removing images whose photometric discrepancy scores are larger than those of the others. Some patches are also removed when all of their discrepancy scores are large.

Figure 2.5 The procedure of back projection, during which 3-D surface patch nodes are acquired. (Diagram labels: nodes of the super-pixel, nodes of the 3-D patch, $pos(p)$, 3-D surface patch, back projection, images $I_1$ and $I_2$, super-pixel, feature $f$, camera center $O(I_r)$.)

Because some images are texture-less and some patches are removed for large discrepancy scores, the initial 3-D surface patches are sparse, and not all super-pixels have corresponding 3-D surface patches. This study uses an expansion procedure to create more patches. The goal of the expansion is to reconstruct a corresponding patch for each super-pixel. In this study, a super-pixel is called a fixed super-pixel when it has a corresponding 3-D patch. The expansion spreads from fixed super-pixels to nearby super-pixels so that as many super-pixels as possible are associated with a determinate 3-D patch.

For a 3-D surface patch, all the neighboring super-pixels of its fixed super-pixel are identified in the reference image. If a neighbor is already associated with a patch, expansion in this direction is terminated. For a known patch $p$ and its corresponding fixed super-pixel $s$, a new patch $p'$ corresponding to the neighboring super-pixel $s'$ is generated as follows. The unit normal vector $n(p')$ and the reference images $r(p')$ are initialized by copying the values of the known patch $p$. The position $pos(p')$ is initialized as the point where a viewing ray passing through the center of the super-pixel $s'$ intersects the plane of patch $p$. Depth testing is also performed to prevent expansion across dramatic changes in depth.

At this point, the expanded patches are coarse because their normal vectors may contain large errors. As Figure 2.6 shows, when the surface being expanded is curved, the normal vector of the patch must be corrected, as shown in Figure 2.7. This can be treated as a nonlinear minimization problem. A quasi-Newton method with a relevance evaluation is applied to obtain the correct normal vectors.

Figure 2.6 Patches expanded on curved surface. Normal vector correction is necessary. (Diagram labels: 3-D surface patch, patch position, patch orientation, super-pixel; cases: surface is plane, surface is curved.)
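The patch initialization of Eqs. 2.4 and 2.5 and the ray-plane intersection used in both back projection and expansion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triangulated points and the camera center are taken as given 3-D tuples, the viewing ray is supplied as an origin and a direction, and the helper names (`initial_patch`, `ray_plane_intersection`) are hypothetical rather than taken from this study.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)


def initial_patch(points_3d, camera_center):
    """Eqs. 2.4 and 2.5: mean of the triangulated points, normal facing the camera."""
    n = len(points_3d)
    pos = tuple(sum(p[k] for p in points_3d) / n for k in range(3))
    normal = unit(sub(camera_center, pos))
    return pos, normal


def ray_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-12):
    """Point where the ray origin + t * direction meets the patch plane, or None."""
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    t = dot(sub(plane_point, origin), plane_normal) / denom
    if t <= 0:
        return None  # intersection lies behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))
```

In this sketch, the same intersection routine would serve two purposes: projecting each super-pixel polygon node back onto the patch plane to obtain the patch nodes, and initializing the position of an expanded patch from the viewing ray through the center of the neighboring super-pixel.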

Figure 2.7 The purpose of normal vector correction is to find the angles a and b; it is a nonlinear minimization problem.

The proposed algorithm was tested on a number of datasets and was found to be valid, as Figure 2.8 and Figure 2.9 show. Its applicability to objects with complicated surfaces (e.g. leaves, grass) is currently limited.

Figure 2.8 Sample images from the input dataset

Figure 2.9 Final polygon model simulated from 3-D surface patches using MeshLab software.

2.2 3-D Object Retrieval

In this section, we introduce recent progress on 3-D object retrieval. 3-D object retrieval methods can be divided into two types, i.e., model-based methods and view-based methods.

2.2.1 Model-based Methods

Most early 3-D object retrieval methods are model-based. Model-based methods require existing virtual 3-D model information. For these methods, low-level features, such as volumetric descriptors [27], surface distributions [21] and geometry [20, 28, 29], can be extracted for object description. Other methods [63, 64] extract high-level features to describe the object structure. To extract model-based features, Papadakis et al. [35] introduced a panoramic view, i.e., panoramic object representation for accurate model attributing (PANORAMA), which was generated by projecting the model onto the lateral surface of a cylinder. To compare two 3-D models, the distance was measured by matching the two PANORAMA images. In [30], Gao et al. introduced a spatial structure circular descriptor (SSCD), which employed the projected model information in a circular region to represent the 3-D model. In SSCD, the projected image preserves the global spatial information, and the comparison between two 3-D models is achieved by the histogram distance for each SSCD view. Vranic et al. [37] introduced an Extension Ray-based Descriptor (ERD) method, where concentric spheres were used to extract the surface information. In this method, each sampled surface point had a value on the corresponding sphere surface.

The disadvantage of model-based 3-D object retrieval is that 3-D model information is required. When no 3-D model is available, 3-D model reconstruction is needed to generate a model first.

2.2.2 View-based Methods

View-based methods are much more flexible than model-based methods because they do not need virtual model information. In view-based methods, each 3-D model is represented by clusters of multiple views, and the features are extracted from the views. The general process is composed of four steps: view capture, view selection, feature extraction and object matching, as shown in Figure 2.10.

Figure 2.10 The general framework of view-based 3-D object retrieval. (Diagram: the 3-D object data and the query set pass through view capture, view selection, feature extraction and object matching to produce the retrieved objects.)

Chen et al. [38] proposed the first view-based 3-D object retrieval method, i.e., the Light Field Descriptor (LFD). In LFD, several groups of 10 views are used to represent each 3-D object. For these views, Zernike moments and Fourier descriptors were employed as the features. To compare two 3-D objects, the minimal distance between two groups of views was used in [38]. Shih et al. [39] introduced the Elevation Descriptor (ED), which employed six range views from different directions to represent 3-D objects. In ED, the depth histogram was extracted as the ED feature, and the matching between two ED histograms was measured as the distance between two 3-D objects. To represent 3-D objects via a set of views, five sets of images, from four vertical and one horizontal loop directions, were employed in [65]. In this method, each group of views was formulated as a Markov Chain (MC). The comparison between two 3-D models was divided into comparison at the view-set level and comparison at the model level, and the objective of 3-D model retrieval was to find the maximum a posteriori (MAP) estimate given the query model. Daras et al. [32] introduced the Compact Multi-View Descriptor (CMVD), which contained 18 views from the vertices of a 32-hedron. Mahmoudi et al. [40] proposed a model that employs the curvature scale space as the view descriptor. The curvature scale space was combined with Zernike moments to compare 3-D models. Adan et al. [41] proposed a depth gradient image (DGI) model, which employed both the surface and the contour information to avoid restrictions concerning the layout and visibility of 3-D models.

To select representative views from a large view pool, a query view selection (QVS) method was introduced in [45]. In QVS, a small set of candidate views was first selected via view clustering. User relevance feedback was then used to interactively select representative views. In this way, the views were selected incrementally, which can make them more discriminative with respect to the query. Ansary et al. [23] proposed Adaptive Views Clustering (AVC), where 320 initial views were first captured and about 20 to 40 representative views were selected. 3-D model retrieval was formulated as a probabilistic approach that measures the posterior probability of an object given the query object: the higher the posterior probability of a 3-D model, the higher the relevance between it and the query. In [34], a probabilistic matching method was introduced, where a positive matching model and a negative matching model were generated individually. For each target object, the positive matching ratio and the negative matching ratio were measured and then combined for retrieval. To measure the distance between two groups of views, Gao et al. [31] proposed a Hausdorff distance learning method, where a view-level Mahalanobis distance metric for the Hausdorff distance was learnt through relevance feedback. Hypergraph learning has also been investigated in 3-D object retrieval [30], where the relationships among 3-D objects were formulated in a hypergraph structure.

In recent years, bag-of-words methods have been investigated in 3-D object retrieval [46, 47, 48]. Ohbuchi et al. [24] introduced a bag-of-visual-features (BoVF) method, where local SIFT features [49] were extracted from the images and visual words were generated using a clustering-based visual vocabulary. A histogram of visual words was built as the feature for the 3-D object, and the Kullback-Leibler divergence (KLD) was used to measure the distance between 3-D objects.
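For readers unfamiliar with the KLD used to compare visual-word histograms, a minimal Python sketch follows. It is a generic illustration of the Kullback-Leibler divergence, not the implementation of [24]. The additive smoothing constant `eps`, which avoids division by zero for empty bins, and the function names are assumptions made for the example.

```python
import math


def normalize(hist, eps=1e-10):
    """Turn raw visual-word counts into a smoothed probability distribution."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]


def kl_divergence(hist_p, hist_q, eps=1e-10):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); asymmetric and non-negative."""
    if len(hist_p) != len(hist_q):
        raise ValueError("histograms must use the same vocabulary size")
    p = normalize(hist_p, eps)
    q = normalize(hist_q, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Because the divergence is asymmetric, a retrieval system built on it has to fix which histogram plays the role of the query; a symmetrized variant is another common choice.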

Chapter 3
3. Multi-Scale Object Retrieval via Learning on Graph

In this chapter, we introduce the proposed multi-scale object retrieval via learning on graph. The proposed method is composed of three components, i.e., multi-view object distance, multi-scale object graph construction and learning on the graph, as shown in Figure 1.3.

3.1 Shape Feature Extraction

Feature extraction is an integral part of multimedia information retrieval. For view feature extraction, several effective features can be used, such as Fourier descriptors [8] and Zernike moments [50, 51, 23, 62]. In our work, the Zernike moments are employed as the visual descriptors. Moments are popular pattern representation methods. Zernike moments are a class of orthogonal moments and have been shown to be effective for image representation. Zernike moments are rotation invariant and can be easily constructed to an arbitrary order. Zernike moments are constructed using a set of complex polynomials which form a complete orthogonal basis set defined on the unit circle. Zernike polynomials are expressed in polar coordinates by $\{P_{st}(x, y)\}$ as

$$ P_{st}(x, y) = P_{st}(\rho, \theta) = R_{st}(\rho)\, e^{jt\theta}, \tag{3.1} $$

where $s = 0, 1, 2, \ldots$ defines the order, and $t$ is a positive or negative integer depicting the angular dependence, or rotation, subject to the conditions:

$$ s - |t| \ \text{is even}, \qquad |t| \le s. \tag{3.2} $$

Here $\rho$ and $\theta$ are defined over the unit circle, $j = \sqrt{-1}$, and $R_{st}(\rho)$ is the orthogonal radial polynomial, which is defined as

$$ R_{st}(\rho) = \sum_{z=0}^{(s-|t|)/2} (-1)^{z}\, F(s, t, z, \rho), \tag{3.3} $$

where

$$ F(s, t, z, \rho) = \frac{(s - z)!}{z!\,\left(\frac{s+|t|}{2} - z\right)!\,\left(\frac{s-|t|}{2} - z\right)!}\; \rho^{\,s-2z}. \tag{3.4} $$

Zernike moments are the projections of the image function onto the orthogonal basis functions. The $(s, t)$-th Zernike moment for an image function $f(x, y)$ is defined as

$$ ZM_{st} = \frac{s+1}{\pi} \sum_{x} \sum_{y} f(x, y)\, P_{st}^{*}(\rho, \theta), \tag{3.5} $$

where $^{*}$ denotes the complex conjugate. The number of moments up to the $k$-th order is $\left(\frac{k}{2} + 1\right)(k + 1)$.

3.2 Multi-view Object Distance

The distance metric plays an important role in multimedia information retrieval. Image retrieval is based on the matching of two single images. 3-D object retrieval is more complex because of the high-order information contained in the multiple views of a 3-D object. Multi-view 3-D object matching is a many-to-many matching problem.
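To make the Zernike definitions of Section 3.1 (Eqs. 3.2 to 3.5) concrete, the following Python sketch evaluates the radial polynomial and a discrete Zernike moment for a small square image. It is an illustration under stated assumptions, not the feature extractor used in this work: pixel centers are mapped onto $[-1, 1]^2$, pixels outside the unit circle are ignored, the basis function is conjugated in the sum, and no pixel-area factor is applied. The function names are hypothetical.

```python
import cmath
import math


def radial_poly(s, t, rho):
    """R_st(rho) from Eqs. 3.3 and 3.4; requires |t| <= s and s - |t| even."""
    t = abs(t)
    if t > s or (s - t) % 2:
        raise ValueError("need |t| <= s and s - |t| even")
    total = 0.0
    for z in range((s - t) // 2 + 1):
        coeff = math.factorial(s - z) / (
            math.factorial(z)
            * math.factorial((s + t) // 2 - z)
            * math.factorial((s - t) // 2 - z)
        )
        total += (-1) ** z * coeff * rho ** (s - 2 * z)
    return total


def zernike_moment(img, s, t):
    """Discrete ZM_st of a square image mapped onto the unit disc (Eq. 3.5)."""
    n = len(img)
    acc = 0j
    for r in range(n):
        for c in range(n):
            x = (2 * c + 1 - n) / n  # pixel centre mapped into [-1, 1]
            y = (2 * r + 1 - n) / n
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # outside the unit circle
            theta = math.atan2(y, x)
            # conjugated basis function: R_st(rho) * exp(-j * t * theta)
            acc += img[r][c] * radial_poly(s, t, rho) * cmath.exp(-1j * t * theta)
    return (s + 1) / math.pi * acc


def count_moments(k):
    """Number of (s, t) pairs with s <= k that satisfy Eq. 3.2."""
    return sum(
        1 for s in range(k + 1) for t in range(-s, s + 1) if (s - abs(t)) % 2 == 0
    )
```

Rotating the image by 90 degrees changes only the phase of each moment, so the magnitude `abs(zernike_moment(img, s, t))` is the rotation-invariant quantity one would typically store in a view descriptor.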

Here, we let $O_i = \{v_{i1}, v_{i2}, \ldots, v_{in}\}$ denote one object $O_i$ with $n$ views, and let $O_j = \{v_{j1}, v_{j2}, \ldots, v_{jn}\}$ denote another object $O_j$. For each view $v_{ia}$, a shape feature is extracted for view representation; the Zernike moments are employed as the visual descriptors. To measure the distance between $O_i$ and $O_j$ based on multiple views, the following distance measure is employed in our work:

$$d(O_i, O_j) = \frac{1}{n} \sum_{a=1}^{n} d_{\min}(v_{ia}, O_j), \tag{3.6}$$

where $d_{\min}(v_{ia}, O_j)$ is the minimal distance between $v_{ia}$ and all the views in $O_j$:

$$d_{\min}(v_{ia}, O_j) = \min\{\, d(v_{ia}, v_{jb}) \mid b \in [1, n] \,\}. \tag{3.7}$$

Here, $d(v_{ia}, v_{jb})$ is the Euclidean distance between $v_{ia}$ and $v_{jb}$ based on the Zernike moments feature.

3.3 Multi-scale Object Graph Construction

In our work, the relationships among objects are formulated in an object hypergraph. Each object is regarded as a vertex in the object hypergraph $G = (V, E, \omega)$, so how the vertex connections are constructed is important. Here, the star expansion method in [52] is employed to generate the hyperedges. Each time, one vertex is selected as the centroid vertex and a hyperedge is generated by connecting its nearest neighbors. A parameter $K$ sets the number of nearest neighbors for each hyperedge. Figure 3.1 illustrates the construction of hyperedges via star expansion.

We note that different selections of $K$ indicate different representation scales for object formulation. A large $K$ value means that a large number of objects can be connected by one edge, and a small $K$ value leads to a strict constraint on the similarities of the objects connected by each edge. As it is difficult to identify the optimal $K$ value for hyperedge construction, multiple $K$ values are used in our work to construct edges on the hypergraph, which leads to a multi-scale object graph. As shown in Figure 3.1, multiple $K$ values construct multiple hyperedges.

The multi-scale object hypergraph $G = (V, E, \omega)$ is composed of a vertex set $V$, a hyperedge set $E$, and the weights $\omega$ of the edges. Here, each vertex $v \in V$ denotes one object, each edge $e \in E$ is constructed via star expansion, and every edge is assigned the equal weight 1.

Figure 3.1 Hyperedge generation using different K values.

The structure of the object hypergraph is represented by an incidence matrix $H$, whose entries are calculated by

$$h(v, e) = \begin{cases} 1 & \text{if } v \in e \\ 0 & \text{if } v \notin e \end{cases} \tag{3.8}$$

For each vertex, the vertex degree $d(v)$ is defined as

$$d(v) = \sum_{e \in E} \omega(e)\, h(v, e). \tag{3.9}$$
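The distance of Eqs. (3.6)-(3.7) and the multi-scale incidence matrix of Eq. (3.8) can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, each view is a fixed-length feature vector, and the centroid vertex is assumed to belong to its own hyperedge in addition to its $K$ nearest neighbors (the text does not say whether the centroid counts toward $K$).

```python
import numpy as np


def multiview_distance(views_i, views_j):
    """d(O_i, O_j) of Eq. (3.6): mean over views of O_i of the distance
    to the closest view of O_j (Eq. 3.7). Inputs are (n_views, dim) arrays."""
    diff = views_i[:, None, :] - views_j[None, :, :]
    pairwise = np.linalg.norm(diff, axis=2)  # Euclidean d(v_ia, v_jb)
    return pairwise.min(axis=1).mean()


def multiscale_incidence(dist, ks=(5, 10, 15, 20)):
    """Incidence matrix H (n_objects x n_hyperedges) via star expansion.

    One hyperedge is generated per (centroid vertex, K) pair, so using
    several K values yields the multi-scale hypergraph.
    """
    n = dist.shape[0]
    order = np.argsort(dist, axis=1)  # neighbours sorted by distance
    columns = []
    for k in ks:
        for centre in range(n):
            column = np.zeros(n)
            column[centre] = 1.0  # assumption: centroid is in its own edge
            neighbours = [v for v in order[centre] if v != centre][:k]
            column[neighbours] = 1.0
            columns.append(column)
    return np.stack(columns, axis=1)
```

Note that the distance of Eq. (3.6) is not symmetric in general, so `dist[i, j]` and `dist[j, i]` may differ; the sketch simply uses row `i` when vertex `i` is the centroid. With unit edge weights, the vertex degrees of Eq. (3.9) are the row sums `H.sum(axis=1)`.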

For each hyperedge, the edge degree $\delta(e)$ is defined as

$$\delta(e) = \sum_{v \in V} h(v, e). \tag{3.10}$$

Two diagonal matrices $D_v$ and $D_e$ are used to denote the vertex degrees and the edge degrees, respectively.

3.4 Graph Learning for Object Retrieval

To estimate the relevance among objects, it is important to explore the relationships of the vertices in the hypergraph structure. In recent years, learning methods [53, 54] have been applied on hypergraphs for tasks such as data clustering, classification, and ranking. In our work, we formulate the object relevance exploration task as a binary classification task, i.e., whether one object belongs to the query type or not. Here, the following normalized Laplacian hypergraph learning framework is employed:

$$\arg\min_{f} \{\, \lambda R(f) + \Omega(f) \,\}, \tag{3.11}$$

where $f$ is the relevance of each object to the query, $R(f)$ is the empirical loss of hypergraph learning on the labeled data, i.e., the query, $\Omega(f)$ is a regularizer on the hypergraph structure, and $\lambda$ is a parameter that balances the two components of the objective function. The empirical loss $R(f)$ is defined as

$$R(f) = \| f - y \|^{2} = \sum_{u \in V} \big( f(u) - y(u) \big)^{2}, \tag{3.12}$$

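where $y$ is the label vector. In $y$, all the entries are zero except the one for the query, which is one. The regularizer on the hypergraph $\Omega(f)$ is defined as

$$
\begin{aligned}
\Omega(f) &= \frac{1}{2} \sum_{e \in E} \sum_{u, v \in V} \frac{\omega(e)\, h(u, e)\, h(v, e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^{2} \\
&= \sum_{e \in E} \sum_{u, v \in V} \frac{\omega(e)\, h(u, e)\, h(v, e)}{\delta(e)} \left( \frac{f^{2}(u)}{d(u)} - \frac{f(u)\, f(v)}{\sqrt{d(u)\, d(v)}} \right) \\
&= \sum_{u \in V} f^{2}(u) \sum_{e \in E} \frac{\omega(e)\, h(u, e)}{d(u)} \sum_{v \in V} \frac{h(v, e)}{\delta(e)} - \sum_{e \in E} \sum_{u, v \in V} \frac{f(u)\, h(u, e)\, \omega(e)\, h(v, e)\, f(v)}{\sqrt{d(u)\, d(v)}\ \delta(e)} \\
&= f^{T} (I - \Theta) f,
\end{aligned}
\tag{3.13}
$$

where $\Theta = D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}$ and $W$ is the diagonal matrix of the edge weights. Letting $\Delta = I - \Theta$, the regularizer can be rewritten as

$$\Omega(f) = f^{T} \Delta f. \tag{3.14}$$

The cost function of the learning task can now be rewritten as

$$\Phi(f) = f^{T} \Delta f + \lambda \| f - y \|^{2}. \tag{3.15}$$

The relevance among objects is learned by minimizing this objective function. Setting its derivative with respect to $f$ to zero gives $(\Delta + \lambda I) f = \lambda y$, so the optimal $f$ is

$$f = \left( I + \frac{1}{\lambda} \Delta \right)^{-1} y. \tag{3.16}$$

To further reduce the computational cost, Eq. (3.16) can be solved using an iterative process as follows. We first initialize $f^{(t)}$ with $t = 0$. Then, we update $f$ by

$$f^{(t+1)} = \frac{1}{1+\lambda}\, \Theta f^{(t)} + \frac{\lambda}{1+\lambda}\, y. \tag{3.17}$$

We let $t = t + 1$ and then go back to update $f$. This process is repeated until the cost function no longer decreases significantly. With the learned relevance vector $f$, all the objects can be ranked in descending order, which generates the object retrieval results.

3.5 Computational Cost

In this part, we analyze the computational cost of our proposed method. The main computational load lies in the hypergraph construction and the learning part. The hypergraph construction process costs $O(n_K n_O)$, where $n_K$ is the number of $K$ values employed in the hyperedge construction part and $n_O$ is the number of objects in the dataset. The computational cost of the learning part is $O(n_t n_O^{2})$, where $n_t$ is the number of iterations of the iterative optimization process.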
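The learning step of Section 3.4 can be sketched as below, comparing the closed-form solution of Eq. (3.16) with the fixed-point iteration of Eq. (3.17). This is a minimal sketch, not the thesis implementation: the function names and the toy incidence matrix are hypothetical, and the iteration stops when $f$ itself stops changing, which is used here as a simple stand-in for monitoring the cost function.

```python
import numpy as np


def hypergraph_theta(H, w=None):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (see Eq. 3.13)."""
    n_edges = H.shape[1]
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w            # vertex degrees, Eq. (3.9)
    d_e = H.sum(axis=0)    # edge degrees, Eq. (3.10)
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    left = dv_inv_sqrt[:, None] * H
    right = H.T * dv_inv_sqrt[None, :]
    return left @ np.diag(w / d_e) @ right


def relevance_closed_form(theta, y, lam):
    """f = (I + Delta / lambda)^{-1} y with Delta = I - Theta (Eq. 3.16)."""
    n = len(y)
    laplacian = np.eye(n) - theta
    return np.linalg.solve(np.eye(n) + laplacian / lam, y)


def relevance_iterative(theta, y, lam, tol=1e-10, max_iter=10000):
    """Fixed-point iteration of Eq. (3.17), started from f = 0."""
    f = np.zeros_like(y, dtype=float)
    for _ in range(max_iter):
        f_next = (theta @ f + lam * y) / (1.0 + lam)
        if np.linalg.norm(f_next - f) < tol:  # assumption: stop on small change in f
            return f_next
        f = f_next
    return f


# Toy example: 4 objects, 3 hyperedges, object 0 is the query.
H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
theta = hypergraph_theta(H)
y = np.array([1.0, 0.0, 0.0, 0.0])
f_direct = relevance_closed_form(theta, y, lam=100.0)
f_iter = relevance_iterative(theta, y, lam=100.0)
ranking = np.argsort(-f_direct)  # objects in descending relevance
```

Because the eigenvalues of $\Theta$ do not exceed 1 in magnitude, the update is a contraction for any $\lambda > 0$, so both routes should agree up to the tolerance; the iterative form avoids an explicit matrix inversion.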

Chapter 4

4. Experiment

In this chapter, we introduce the testing datasets, compared methods, evaluation criteria, and experimental results.

4.1 Evaluation Datasets

To evaluate the performance of the proposed method, we have conducted 3-D object retrieval experiments on two public datasets, i.e., the National Taiwan University 3-D Model database (NTU) [38] and the Eidgenössische Technische Hochschule Zürich 3-D object dataset (ETH) [55]. Figure 4.1 and Figure 4.3 show examples from the NTU and the ETH datasets.

4.1.1 National Taiwan University 3-D Model Database (NTU)

The NTU 3-D model database contains two parts: the NTU 3-D model benchmark and the NTU 3-D model database. NTU provides 3-D models for research purposes in 3-D model retrieval, matching, recognition, classification, clustering, and analysis.

The benchmark contains 1,833 3-D models, which were freely downloaded from 3-DCafe http://www.3-dcafe.com in Dec. 2001, after removing several models whose formats failed to decode. The benchmark was clustered into 47 classes comprising 549 3-D models, mainly vehicles and household items, and all the other 1,284 models were classified as "miscellaneous". The 3-D models in the miscellaneous class do not share a common function and act as noise for retrieval.

The database contains 10,911 3-D models, which were freely downloaded from the Internet in July 2002. All 3-D models in the database are converted into the Wavefront file format (.obj). Thumbnail images of each 3-D model are also included in the database.

The first dataset used in our work is the NTU benchmark. In the NTU dataset, each object has a corresponding 3-D model, so virtual cameras are set up to capture multiple views of each object. In our work, a virtual camera array with 60 cameras is used; the cameras are located on the vertices of a polyhedron with a structure similar to Buckminsterfullerene, as Figure 4.2 shows. Using these virtual cameras, 60 views are obtained for each 3-D object.

Figure 4.1 3-D object examples in the NTU database

Figure 4.2 Structure of Buckminsterfullerene

4.1.2 The Eidgenössische Technische Hochschule Zürich Database (ETH)

The ETH dataset is a real-world 3-D object dataset with multiple views. There are 80 objects from 8 categories in the ETH dataset: apple, pear, tomato, dog, cow, cup, car, and horse. In this dataset, each object has 41 views, which are spaced evenly over the upper viewing hemisphere. The camera positions are obtained by subdividing the faces of an octahedron to the third recursion level. All images have been taken with a Sony DFW-X700 progressive scan digital camera with 1024 × 768 pixel resolution and a Tamron 6-12mm varifocal lens (F1.4). For every image, a high-quality segmentation mask is provided.

Figure 4.3 3-D object examples in the ETH database

4.2 Compared Methods

To evaluate the 3-D object retrieval performance of our method, we employ the following state-of-the-art methods for comparison.

4.2.1 Elevation Descriptor (ED)

The elevation descriptor (ED) [39] is a view-based 3-D object retrieval method. It proposes a new feature for 3-D model retrieval, because 2-D silhouettes represented by binary images do not describe the altitude information of the 3-D model from different views well. Based on the new feature, a content-based multimedia retrieval system for 3-D models is designed. A 3-D model is represented with six elevations that describe its altitude information from six different views. Each elevation is represented by a 2-D gray-level image, which is decomposed into several concentric circles; the descriptor is obtained by taking the difference between the altitude sums of two successive concentric circles.

For each 3-D model, the tightest bounding box is constructed and decomposed into a $2L \times 2L \times 2L$ voxel grid. $\mathrm{voxel}(m, n, h)$ represents the voxel located at $(m, n, h)$. Depending on whether a polygonal surface is located within the voxel, $\mathrm{voxel}(m, n, h) = 1$ or $\mathrm{voxel}(m, n, h) = 0$. As a result, each voxel with a polygonal surface is weighted equally. The center of the model is moved to the location $(L, L, L)$, and the average distance from all voxels with $\mathrm{voxel}(m, n, h) = 1$ to the center is scaled linearly. In this way, ED is robust to rotation and invariant to translation and scaling of 3-D models. In addition, an effective way of extracting features from each gray-level image is employed in order to make them less sensitive to rotations.

To extract the ED from the six elevations, each elevation is decomposed into $L$ concentric circles $C_j$, $j = 1, 2, \ldots, L$, around the center point. For the $k$th elevation, $g_k(j)$ is the sum of the gray values of the pixels in the $j$th circle and is calculated by

$$g_k(j) = \sum_{(r, c) \in C_j} f_k(r, c), \qquad j = 1, 2, \ldots, L. \tag{4.1}$$

The difference between two successive concentric circles is defined as

$$d_k(j) = g_k(j) - g_k(j+1), \tag{4.2}$$

and the sum of all $d_k(j)$ values for the $k$th elevation is

$$D_k = \sum_{j=1}^{L} d_k(j). \tag{4.3}$$

The ED $X$ of a 3-D model is defined as

$$X = [x_1^{T}, x_2^{T}, x_3^{T}, x_4^{T}, x_5^{T}, x_6^{T}]^{T}, \tag{4.4}$$

where $x_k = [x_k(1), x_k(2), \ldots, x_k(L)]^{T}$.

For a 3-D model, six elevations are obtained to describe the altitude information of 2-D projections from six different views: front, top, right, rear, bottom, and left, which are denoted successively as $E_k$, $k = 1, 2, \ldots, 6$. Taking the relative positions of the elevations into account, the six elevations can be divided into three pairs $(E_1, E_4)$, $(E_2, E_5)$, $(E_3, E_6)$, which reduces the matching between two models to $3! \times 2^{3} = 48$ matching operations. For the $i$th permutation $p_i$, $1 \le i \le 48$, the distance between $X = [x_1^{T}, \ldots, x_6^{T}]^{T}$ and $Y = [y_1^{T}, \ldots, y_6^{T}]^{T}$ is defined as

$$Dis_i(X, Y) = \sum_{k=1}^{6} \| x_k - y_{p_i(k)} \| = \sum_{k=1}^{6} \sum_{r=1}^{L} | x_k(r) - y_{p_i(k)}(r) |, \tag{4.5}$$

and the distance between the two models is

$$Dis(X, Y) = \min_{1 \le i \le 48} Dis_i(X, Y). \tag{4.6}$$

The similarity measure between $X$ and $Y$ is defined as

$$Sim(X, Y) = \frac{1}{Dis(X, Y)}. \tag{4.7}$$

The larger the similarity value, the more similar the two models are. The model with the largest similarity value is taken as the retrieved model.

4.2.2 Adaptive Views Clustering (AVC)

Adaptive views clustering (AVC) [23] provides a probabilistic Bayesian method for 3-D model retrieval from characteristic views. A set of characteristic views $V = \{V_1, V_2, \ldots, V_C\}$ represents a model in the model collection $D_b$, with $C$ the number of characteristic views. Given a 3-D request model $Q$, the target retrieval model $M_i$ is the one in $D_b$ with the highest probability $P(M_i \mid Q)$, which can be written as

$$P(M_i \mid Q) = \sum_{k=1}^{K} P(M_i \mid V_Q^{k})\, P(V_Q^{k} \mid Q), \tag{4.8}$$

where $K$ is the number of characteristic views of the model $Q$. Let $H^{k}$ be the set of all the possible hypotheses of correspondence between the request view $V_Q^{k}$ and a model $M_i$, $H^{k} = \{ h_1^{k} \vee h_2^{k} \vee \ldots \vee h_N^{k} \}$. A hypothesis $h_p^{k}$ means that the view $p$ of the model is the request view $V_Q^{k}$. The sign $\vee$ represents the logical OR operator. If a hypothesis $h_p^{k}$ is true, all the other hypotheses are false. $P(M_i \mid V_Q^{k})$ can be expressed by $P(M_i \mid H^{k})$:

$$P(M_i \mid H^{k}) = \sum_{j=1}^{N} P(M_i, V_{M_i}^{j} \mid h_j^{k}). \tag{4.9}$$

This sum reduces to the only true hypothesis $P(M_i, V_{M_i}^{j} \mid h_j^{k})$, since a characteristic view of the request model $Q$ can match only one characteristic view of the model $M_i$. The characteristic view is chosen as the one with the maximum probability:

$$P(M_i \mid Q) = \sum_{k=1}^{K} \max_{j} \big( P(M_i, V_{M_i}^{j} \mid h_j^{k}) \big)\, P(V_Q^{k} \mid Q). \tag{4.10}$$

Using Bayes' theorem:

$$P(M_i \mid Q) = \sum_{k=1}^{K} \max_{j} \left( \frac{P(h_j^{k} \mid V_{M_i}^{j}, M_i)\, P(V_{M_i}^{j} \mid M_i)\, P(M_i)}{\sum_{i=1}^{N} \sum_{j} P(h_j^{k} \mid V_{M_i}^{j}, M_i)\, P(V_{M_i}^{j} \mid M_i)\, P(M_i)} \right) P(V_Q^{k} \mid Q), \tag{4.11}$$

with $P(M_i)$ the probability of observing the model $M_i$:

$$P(M_i) = \alpha\, e^{-\alpha N(V_{M_i}) / N(V)},$$

where $N(V_{M_i})$ is the number of characteristic views of the model $M_i$ and $\alpha$ is a parameter that controls the effect of the probability $P(M_i)$. By design of the algorithm, the more complex the geometry of the 3-D model, the greater the number of its characteristic views.

On the other hand,

$$P(V_{M_i}^{j} \mid M_i) = 1 - \beta\, e^{-\beta N(Vr_{M_i}^{j}) / 320},$$

where $N(Vr_{M_i}^{j})$ is the number of views represented by the characteristic view $j$ of the model $M_i$. The greater the number of represented views $N(Vr_{M_i}^{j})$, the more important the characteristic view $V_{M_i}^{j}$ is and the better it represents the 3-D model. The coefficient $\beta$ is introduced to reduce the effect of the view probability.

The value $P(h_j^{k} \mid V_{M_i}^{j}, M_i)$ is the probability that, given that we observe the characteristic view $j$ of the model $M_i$, this view is the view $k$ of the 3-D query model $Q$:

$$P(h_j^{k} \mid V_{M_i}^{j}, M_i) = 1 - D(h_{V_Q^{k}}, h_{V_{M_i}^{j}}),$$

with $D(h_{V_Q^{k}}, h_{V_{M_i}^{j}})$ the Euclidean distance between the 2-D Zernike descriptors of the view $V_Q^{k}$ of $Q$ and of the characteristic view $V_{M_i}^{j}$ of the 3-D model $M_i$.

In AVC, the representative views are selected either from the 60 views in the NTU dataset or from the 41 views in the ETH dataset. Then, the probabilistic matching is conducted to measure the relevance of each object to the query.

4.2.3 Query View Selection Method (QVS)

In query view selection (QVS) [45], the query views are interactively selected and incrementally increased. In this method, a group of views $Q = \{q_1, q_2, \ldots, q_m\}$ containing $m$ views is provided to describe the query object. View clustering is first conducted to group the views into clusters using a hierarchical agglomerative method [66]. The central view of each cluster is selected to generate a candidate view set $\tilde{Q}$ with $r$ candidate query views. After generating the candidates, a view graph is constructed based on the relationships among the candidate query views, and a random walk process is employed to select the initial query view. In the view graph, each node denotes a candidate, and the edge between two candidates $v_i$ and $v_j$ is defined as their visual similarity:

$$s_{ij} = \exp\big( -d(v_i, v_j) \big). \tag{4.12}$$

The transition probability between the $i$th and the $j$th candidates is defined as

$$p_{ij} = \frac{s_{ij}}{\sum_{k} s_{ik}}, \tag{4.13}$$

and the random walk process repeats

$$\pi_t(k) = \alpha \sum_{i} \pi_{t-1}(i)\, p_{ik} + (1 - \alpha)\, \pi_0(k) \tag{4.14}$$

until convergence. The candidate $\tilde{q}_K$ with the highest score is selected as the initial query view.

If users are not satisfied with the initial retrieval results, a relevance feedback process can be conducted to refine the search results. A distance metric is learned for the selected query view with the criterion that minimizes the distance between the selected query view and the relevant (positive) objects and maximizes its distance to the irrelevant (negative) objects. All the selected query views are combined using the learned weights for the next search.

The distance estimation between two multi-view 3-D objects plays the central role in the retrieval process of QVS. Given two objects $Q$ and $O_i$, the distance $d(Q, O_i)$ is defined as

$$d(Q, O_i) = \sum_{j=1}^{K} d(\tilde{q}_j, O_i), \tag{4.15}$$

where $K$ is the number of views of the query object, and $d(\tilde{q}_j, O_i)$ is defined by

$$d(\tilde{q}_j, O_i) = \min_{p}\ (\tilde{q}_j - v_{i,p})^{T}\, W_j\, (\tilde{q}_j - v_{i,p}). \tag{4.16}$$

Here $\tilde{q}_j$ is the $j$th selected query view of $Q$, $v_{i,p}$ is the $p$th view of the $i$th object in the database, and $W_j$ is the Mahalanobis distance metric, i.e., the weight matrix, corresponding to $\tilde{q}_j$. In our experiments, we only compare our proposed method with the QVS method without relevance feedback.

In our experiments, we implemented ED, AVC, and QVS following the descriptions in [39, 23, 45]. For our method, $\lambda$ is set to 100, and the numbers $K$ of nearest neighbors are selected as {5, 10, 15, 20}. In the 3-D object retrieval experiments, each time one object is selected as the query, and this process is repeated until every object has served as the query once. The average retrieval performance over all the objects in each dataset is used for comparison.

4.3 Evaluation Criteria

Evaluation criteria are needed to measure 3-D object retrieval performance and to compare the different methods. Given the views of a query object, a 3-D object retrieval method calculates the similarity to the other objects in the object database. After obtaining the final ranking list of retrieved 3-D objects, the outcome is assessed by evaluation criteria. We employ the following criteria to compare different methods in our experiments: nearest neighbor precision (NN), F-measure (F), discounted cumulative gain (DCG), and average normalized modified retrieval rank (ANMRR).

4.3.1 The Accuracy of The Nearest Neighbor (NN)

The accuracy of the nearest neighbor (NN) evaluates the retrieval accuracy of the first returned result. NN ranges from 0 to 1, and a higher value indicates better performance. NN is a simple but effective criterion; it is the precision of Eq. (4.17) computed on the first returned result only.

4.3.2 F-Measure (F)

The F-measure [67] is a composite measure of both the precision and the recall for a fixed number of returned results. The precision of the retrieved objects is defined as

$$\text{precision} = \frac{|\{\text{relevant objects}\} \cap \{\text{retrieved objects}\}|}{|\{\text{retrieved objects}\}|}, \tag{4.17}$$

where {retrieved objects} are the objects retrieved for the query and {relevant objects} are the objects relevant to the query (the ground truth). $|X|$ is the number of objects in $X$, and $X \cap Y$ is the intersection, i.e., the objects in both sets $X$ and $Y$. The precision value ranges from 0 to 1. A precision of 1 indicates that all the retrieved results are correct.

The recall evaluates how many of the ground-truth objects have been retrieved. It is defined as

$$\text{recall} = \frac{|\{\text{relevant objects}\} \cap \{\text{retrieved objects}\}|}{|\{\text{relevant objects}\}|}. \tag{4.18}$$

The recall value ranges from 0 to 1. A recall of 1 indicates that all the relevant results have been correctly retrieved.

A commonly used performance measure that combines precision and recall is the F-measure, also known as the balanced F-score:

$$F_K = \frac{2\, P_K\, R_K}{P_K + R_K}, \tag{4.19}$$

where $K$ is the number of selected top returned results, $P_K$ is the precision for the top $K$ results, and $R_K$ is the recall for the top $K$ results. F ranges from 0 to 1, and a higher value indicates better performance. In our experiment $K$ is set to 20, so F is defined as

$$F = \frac{2\, P_{20}\, R_{20}}{P_{20} + R_{20}}, \tag{4.20}$$

where $P_{20}$ and $R_{20}$ are the precision and the recall of the top 20 retrieval results, respectively.

4.3.3 Discounted Cumulative Gain (DCG)

DCG [56] measures the ranking performance of the retrieved result list; it is a statistic that gives higher scores to relevant objects ranked near the top. DCG works under the assumption that a user is less likely to consider lower-ranked results. The DCG value is defined as

$$DCG[i] = \begin{cases} G[1] & \text{if } i = 1 \\ DCG[i-1] + \dfrac{G[i]}{\log_2 i} & \text{otherwise} \end{cases} \tag{4.21}$$

where

$$G[i] = \begin{cases} 1 & \text{if the } i\text{th result is correct} \\ 0 & \text{otherwise} \end{cases} \tag{4.22}$$

We explore the behavior of DCG as the relative weight given to highly relevant objects varies. By manipulating this weight we can closely approximate either evaluation by all relevant objects or evaluation by highly relevant objects only. DCG ranges from 0 to 1, and a higher value indicates better performance. Assuming the number of all relevant objects is $m$ and the number of all objects is $n$, the DCG is normalized by its maximal value:

$$DCG = \frac{DCG_n}{DCG_{\max}}, \qquad DCG_{\max} = 1 + \sum_{i=2}^{m} \frac{1}{\log_2 i}. \tag{4.23}$$

4.3.4 Average Normalized Modified Retrieval Rank (ANMRR)

ANMRR [57] measures the ranking performance of a ranking list by considering the ranks of the relevant objects among the top-retrieved objects. ANMRR ranges from 0 to 1, and the smaller the value, the better the matching quality of the query. ANMRR is defined as follows. First, the average retrieval rank $AVR(Q_k)$ for the $k$th query $Q_k$ is

$$AVR(Q_k) = \sum_{i=1}^{NR(Q_k)} \frac{RANK_i}{NR(Q_k)}, \tag{4.24}$$

where $NR(Q_k)$ is the number of relevant objects for the query $Q_k$. If the $i$th relevant object appears among the top $S_k$ results, $RANK_i$ is its ranking position; otherwise $RANK_i = 1.25\, S_k$. $S_k$ is the number of top-ranked returned results that are considered:

$$S_k = \min\{\, 4\, NR(Q_k),\ 2\, GMT \,\}, \tag{4.25}$$

and $GMT$ is the maximal number of relevant objects over all queries. The modified retrieval rank is

$$MRR(Q_k) = AVR(Q_k) - \frac{1 + NR(Q_k)}{2}. \tag{4.26}$$

The modified retrieval rank is then normalized to give the normalized MRR:

$$NMRR(Q_k) = \frac{MRR(Q_k)}{1.25\, S_k - \dfrac{1 + NR(Q_k)}{2}}. \tag{4.27}$$

Finally, the average NMRR (ANMRR) is calculated by averaging the NMRR values over all queries:

$$ANMRR = \frac{1}{n} \sum_{k=1}^{n} NMRR(Q_k), \tag{4.28}$$

where $n$ is the number of queries.
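The four criteria of Section 4.3 can be sketched for a single query as below. This is an illustrative sketch only: the function names are hypothetical, and the ranked list is assumed to be given as a list of booleans (`True` where the result at that rank is relevant), with the total number of relevant objects passed separately.

```python
import math


def nearest_neighbor(flags):
    """NN: 1 if the first returned result is relevant, else 0."""
    return 1.0 if flags[0] else 0.0


def f_measure(flags, n_relevant, k=20):
    """Balanced F-score on the top-k results, Eqs. (4.17)-(4.20)."""
    top = flags[:k]
    hits = sum(top)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / n_relevant
    return 2 * precision * recall / (precision + recall)


def normalized_dcg(flags, n_relevant):
    """DCG of Eqs. (4.21)-(4.22), divided by its maximal value (Eq. 4.23)."""
    dcg = 0.0
    for i, flag in enumerate(flags, start=1):
        gain = 1.0 if flag else 0.0
        dcg += gain if i == 1 else gain / math.log2(i)
    dcg_max = 1.0 + sum(1.0 / math.log2(i) for i in range(2, n_relevant + 1))
    return dcg / dcg_max


def nmrr(flags, n_relevant, gmt):
    """Normalized modified retrieval rank of one query, Eqs. (4.24)-(4.27)."""
    s_k = min(4 * n_relevant, 2 * gmt)
    ranks = [i for i, flag in enumerate(flags, start=1) if flag and i <= s_k]
    # Relevant objects missing from the top s_k results get the penalty rank.
    ranks += [1.25 * s_k] * (n_relevant - len(ranks))
    avr = sum(ranks) / n_relevant
    mrr = avr - 0.5 * (1 + n_relevant)
    return mrr / (1.25 * s_k - 0.5 * (1 + n_relevant))


def anmrr(per_query_nmrr):
    """ANMRR of Eq. (4.28): mean NMRR over all queries."""
    return sum(per_query_nmrr) / len(per_query_nmrr)
```

For a perfect ranking the sketch gives NN = 1, normalized DCG = 1, and NMRR = 0, while a ranking with no relevant object in the top $S_k$ results gives NMRR = 1, which matches the stated ranges and the fact that a smaller ANMRR is better.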

4.4 Experimental Results

Experimental results on the NTU dataset and the comparison of different methods are shown in Figure 4.4. As shown in these results, our proposed method, i.e., multi-scale object graph learning (MSOGL), achieves the best performance compared with the other methods. Based on NN, the proposed method achieves an improvement of 146.91%, 53.72%, and 12.66% compared with ED, AVC, and QVS, respectively. In terms of F, the improvement from the proposed method is 99.40%, 27.80%, and 7.77%, respectively. For DCG, the gain is 55.32%, 8.06%, and 6.01%, respectively. Regarding the ranking performance, the proposed method achieves an improvement of 9.96%, 4.03%, and 3.31% in terms of ANMRR compared with ED, AVC, and QVS, respectively. Similar results are observed across the criteria.

Figure 4.4 Experimental results on the NTU dataset.

Experimental results on the ETH dataset and the comparison of different methods are shown in Figure 4.5. As shown in these results, our proposed method achieves the best performance compared with the other methods. Based on NN, the proposed method achieves an improvement of 23.8% and 11.8% compared with AVC and QVS, respectively. In terms of F, the improvement from the proposed method is 19.69% and 11.94%, respectively. For DCG, the gain is 10.08% and 4.12%, respectively. Regarding the ranking performance, the proposed method achieves an improvement of 25.1% and 14.3% in terms of ANMRR compared with AVC and QVS, respectively. Similar results are observed across the criteria.

Figure 4.5 Experimental results on the ETH dataset.

From these results, we make the following observations.

The proposed method achieves the best performance among all the compared methods. This result can be attributed to the better formulation of object relationships in our method. The proposed method is able to formulate the connections among objects at multiple scales. Compared with existing formulation methods, it can be much more robust to object variations. In this way, different connections among objects are modeled in the hypergraph structure, and the learning on the hypergraph can explore the optimal relevance among these objects.

The gain on the ETH dataset is smaller than that on the NTU dataset. This can be attributed to the high performance of all methods on the ETH dataset.

The better performance can be attributed to the flexible structure of the proposed method. In the multi-scale relationship formulation, i.e., the multi-scale hypergraph construction, the proposed method is able to explore all possible view matching results, whereas traditional methods struggle to find the best one. In this way, the proposed method achieves better performance through better modeling of the data distribution.

4.5 Analysis

4.5.1 On the Multi-scale Hypergraph Connections

In this subsection, we evaluate the influence of the multi-scale hypergraph connections. The parameter $K$ indicates how many nearest neighbors are connected by one hyperedge. When $K$ is too large, the connected objects can be quite dissimilar. When $K$ is too small, the constructed hyperedge loses discriminative power. Here we evaluate different single $K$ values as well as the combination of multiple $K$ values that is used in our work. More specifically, we set $K$ to 5, 10, 15, and 20, and the experimental results on the two datasets are shown in Figure 4.6 and Figure 4.7, respectively.

As shown in these two figures, the performance with a single $K$ value varies considerably with the selection of $K$, and different $K$ values perform differently on the two datasets. When multiple $K$ values are employed, as in our method, the overall performance becomes steady and better than that of each single value. These results demonstrate the effectiveness of our proposed multi-scale hypergraph connection method.

Figure 4.6 Experimental results with respect to different connection numbers on the NTU dataset.

Figure 4.7 Experimental results with respect to different connection numbers on the ETH dataset.

4.5.2 On Parameter $\lambda$

In this subsection, we evaluate the influence of the parameter $\lambda$ on the 3-D object retrieval performance. $\lambda$ is the parameter that balances the different components in the learning formulation. We vary $\lambda$ from 0.001 to 10000 and evaluate the performance of our method. Experimental results on the two datasets are shown in Figure 4.8 and Figure 4.9, respectively. As shown in these results, when $\lambda$ is too small, such as 0.001, the performance is not very satisfactory. When $\lambda$ varies over a large range, such as from 1 to 10000, the performance is steady and satisfactory. These results demonstrate that the proposed method is robust to the selection of the parameter $\lambda$.

Figure 4.8 Experimental results with respect to different $\lambda$ values on the NTU dataset.

Figure 4.9 Experimental results with respect to different $\lambda$ values on the ETH dataset.