Similarity Search of Flexible 3D Molecules combining Local and Global Shape Descriptors


AXENOPOULOS ET AL.: SIMILARITY SEARCH OF FLEXIBLE MOLECULES COMBINING LOCAL AND GLOBAL SHAPE DESCRIPTORS

Apostolos Axenopoulos, Member, IEEE, Dimitrios Rafailidis, Georgios Papadopoulos, Elias Houstis, and Petros Daras, Senior Member, IEEE

Abstract: In this paper, a framework for shape-based similarity search of 3D molecular structures is presented. The proposed framework exploits simultaneously the discriminative capabilities of a global, a local and a hybrid local-global shape feature to produce a geometric descriptor that achieves higher retrieval accuracy than each feature does separately. Global and hybrid features are extracted using pairwise computations of diffusion distances between the points of the molecular surface, while the local feature is based on accumulating pairwise relations among oriented surface points into local histograms. The local features are integrated into a global descriptor vector using the bag-of-features approach. Due to the intrinsic property of its constituting shape features to be invariant to articulations of the 3D objects, the framework is appropriate for similarity search of flexible 3D molecules, while at the same time it is also accurate in retrieving rigid 3D molecules. The proposed framework is evaluated in flexible and rigid shape matching of 3D protein structures as well as in shape-based virtual screening of large ligand databases with quite promising results.

Index Terms: Bioinformatics (genome or protein) databases; flexible 3D molecular shape comparison; virtual screening.

1 INTRODUCTION

The three-dimensional structure of a biological molecule is very important in order to understand its function and biological action. Comparison of the 3D molecular structures is useful in a variety of applications such as protein function prediction, computer aided molecular design, rational drug design and protein docking.
Following the similarity property principle [1], according to which similar structures are likely to have similar properties, several approaches for molecular structure comparison have been proposed, using different representations of the molecules. As an example, in rational drug design, the process of virtual screening is usually applied, where given a target molecule, a search is performed in a large database for compounds that are most similar to the target. Since these compound databases range from thousands to millions of structures, an ideal method should provide accurate and at the same time rapid similarity matching. Among the various existing structural comparison methods [3][52], those that are based on comparison of structures by their main-chain orientation [53] or the spatial arrangement of secondary structure [5] are quite slow; thus, similarity search in large molecular databases can be time-consuming. Therefore, in order to accelerate the search, methods of 3D shape matching have been proposed in the literature.

A. Axenopoulos and E. Houstis are with the Department of Computer & Communication Engineering, University of Thessaly, Volos, Greece (e-mail: axenop@iti.gr; enh@inf.uth.gr). P. Daras and D. Rafailidis are with the Information Technologies Institute, Centre for Research & Technology Hellas, Thermi-Thessaloniki, Greece (e-mail: daras@iti.gr; drafail@iti.gr). G. Papadopoulos is with the Department of Biochemistry & Biotechnology, University of Thessaly, Larissa, Greece (e-mail: geopap@med.uth.gr).

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

1.1 Related Work

Techniques for similarity matching of molecular structures can be classified into different categories based on the molecular representation [2]. These representations may include backbone Cα positions [3], distance maps [4], secondary structure elements [5] and backbone torsion angles [6]. The technique/algorithm that is used for comparison highly depends on the chosen representation. As an example, for backbone representations, a common technique is dynamic programming [3]; spatial arrangements are used with secondary structure elements [5], while Monte Carlo algorithms are used with distance maps [4].

A broad category of techniques for Molecular Shape Comparison (MSC) rely on finding an optimal superposition of the molecules that are compared (superposition methods) [11]. Superposition is also applied to protein structural alignment to compare a pair of structures, where the alignment between equivalent residues is not given a priori [4][64][65][66][67][68]. Although superposition methods are particularly effective (in terms of identifying similarities between molecular structures), they lack efficiency; they are extremely computationally expensive, which makes search in large molecular databases a time-consuming task. As the need for rapid and accurate comparison is becoming even more critical, due to the increasing size of the databases, descriptor-based methods have been introduced [11][28]. These extract low-level features (descriptors) that capture the spatial profile of the molecule as a multidimensional feature vector. In this case, similarity matching is reduced to descriptor comparison using a common distance measure, which obviates the need for superposition. Since the work presented in this paper belongs to the category of descriptor-based techniques, a more detailed state-of-the-art analysis of these methods is provided in the sequel.
In shape-based approaches, the molecule is treated as a three-dimensional (3D) object, on which an appropriate algorithm is applied to extract low-level descriptors that uniquely characterize its shape. A common representation that is extensively used is the molecular surface [7]. Considering the molecular surface as input, several features can be generated, such as Spin Images [8] or Shape Histograms [9]. Spin Images are local 2D descriptions of the surface based on a reference frame that is defined by the associated surface points. Shape Histograms, on the other hand, exploit global geometric properties of the molecule captured in the form of a probability distribution sampled from a shape function (e.g. angles, distances, areas). In [10], the 3D molecular surface is given as input and 2D views of the surface are taken from 100 uniformly sampled viewpoints. Comparison is performed by multi-view matching using 2D Zernike moments and Fourier descriptors for each 2D view. Multi-view representation has been proven quite effective for shape matching of 3D objects [59]; however, for most multi-view-based methods, the optimal performance is achieved when the

database objects have symmetries, i.e. in retrieval of generic objects [54]. In the case of molecular shapes, these symmetries are not present. This obstacle could be overcome by using features oblivious to symmetries, such as integrating local features into bag-of-words to represent each view [60]; however, such approaches have not been reported so far to address the MSC problem.

Apart from the molecular surface, other representations are also possible. The method presented in [11][12] describes the shape of a molecule through its set of interatomic distances, which is encoded as a geometrical descriptor vector. The method achieves very fast comparison times and is appropriate for virtual screening problems. Another shape representation for molecular structure comparison is alpha shapes [71][72], which provide a coarser representation of the Connolly surface. Due to their high computational complexity, molecular structure comparison algorithms are usually parallelized [73][74] in order to distribute the processing task over several processors and thus speed up the matching process. Finally, there is also a category of more recent methods with the ability to identify subtle differences among very similar proteins, which assists in finding small structural variations that create differences in binding specificity [75][76][77]. The latter is particularly interesting, taking into account the fact that the variation of just a few residues can be enough to alter activity or binding specificity.

An interesting category of shape-based approaches comprises methods that extract moments from the 3D object. These have been successfully applied in pattern recognition problems [13]. The moment-based representations result in compact descriptor vectors with high discriminative power. Examples of moments are based on the theory of orthogonal polynomials, such as 2D/3D Zernike moments and Legendre moments [14]. These descriptors also allow reconstruction of the object from its moments [15].
The method in [16] takes as input the volume of the 3D molecular structure, producing a new domain of concentric spheres. In this domain, 2D Polar-Fourier coefficients and 2D Krawtchouk moments are applied, resulting in a completely rotation-invariant descriptor vector. Spherical Harmonics have been widely used in molecular similarity comparison problems such as virtual screening [17], protein structure representation and comparison [18] and molecular docking [19][20]. Spherical Harmonics have the advantage of allowing the surface information to be encoded in a compact form as an orthonormal 1D vector of real numbers, allowing fast comparison. Their main disadvantages are: a) they represent only star-shaped surfaces; and b) the handling of alignment problems is associated with the fast comparison of objects [21]. Recently, 3D Zernike descriptors (3DZD) have been introduced as a representation of the protein surface shape [22]. These are based on a series expansion of a given 3D function. 3DZDs are rotation invariant, with the protein structures not necessarily being

aligned to perform the molecular shape comparison. Another advantage of 3DZDs is that they allow other characteristics of a protein surface, such as electrostatic potentials, to be incorporated into the descriptor vector [22]. 3DZDs have been used in problems of protein structure retrieval [2], protein-protein docking [23] and virtual screening [24] with quite satisfactory results.

In all methods for molecular shape comparison described above, the 3D molecules are treated as rigid objects. A drawback of these approaches is that they are not robust to shape deformations of flexible molecules. Since many molecules are flexible and this flexibility is part of their function, it should by no means be underestimated. To address such problems, methods for non-rigid shape matching should be utilized. Such methods have been introduced to address problems that include articulation of the 3D objects (e.g. different human or animal poses in generic 3D object retrieval), as rigid shape descriptors have been proven inappropriate [30][61].

The two main categories of non-rigid approaches are: a) global-shape-based and b) local-shape-based methods. The former [25][26][27] usually transform the Euclidean space or Euclidean metrics [40] to a metric space where the pairwise distances between points of the 3D object surface are invariant to deformations of the 3D object. These distances are usually accumulated into a histogram, which provides the final descriptor vector. Examples include canonical forms [31], geodesic distances (GD) [32], inner distances (ID) [27] or diffusion distances (DD) [28]. The difference of DD compared to GD and ID is that DD is computed as the average length of paths connecting two points, while GD and ID represent the length of the shortest path. This makes DD more robust to topological changes and, thus, it has been proven more effective in flexible molecular shape comparison problems.
In [28], the Diffusion Distance Shape Descriptor (DDSD) is a histogram of the diffusion distances between all sample point pairs on the molecular surface. Experiments in a database of flexible molecules show that DDSD outperforms similar approaches.

Local-shape-based methods sample the surface and extract descriptors for each of the sampled local regions. Then, a codebook is created and a bag-of-features method is applied to generate a global shape descriptor [33][34][35]. A main challenge in these problems is the selection of the most appropriate local shape descriptor [61]. Apart from the discriminative ability, the descriptors should fulfil additional criteria such as fast descriptor extraction, compactness and rotation invariance. Several descriptors have been proposed that fulfil the above selection criteria. The Shape Impact Descriptor (SID) was introduced in [69] as a shape similarity measure for 3D objects and it is based on the idea that objects of similar shape will have similar surrounding fields created by the insertion of the 3D object in the space. The Local Spectral Descriptor has been proposed in [33] for retrieval of non-rigid 3D meshes and it is based on the extraction of geometric descriptors, that is eigenvectors of the Laplace-Beltrami operator (LBO), from a surface patch centered around a sample point on the mesh. The Surflet-pair-relation Histograms method was introduced in [37] for global shape representation; furthermore, this method was exploited in [63] as a local feature for non-rigid 3D object retrieval. It computes intrinsic geometric properties (azimuthal angle, cosine of polar angle, direction and distance) between pairs of oriented surface points in a 3D surface. Another approach, which was introduced in [29] for fast screening of proteins, is based on extraction of local patches from the protein surface and computation of a geometric fingerprint (distribution of curvatures) for each patch. It exploits local surface similarities and achieves rapid shape comparisons.

Although local-shape-based methods are appropriate for non-rigid shape matching problems, most of them have inferior performance in rigid shape retrieval compared to rigid methods [35]. In fact, only few methods achieve high performance in both rigid and non-rigid 3D objects [60][62]. It has been recently proven that combining multiple shape descriptors can significantly improve the performance of rigid 3D shape retrieval [36]. In [35], a combination of global and local features is proposed, where the Local Distance Feature (LDF) enhances the local descriptors extracted in 3D meshes by adding spatial context. LDF combines local characteristics, as it is computed on uniformly-sampled keypoints of the 3D surface, with global characteristics, as it takes into account the set of diffusion-like distances from each keypoint to the surface points of the entire mesh. These diffusion-like distances are computed by using a Manifold Ranking algorithm [41]. Following the same concept, a framework that combines multiple shape descriptors to address both rigid and flexible molecular shape matching problems is proposed in this paper.

1.2 Method Overview and Contributions

In Fig. 1, the block diagram of the proposed method is depicted. The crystal structure of the molecule is given as input (e.g. PDB file) and its Solvent Excluded Surface (SES) is generated in the form of a triangulated mesh. Then, a mesh simplification step is performed on the SES, resulting in two sets of points: a set of $N_S$ oriented points and a set of $N_K$ keypoints ($N_K < N_S$) that provide a coarser representation of the 3D molecule. In the descriptor extraction step, two different descriptor vectors are proposed in this work: the Bag of Augmented Local Descriptor (BoALD) and the Modal Representation of the Diffusion-Distance Matrix (DDMR descriptor). These descriptor vectors are combined into a common distance measure in order to calculate the dissimilarity between the query molecule and the molecules of a database.

Fig. 1. Block diagram of the proposed method.

The main contribution of the proposed work is that it successfully addresses the problem of shape comparison of flexible 3D molecules by combining a global, a local and a hybrid local-global feature into a unified descriptor. Such an approach has not been reported so far, to the best of the authors' knowledge. Although numerous non-rigid shape matching approaches have been introduced [56][57], which deal effectively with deformations of articulated objects, it cannot be inferred that they are also applicable to flexible molecules [26]. The peculiarities of the molecular shape as well as the complexity of molecular shape deformations, as opposed to deformations of articulated objects (e.g. humans, animals), indicate the need for a method that captures the molecular conformations in a more efficient manner. The method should be highly discriminative and at the same time able to handle shape deformation of molecules with topological changes. Based on the fact that combination of different shape features produces more discriminative descriptors [36][58], in our case, we exploited the properties of a diversity of features, namely a global, a local and a hybrid local-global feature, to produce an effective descriptor. Additionally, the global shape feature that has been integrated in our framework is based on the diffusion distance, which is able to capture topological changes in molecular shapes [28].

The proposed unified framework demonstrates superior performance to existing methods for shape comparison of flexible molecules. Experiments performed on the benchmark Database of Macromolecular Movements (MolMovDB) [38] show that our method clearly outperforms other state-of-the-art approaches. At the same time, our method achieves high accuracy in retrieval of rigid molecules as well. More specifically, it outperforms existing molecular shape matching approaches in three datasets.
Thus, the proposed framework is applicable to both rigid-body and flexible molecular shape comparison problems. Additional contributions of our work include:

- Introduction of an accurate global shape descriptor, by improving existing work on Diffusion-Distance-based descriptors: the work of [28] based on Diffusion Distances is further extended in this paper. Instead of computing histograms of diffusion distances between all sample point pairs on the molecular surface, we provide a new representation by performing singular value decomposition (SVD) on the matrix that summarizes all point-to-point diffusion distances on the molecular mesh. The proposed Modal Representation of the Diffusion Distance Matrix achieves better results in similarity matching of flexible molecules than the method in [28].
- Evaluation of several state-of-the-art local shape descriptors in order to select the local feature that best fits our framework: the selection criteria include computational efficiency in descriptor extraction, compactness of the descriptor, rotation invariance and improved discrimination capacity in the flexible molecular shape comparison problem. Eventually, a shape descriptor, which is based on Surflet-pair relations [37] and fulfils the above criteria, has been selected.

Finally, it is worth mentioning that the resulting shape descriptors constitute a compact representation of the molecular shape. Since it is a pure shape-based method (i.e. the descriptors do not rely on physicochemical information), it is applicable to both macromolecules (e.g. proteins) and small ligands. Thus, throughout the description of the approach, in Sections 2, 3 and 4, the term molecule will be used referring to both proteins and ligands. A distinction will be made in the Experiments section, though, since the first two datasets refer to proteins and other macromolecules and the last two datasets refer to small ligands.

The rest of the paper is organized as follows: in Section 2, the pre-processing procedure is described, Section 3 analyses the computation of the modal representation based on diffusion distances (DDMR) and Section 4 the computation of the Augmented Local Descriptor (ALD).
The combined matching scheme that includes the global, the local and the hybrid feature is described in Section 5. Experiments performed on four benchmark datasets are reported in Section 6. Finally, conclusions are drawn in Section 7.

2 PREPROCESSING

The preprocessing procedure consists of two steps: the first step involves computation of the Solvent Excluded Surface (SES) of the molecule, while, during the second step, the SES is remeshed so that each molecule is represented by the same number of oriented points. These preprocessing steps are required for descriptor extraction. Input to the system is the crystal structure of the molecule (e.g. in PDB file format), which represents its atoms in the 3-dimensional space (x, y, z coordinates). In order to generate the SES, the Maximal Speed Molecular Surface (MSMS) [51] software has been utilized, which is based on rolling a probe sphere (of size equal to the size of the solvent molecule) over the exposed contact surface of each atom.

Fig. 2. a) The SES of a protein, given as a mesh of vertices and faces; b) the surface that is produced after the remeshing step, consisting of $N_S = 3000$ vertices and 5996 faces. The dark blue spheres depict the $N_K = 500$ sub-sampled points, while the green lines depict the normals $n_i$.

The output mesh is then used for the extraction of global and local shape descriptors. In order to apply the descriptor extraction algorithms, all molecules of the dataset should have the same number of mesh vertices. Since the MSMS software does not allow the exact number of extracted vertices to be fixed, a remeshing step follows to produce a mesh with the exact number of vertices $N_S$. For this remeshing, the Computational Geometry Algorithms Library (CGAL) has been used. Let $p_i$ be the $i$-th vertex, $i = 1, \dots, N_S$. For each $p_i$, its normal vector $n_i$ is computed, resulting in a set of $N_S$ oriented points $(p_i, n_i)$. These oriented points are further sub-sampled to generate a new set of $N_K$ keypoints $q_i$, $i = 1, \dots, N_K$, where $N_K < N_S$, that provide a coarser representation of the 3D molecule. Sub-sampling is performed using a quasi-random sequence, which is a deterministic sequence that produces sample points more uniformly distributed than a pseudo-random sequence. In our case the Sobol sequence has been utilized [39]. In Fig. 2a, the SES of a protein is depicted as a mesh of vertices and faces. The new surface after the remeshing step consists of 3000 vertices and 5996 faces and it is shown in Fig. 2b. The normals $n_i$ of the $N_S$ oriented points are given as green lines, while the dark blue spheres depict the centers of the $N_K$ sub-sampled points.

3 A GLOBAL SHAPE DESCRIPTOR BASED ON DIFFUSION DISTANCE

The computation of DD over the molecular surface is performed in three main steps: (a) calculation of the Markov probability matrix; (b) Singular Value Decomposition (SVD) of the matrix to generate the diffusion

map space; and (c) computation of the diffusion distances.

Let $\{p_i\}$ be the set of $N_S$ vertices. Let $k(\cdot, \cdot)$ be a kernel function with bandwidth $h$. The Gaussian kernel $k(p_i, p_j) = \exp(-\|p_i - p_j\|^2 / 2h^2)$ is one of the most commonly used, where the bandwidth $h$ controls the local scale of each data point's neighborhood and $\|p_i - p_j\|$ is the Euclidean distance between surface points $i$ and $j$. Then, the diffusion matrix $L$ with elements $L_{ij} = k(p_i, p_j)$ is normalized as $M = D^{-1} L$ by the degree matrix $D$ with $D_{ii} = \sum_j L_{ij}$. The normalized diffusion matrix $M$ is a stochastic matrix with all row sums equal to one, and according to [42] it can be interpreted as a random walk on a graph, where the vertices of the graph are the surface points $i = 1, \dots, N_S$ and the weights of the $(i, j)$ edges correspond to the $M_{ij}$ values. Thus, $M_{ij}$ denotes the transition probability $p(i \mid j)$ from the surface point $j$ to point $i$ in one time step ($t = 1$). For any finite time $t$, the Markov probability matrix $M^t$ with elements $M^t_{ij} = p(t, i \mid j)$ is computed, expressing the probability distribution of reaching surface point $i$, given a starting point $j$ at time $t = 0$. Thus, the transition probability is given by $p(t, i \mid j) = e_j M^t$, where $e_j$ is a row vector of zeros with a single entry equal to one at the $j$-th coordinate.

Let the SVD of matrix $M^t$ be $M^t = A \Sigma B^T$, where $\Sigma = \mathrm{diag}(\sigma_0, \sigma_1, \dots, \sigma_k)$ and $\sigma_0 \ge \sigma_1 \ge \dots \ge \sigma_k \ge 0$ are the $k + 1$ singular values of $M^t$, $A = [a_0, a_1, \dots, a_k]$ and $B = [b_0, b_1, \dots, b_k]$, with $a_l = \{a_l(1), a_l(2), \dots, a_l(N_S)\}$ and $b_l = \{b_l(1), b_l(2), \dots, b_l(N_S)\}$ the left and right singular vectors, respectively; $a_0$ and $b_0$ are the first left and right eigenvectors, corresponding to the first ($\sigma_0 = 1$) eigenvalue. Note that following [42], the first eigenvalue and the respective eigenvectors are excluded from the diffusion process and are used only for normalization purposes.
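The three steps listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the normalization is taken to be the row normalization $M = D^{-1} L$, and the singular values of $M$ are raised to the power $t$ (one possible reading of the text). The function returns the matrix of squared diffusion distances between all surface points.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x."""
    gram = x @ x.T
    sq_norms = np.diag(gram)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


def diffusion_distance_matrix(points, h, t=1, k=None):
    """Squared diffusion distances between surface points (hypothetical helper).

    points : (N, 3) array of surface points
    h      : bandwidth of the Gaussian kernel
    t      : diffusion time
    k      : number of singular components kept (the first one is skipped)
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    k = n - 1 if k is None else min(k, n - 1)

    # (a) Gaussian kernel, then row normalisation so each row sums to one.
    kernel = np.exp(-pairwise_sq_dists(points) / (2.0 * h ** 2))
    markov = kernel / kernel.sum(axis=1, keepdims=True)

    # (b) SVD; the rows of b_t are the right singular vectors.
    _, sigma, b_t = np.linalg.svd(markov)

    # Diffusion-map coordinates: component 0 is excluded, as in the text.
    psi = b_t[1:k + 1].T * sigma[1:k + 1] ** t  # shape (N, k)

    # (c) squared distances between the embedded points.
    return pairwise_sq_dists(psi)
```

For a mesh with a few thousand vertices the dense $N \times N$ matrices used here are still manageable; a sparse kernel would be the natural change for much larger inputs.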
The diffusion distance between surface points $i$, $j$ at time $t$ is calculated as:

$$D_t^2(i, j) = \| \Psi_t(i) - \Psi_t(j) \|^2 \qquad (1)$$

where $\Psi_t(i) = (\sigma_1^t b_1(i), \sigma_2^t b_2(i), \dots, \sigma_k^t b_k(i))$ is the mapping of the $i$-th surface point from the original kernel space (formed by the kernel function $k(\cdot, \cdot)$) to the diffusion map space at time $t$.

3.1 Modal Representation of Diffusion Distance

Given the computation of diffusion distances between the molecular surface points, the next step is to exploit this feature for the computation of a global shape descriptor. A common technique that has been already followed in similar works [28] is to accumulate these pairwise distances into a histogram. In this paper, we propose an alternative approach based on a modal representation. The idea is to apply Singular Value Decomposition to the Diffusion Distance Matrix $DDM = \{D_t^2(i, j)\}$, where $i, j = 1, \dots, N_S$. In this way, DDM is separated into a matrix that contains intrinsic shape information and a matrix with information about the corresponding points. The SVD of DDM yields:

$$DDM = U \Lambda V^T \qquad (2)$$

where the singular value matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ contains the intrinsic information about geometry, and matrices $U$, $V$ contain the information about correspondences between points. The first $n$ singular values $\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ constitute the Modal Representation of Diffusion Distance (DDMR) descriptor $D^{DDMR}$ of the 3D object. It has been proven in [32] that the eigenvalue matrix is invariant to the sampling order of the surface points. Keeping only a relatively small percentage of the first singular values (Section 6.1) provides a highly compact shape descriptor with significant discriminative power and robustness to molecular shape conformations.

4 AN AUGMENTED LOCAL DESCRIPTOR

The proposed Augmented Local Descriptor (ALD) is computed on each of the $N_K$ keypoints (Section 2) that provide a coarse approximation of the molecular surface. This results in a total of $N_K$ ALD descriptor vectors $D_i^{ALD}$ ($i = 1, \dots, N_K$) that are extracted for each 3D molecule. Each descriptor vector $D_i^{ALD}$ consists of two parts: the former is a purely local feature, the Local Descriptor based on Surflet-Pair Relations $D_i^{LDSP}$, and the latter is a Hybrid Local-Global feature $D_i^{HLG}$.

4.1 A Local Descriptor based on Surflet-Pair Relations (LDSP)

The first step for the extraction of local descriptors is to define a local region (patch) on the 3D surface, on which the descriptor is computed. In our case, the local descriptor is defined on a spherical region of radius $R$ centered at each keypoint $q_i$, $i = 1, \dots, N_K$ (Fig. 3).
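Returning briefly to the DDMR descriptor of Section 3.1, the following minimal sketch shows the idea and checks the stated invariance to the sampling order of the surface points. The function name `ddmr_descriptor` and the random test matrix are illustrative assumptions; any symmetric matrix of pairwise distances can stand in for the DDM here.

```python
import numpy as np


def ddmr_descriptor(ddm, n):
    """First n singular values of a diffusion distance matrix (hypothetical helper)."""
    # compute_uv=False returns only the singular values, sorted in descending order.
    singular_values = np.linalg.svd(np.asarray(ddm, dtype=float), compute_uv=False)
    return singular_values[:n]


# Toy symmetric "distance" matrix standing in for the DDM.
rng = np.random.default_rng(0)
a = rng.random((30, 30))
ddm = a + a.T
np.fill_diagonal(ddm, 0.0)

# Re-ordering the surface points permutes rows and columns in the same way.
perm = rng.permutation(30)
ddm_reordered = ddm[np.ix_(perm, perm)]

original = ddmr_descriptor(ddm, n=8)
reordered = ddmr_descriptor(ddm_reordered, n=8)
```

Because a simultaneous row and column permutation is an orthogonal transformation on both sides, `original` and `reordered` agree up to floating-point error, which is why no point correspondence is needed when two molecules are compared.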
Regarding the computation of geometric features on the local patch, the Surflet-pair-relation Histogram descriptor [37] has been selected over other local descriptors (Shape Impact Descriptor (SID) [69] and Local Spectral Descriptor (LSD) [33]), since it achieved the highest performance in molecular similarity search, while being at the same time fast to compute, compact and rotation invariant. Given the set of oriented points $(p_i, n_i)$, $i = 1, \dots, N_S$ of the 3D molecule, the LDSP is computed on the subset $Q_i = \{(p_1, n_1), (p_2, n_2), \dots, (p_{N_i}, n_{N_i})\}$ of oriented points within a spherical region around keypoint $q_i$ with $\|p - q_i\| \le R$. For each pair of oriented points $(p_1, n_1), (p_2, n_2)$, four attributes $\alpha$, $\beta$, $\gamma$ and $\delta$ are computed, representing the azimuthal angle, the cosine of the polar angle, the direction and the length of the translation from $p_1$ to $p_2$, respectively. Then, all 4-tuples $(\alpha, \beta, \gamma, \delta)$ of $Q_i$ are collected into a 4-dimensional joint histogram.

Fig. 3. The local descriptor is defined on a spherical region (blue surface patch) of radius $R$ centered at a keypoint $q$.

A more detailed description of the computation of attributes $\alpha$, $\beta$, $\gamma$ and $\delta$ is available in [37]. An important parameter that needs to be analyzed, though, is the number of bins $k_L$ for each dimension of the joint histogram. Taking into account that the LDSP descriptor $D_i^{LDSP}$ for each keypoint $q_i$ is a 1D vector of dimension $k_L^4$, the selection of parameter $k_L$ should be such that the number of bins is adequate to produce a discriminative descriptor, while at the same time $k_L$ is not very high so as to keep the descriptor dimensionality low. For $k_L < 5$, the discriminative power of the local feature was negatively affected, while for $k_L > 5$, the descriptor dimensionality was increasing dramatically without achieving significant improvement of accuracy; thus, $k_L = 5$ was selected, resulting in a descriptor vector $D_i^{LDSP}$ of size 625. The optimal value for radius $R$ has been estimated in a similar manner: very low values of $R$ result in spherical regions with trivial shape information; for very high values of $R$, the local character of the descriptor, which gives its robustness to non-rigid problems, disappears. Eventually, an optimal choice for our experiments was $R = 0.4 R_A$, where $R_A$ is the radius of the 3D molecule's smallest bounding sphere.

4.2 A Hybrid Local-Global feature (HLG)

Similar to LDSP, the Hybrid Local-Global feature (HLG) is computed for each keypoint $q_i$, $i = 1, \dots, N_K$.
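For the LDSP feature of Section 4.1, a rough sketch of the patch histogram is given below. The attribute formulas follow the surflet-pair construction commonly attributed to [37] (a local frame $u, v, w$ built from the origin normal and the connecting line). The origin-selection rule, the bin ranges, the final normalization to unit sum and all function names are assumptions made for this illustration, and unit-length normals are assumed.

```python
import itertools

import numpy as np


def surflet_pair_features(p1, n1, p2, n2, eps=1e-12):
    """(alpha, beta, gamma, delta) for two oriented points, or None if degenerate."""
    d = p2 - p1
    # Assumed rule: the origin is the surflet whose normal is closer to being
    # perpendicular to the connecting line.
    if abs(np.dot(n1, d)) > abs(np.dot(n2, d)):
        p1, n1, p2, n2 = p2, n2, p1, n1
        d = -d
    delta = np.linalg.norm(d)
    cross = np.cross(d, n1)
    cross_norm = np.linalg.norm(cross)
    if delta < eps or cross_norm < eps:
        return None  # coincident points, or normal parallel to the line
    u = n1
    v = cross / cross_norm
    w = np.cross(u, v)
    alpha = np.arctan2(np.dot(w, n2), np.dot(u, n2))  # azimuthal angle
    beta = np.dot(v, n2)                              # cosine of polar angle
    gamma = np.dot(u, d) / delta                      # direction of translation
    return alpha, beta, gamma, delta                  # delta: translation length


def ldsp_descriptor(points, normals, keypoint, radius, k_l=5):
    """Joint 4D histogram of surflet-pair features inside a spherical patch."""
    inside = np.linalg.norm(points - keypoint, axis=1) <= radius
    pts, nrm = points[inside], normals[inside]
    feats = []
    for i, j in itertools.combinations(range(len(pts)), 2):
        f = surflet_pair_features(pts[i], nrm[i], pts[j], nrm[j])
        if f is not None:
            feats.append(f)
    if not feats:
        return np.zeros(k_l ** 4)
    lows = np.array([-np.pi, -1.0, -1.0, 0.0])
    highs = np.array([np.pi, 1.0, 1.0, 2.0 * radius])
    feats = np.clip(np.array(feats), lows, highs)  # guard against round-off
    hist, _ = np.histogramdd(feats, bins=k_l, range=list(zip(lows, highs)))
    return hist.ravel() / hist.sum()  # k_l**4 values, 625 for k_l = 5
```

The pair loop is quadratic in the number of patch points, which is acceptable for a sketch; a vectorized version would be preferable for patches with many points.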
More specifically, the following set is computed for each $q_i$:

$$DD_{q_i} = \{ dd(q_i, p_1), dd(q_i, p_2), \dots, dd(q_i, p_{N_S}) \} \qquad (3)$$

where $dd(q_i, p_j)$ is the diffusion distance from the keypoint $q_i$ to sample point $p_j$, $j = 1, \dots, N_S$. The $N_S$ diffusion distances of the set $DD_{q_i}$ are accumulated into a 1D histogram of $k_H = 100$ bins. Again, the dimension $k_H$ has been experimentally determined [35]. This histogram, which is normalized so that the sum of all values equals 1, constitutes the HLG descriptor $D_i^{HLG}$ of keypoint $q_i$.

According to the above definition, the HLG descriptor is neither a purely local feature nor a global descriptor. It combines local characteristics, as it is computed for each keypoint, with global characteristics, as it takes into account the set of diffusion distances of the entire molecule. HLG resembles the Local Distance Feature (LDF) that was proposed in [35]. However, in [35], the distances to all points $p_j$ are computed using a Manifold Ranking (MR) algorithm [41], according to which each keypoint $q_i$ is used as the source of diffusion of ranking score for the MR. The resulting histogram is created from all ranking scores at sample points $p_j$. In this paper, the distances $dd(q_i, p_j)$ are computed using the framework presented in Section 3. Thus, diffusion distances are computed only once for both the DDMR and the HLG descriptors.

4.3 Creating a Bag of Augmented Local Descriptors (BoALD)

During this step, the local LDSP descriptors and the hybrid HLG descriptors are integrated into a global histogram. This process is summarized in Fig. 4. Initially, for each keypoint $q_i$ with LDSP descriptor $D_i^{LDSP} = (d^{LDSP}(1), \dots, d^{LDSP}(k_L^4))$ and HLG descriptor $D_i^{HLG} = (d^{HLG}(1), \dots, d^{HLG}(k_H))$, the ALD descriptor is given by:

$$D_i^{ALD} = (d^{LDSP}(1), \dots, d^{LDSP}(k_L^4), d^{HLG}(1), \dots, d^{HLG}(k_H)) \qquad (4)$$

$D_i^{ALD}$ is a histogram of dimension $k_A = k_L^4 + k_H = 625 + 100 = 725$. To produce a global descriptor from the $N_K$ local descriptors $D_i^{ALD}$, the Bag-of-Features approach has been utilized. Let $V = \{v_1, v_2, \dots, v_{N_V}\}$ be a set of visual words. The dimension of each visual word is equal to $k_A$, i.e. that
of the ALD histogram. The set $V$ is created by applying k-means clustering to a subset (training set) of the ALD descriptors $D_i^{ALD}$ of the molecular database. The descriptors that constitute the training set are selected randomly (10% of the local features of the database) in order to capture a representative view of the database. Each visual word $v_i$ is the center of a cluster. Then, each ALD descriptor $D_i^{ALD}$ of the 3D molecule is vector quantized into a visual word and a histogram of $N_V$ visual words is produced. This histogram $D^{BoALD}$ is called Bag-of-ALD descriptors or BoALD.

The size of the vocabulary $N_V$ should be carefully chosen since it affects both retrieval accuracy and computational cost. For large datasets, which also imply a large number of samples to cluster, an increase of $N_V$ would require high computation times for the k-means clustering. On the other hand, retrieval accuracy improves as the vocabulary size increases, until a specific upper limit is reached, above which no further improvement is observed. Based on the aforementioned criteria, the optimal choice of vocabulary size is $N_V = 1000$, as has been experimentally found.

Fig. 4. The process for computing the BoALD descriptors.

5 SIMILARITY MATCHING

Let $D^{DDMR}$ and $D^{BoALD}$ be the DDMR and BoALD descriptor vectors that are extracted using the methods described in Sections 3 and 4, respectively. The overall shape dissimilarity between two 3D molecules $A$ and $B$ can be calculated as the weighted sum of the dissimilarities of each descriptor separately:

$$dis(A, B) = w^{DDMR} \, dis^{DDMR}(A, B) + w^{BoALD} \, dis^{BoALD}(A, B) \qquad (5)$$

where $dis^{DDMR}$ and $dis^{BoALD}$ are the dissimilarities of the DDMR and BoALD descriptors, respectively, and $w^{DDMR}$, $w^{BoALD}$ their corresponding weights. In general, the selection of the optimal distance metric for each descriptor is not trivial. An extensive study on the performance of the most well-known dissimilarity metrics is available in [36]. In the case of the DDMR descriptor, the X-Distance (or normalized Manhattan Distance) was

experimentally proven to be the optimal metric:

$$dis^{DDMR}(A,B) = \sum_{i=1}^{N_D} \frac{2\,\left|D_A^{DDMR}(i) - D_B^{DDMR}(i)\right|}{D_A^{DDMR}(i) + D_B^{DDMR}(i)} \qquad (6)$$

where $D_A^{DDMR}$, $D_B^{DDMR}$ are the descriptors of molecules A and B, respectively, and $N_D$ is the dimensionality of the descriptor vector. Similarly, the optimal distance metric for the BoALD descriptor is the Kullback-Leibler Divergence:

$$dis^{BoALD}(A,B) = \sum_{i=1}^{N} \left(D_A^{BoALD}(i) - D_B^{BoALD}(i)\right) \ln \frac{D_A^{BoALD}(i)}{D_B^{BoALD}(i)} \qquad (7)$$

where $D_A^{BoALD}$, $D_B^{BoALD}$ are the descriptors of molecules A and B, respectively, and N is the dimensionality of the descriptor vector.

After selecting the optimal dissimilarity metrics, the weights $w^{DDMR}$, $w^{BoALD}$ need to be determined. In our case, we followed the Particle Swarm Optimization (PSO) strategy [36] for the weight optimization. PSO is an algorithm for global optimization. It is motivated by the social behavior of organisms such as bird flocking and fish schooling. PSO optimizes a problem in which a best solution can be represented as a point or surface in an n-dimensional space. It iteratively tries to improve a candidate solution based on a given quality measure (fitness function). PSO establishes a population (swarm) of candidate solutions, known as particles, that move around in the search space and are guided by the best found positions, which are updated whenever the particles find better ones. The candidate solutions, in our case, are the weights $w^{DDMR}$, $w^{BoALD}$, which can take arbitrary values in [0,1]. The fitness function to be optimized is the average Tier-1 precision, which is calculated on a training dataset. More specifically, each 3D molecule of the dataset is used as query to retrieve similar objects, using (5) as dissimilarity metric. The retrieved results are ranked in ascending order of dissimilarity.
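As a rough illustration of the weighted combination in (5), the following Python sketch computes the two per-descriptor dissimilarities and their weighted sum. It is not the authors' implementation: the exact forms of (6) and (7) are assumed here to be the normalized Manhattan distance and a symmetric Kullback-Leibler divergence, the function names are hypothetical, and the small `eps` term that guards against empty histogram bins is an added assumption.

```python
import numpy as np


def x_distance(a, b, eps=1e-12):
    """Normalized Manhattan distance between two descriptor vectors.

    Assumed form of (6): sum_i 2*|a_i - b_i| / (a_i + b_i).
    eps (an assumption, not from the paper) avoids division by zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(2.0 * np.abs(a - b) / (np.abs(a + b) + eps)))


def symmetric_kl(a, b, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two histograms.

    Assumed form of (7): sum_i (a_i - b_i) * ln(a_i / b_i).
    eps keeps the logarithm finite for empty bins.
    """
    a = np.asarray(a, dtype=float) + eps
    b = np.asarray(b, dtype=float) + eps
    return float(np.sum((a - b) * np.log(a / b)))


def combined_dissimilarity(ddmr_a, ddmr_b, boald_a, boald_b, w_ddmr, w_boald):
    """Weighted sum of the two dissimilarities, as in (5)."""
    return (w_ddmr * x_distance(ddmr_a, ddmr_b)
            + w_boald * symmetric_kl(boald_a, boald_b))
```

Because both terms are zero for identical descriptors and grow with the difference, a lower combined value means a closer match, which is why results are sorted in ascending order.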
The Tier-1 precision is given by the following equation:

$$P_{T1} = \frac{R_C(K)}{K}, \quad K = |C| - 1 \qquad (8)$$

where K is the number of first retrieved objects, $R_C(K)$ is the number of retrieved objects within the K first that are of the same class C as the query, and |C| is the number of objects that belong to class C.

PSO resulted in the following weights: $w^{DDMR}$ = [value missing in source], $w^{BoALD}$ = [value missing in source].
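The Tier-1 measure in (8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code used in the paper: it takes a precomputed all-to-all dissimilarity matrix and one class label per molecule, excludes the query itself from its own result list, and skips classes with a single member (for which K would be zero). The function name is hypothetical.

```python
import numpy as np


def average_tier1_precision(dist, labels):
    """Average Tier-1 precision over all queries.

    dist   : (n, n) array, dist[q, j] = dissimilarity of query q to item j
    labels : length-n sequence of class labels
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for q in range(len(labels)):
        k = int(np.sum(labels == labels[q])) - 1  # K = |C| - 1
        if k == 0:
            continue  # singleton class: Tier-1 is undefined, skip it
        order = np.argsort(dist[q], kind="stable")  # ascending dissimilarity
        order = order[order != q][:k]               # drop the query, keep first K
        scores.append(np.sum(labels[order] == labels[q]) / k)
    return float(np.mean(scores)) if scores else 0.0
```

A value like this, computed on the training set, is the kind of fitness a weight search such as PSO would maximize.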

6 EXPERIMENTAL RESULTS

For the experimental evaluation of the proposed method, four different datasets have been selected. The first dataset is part of the Database of Macromolecular Movements (MolMovDB) [38], which comprises molecules with large conformational changes, also including the intermediate morphs [70]. It consists of 2695 PDB files classified into 214 categories [55]. Each category consists of a collection of morphs representing different states of the same molecule. This dataset is used for parameter selection and for comparison with existing flexible molecular shape matching approaches [27][28].

The second dataset consists of 2631 3D protein structures. It is a subset of the FSSP database [49] and was created by us to demonstrate the performance of the Spherical Trace Transform (STT) in [16]. The 2631 proteins are classified into 27 classes according to the FSSP/DALI algorithm [50]. Each class consists of different protein structures, which have at least 25% similarity in their amino-acid sequence (according to the FSSP/DALI classification). The high classification accuracy achieved by STT in this dataset reveals that the proteins that belong to the same class, apart from their 25% sequence similarity, also demonstrate rigid shape similarity. The second dataset has been used to evaluate the performance of the proposed method in rigid shape matching of 3D protein structures and it is publicly available at vcl.t.gr/proten_retreval/pdb_fp.zp.

It is worth mentioning that the first two datasets are different in nature and cannot be compared: they measure different aspects of the molecular shape comparison problem (flexible vs rigid shape similarity), their classes have been created based on different criteria, and neither dataset is a subset of the other. Finally, the third and fourth datasets are used to demonstrate the performance of our framework in large-scale virtual screening of ligands. Experiments have been performed on a PC with an i5 2.8 GHz processor and 4 GB RAM.
6.1 Parameter Selection for the DDMR Descriptor

For the implementation of the DDMR descriptor (Section 3.1), the Matlab Toolbox for Dimensionality Reduction (v0.8.1) has been selected, using the default parameters h = 1 and t = 1. The discriminative power of DDMR mainly depends on two parameters: a) the number N of sample points on the molecular surface, and b) the dimensionality of the DDMR descriptor vector, i.e. the number n of first singular values $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ of D (2). By increasing the number of sample points N, a higher-quality representation of the surface is achieved and accuracy is improved; however, this results in higher descriptor extraction times. Additionally, an increase of n may also improve the accuracy. We ran several sets of experiments using different values of N and n. As a performance metric, the average Tier-1 Precision (8) has been selected.

In Fig. 5 a), the average Tier-1 Precision for different values of N and n in MolMovDB is presented. As the number of sample points N increases, a higher precision is achieved. Using a mesh resolution higher than 2000 points, though, the improvement in accuracy is negligible. Similar conclusions are drawn regarding the number n of first singular values. For values of n higher than 50-60, there is no significant improvement in precision.

A critical factor for the parameter selection is the descriptor extraction time. Since the process of extracting the DDMR descriptor involves computations on N × N matrices, the processing time may increase prohibitively as the number of sample points N increases. This is highlighted in Table I: for meshes consisting of 4000 points, descriptor extraction takes approximately one minute, while for meshes of 1000 points the extraction time is less than 2 seconds. For the experiments that will be presented in the following subsections, the values N = 2000 and n = 50 have been selected for DDMR.

Fig. 5. a) Parameter selection for DDMR descriptor: the average Tier-1 Precision in MolMovDB for different values of n and N; b) Parameter selection for BoLDSP descriptor: the average Tier-1 Precision in MolMovDB for different values of radius R and N.

Table I: Average extraction times of the DDMR descriptor for different numbers of sample points. Columns: Number of sample points N; DDMR Descriptor Extraction Time (s).

6.2 Parameter Selection for the BoALD Descriptor

The BoALD descriptor has been implemented by us in C++ based on the works presented in [35] and [37]. The performance of BoALD is affected by several parameters: a) the radius R of the local descriptor LDSP; b)

the number of sample surface points; c) the number of local points, and d) the vocabulary size of the codebook. The number of surface points is related to radius R as follows: a small R provides sufficient locality to the descriptor, but it requires a high number of surface points so that the local histograms are well populated. The number of local points affects the selection of the vocabulary size: for a given number of local points, an increase of the vocabulary size improves the accuracy until a specific upper limit is reached. Beyond that limit, a further increase of the vocabulary size has no effect on accuracy. If we increase the number of local points, we can achieve a higher upper limit for the vocabulary size, resulting in a more discriminative descriptor.

It is worth mentioning that since k-means clustering (used in bag-of-features) involves random selection of cluster centers, the mean values of Tier-1 accuracy are reported, where each experiment was repeated ten times. However, in many cases, the differences between the Tier-1 accuracy values are minimal. To verify this, for all experiments, we applied statistical pairwise t-tests, where the calculated differences of means were insignificant at level [value missing in source].

In Fig. 5 b), the average Tier-1 Precision of the BoLDSP (bag-of-features applied to LDSP) descriptor in MolMovDB for different values of radius R and number of sample points is depicted. Starting from R = 0, precision increases as R increases, until a maximum is reached. As an example, for meshes with 3000 points, the maximum precision is achieved for R = 0.4 $R_A$.

Fig. 6. Parameter selection for BoALD descriptor: the average Tier-1 Precision in MolMovDB for different values of vocabulary size and number of local points.

In Fig. 6, the average Tier-1 Precision of the BoALD descriptor in MolMovDB for different values of vocabulary size and number of local points is depicted. For 250 local points, an increase of the vocabulary size does not affect the average precision. Similarly, for 500 local points the precision is not improved for vocabulary sizes above 1000. Finally, for 1000 local points, the improvement in accuracy compared to 500 local points is negligible.
It is also worth mentioning that the dimensionality of BoALD, which equals the vocabulary size N, should be kept relatively low to achieve faster matching times.
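The role of the vocabulary size can be made concrete with a small bag-of-features sketch in Python. It is only an illustration under simplifying assumptions: a plain Lloyd-style k-means written with NumPy stands in for whatever clustering implementation the authors used, local descriptors are arbitrary fixed-length vectors rather than real ALD histograms, and all function names are hypothetical.

```python
import numpy as np


def learn_codebook(samples, n_words, n_iter=20, seed=0):
    """Plain k-means (Lloyd iterations) on training descriptors.

    samples : (m, d) float array of local descriptors (the training subset)
    returns : (n_words, d) array of cluster centers, i.e. the visual words
    """
    rng = np.random.default_rng(seed)
    # Random initial centers drawn from the samples themselves.
    centers = samples[rng.choice(len(samples), size=n_words, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every center.
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(n_words):
            members = samples[assign == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def bag_of_features(local_descs, centers):
    """Quantize each local descriptor to its nearest visual word and
    return the normalized histogram of word counts."""
    d2 = ((local_descs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The histogram length equals the number of visual words, so a larger vocabulary directly lengthens every descriptor that has to be compared at query time.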

For the experiments that will be presented in the sequel, the values N = 3000 (sample points), R = 0.4 $R_A$ and N = 1000 (vocabulary size) have been selected for BoALD.

Table II: Average extraction times of the BoALD descriptor. Columns: Number of sample points N; LDSP Descriptor Extraction Time (s); HLG Descriptor Extraction Time (s); BoALD bag-of-feature integration time (s) (N = 500).

The processing times for extraction of the local features LDSP and HLG and for the BoALD bag-of-feature integration are given in Table II. The codebook learning via k-means clustering is a computationally expensive process. For the MolMovDB dataset with 2695 molecules and N = 500 local features per molecule, the total number of training samples (10% of the dataset) is 134,750 local features (2695 × 500 × 0.1). The k-means clustering of these features with vocabulary size N = 1000 took about 1700 s (28 minutes). Then, the bag-of-features integration time for each molecule is 2.34 s, thus about 6300 s (105 minutes) for the entire database. These computations need to be performed only once, during the pre-processing stage.

6.3 Performance Evaluation in MolMovDB: Flexible Similarity Matching

For performance evaluation in MolMovDB the precision-recall curve has been used, where precision is the proportion of the retrieved molecules that are relevant to the query and recall is the proportion of relevant molecules in the entire database that are retrieved. In a benchmark dataset that is classified, such as MolMovDB, relevant items are those belonging to the same category as the query.

In Fig. 7 a), a comparison of different local surface descriptors in MolMovDB is presented. All local descriptors are extracted on the same set of keypoints q, followed by a bag-of-features computation to produce a global descriptor vector. For the local descriptors reported in Section 4.1, namely the Shape Impact Descriptor (SID), the Local Spectral Descriptor (LSD) and the Local Descriptor based on Surflet-Pair Relations (LDSP), the Bag-of-SID (BoSID), Bag-of-LSD (BoLSD) and Bag-of-LDSP (BoLDSP) are created, respectively.
BoLDSP achieves better retrieval accuracy than the other two candidates, which justifies its selection as a local feature. Moreover, the contribution of spatial context as a complementary feature to the purely local descriptors is also demonstrated in Fig. 7 a). Combining LDSP and the hybrid HLG into the proposed BoALD descriptor achieves significantly higher performance than the purely local descriptors. It is worth mentioning that BoALD is more discriminative than the BoFoG descriptor presented in [35], which also combines a local with a hybrid descriptor.
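For readers who want to reproduce curves of this kind, the precision and recall definitions given above translate into a short Python routine. This is a generic sketch, not the authors' evaluation script: it assumes a single query whose ranked result list has already been marked as relevant or not (same category as the query), and the function name is hypothetical.

```python
def precision_recall_points(ranked_relevance, n_relevant):
    """Return (recall, precision) pairs, one per relevant item retrieved.

    ranked_relevance : list of booleans, best match first; True means the
                       retrieved item is in the same category as the query
    n_relevant       : number of relevant items in the whole database
    """
    hits = 0
    points = []
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            recall = hits / n_relevant   # share of relevant items found so far
            precision = hits / rank      # share of retrieved items that are relevant
            points.append((recall, precision))
    return points
```

Averaging the precision values at matching recall levels over all queries gives one curve per descriptor, which is how descriptors are typically compared in a plot such as Fig. 7.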

Fig. 7. a) Comparison of BoSID, BoLSD, BoLDSP, BoFoG and BoALD in MolMovDB; b) Comparison of DD-Hist, GDMR, IDMR, DDMR, BoALD and DDMR-BoALD in MolMovDB.

Another innovative feature of the proposed work is the modal representation of the diffusion distance matrix, which results in the DDMR descriptor. In Fig. 7 b), DDMR is compared against the method of [28], which accumulates the pairwise diffusion distances into a histogram (DD-Hist). The proposed DDMR descriptor outperforms DD-Hist, especially for higher values of recall. The superiority of Diffusion Distance over Geodesic Distance and Inner Distance in capturing molecular flexibility is also demonstrated in Fig. 7 b). The modal representations of GD (GDMR) and ID (IDMR) are derived by substituting the Diffusion Distance Matrix in equation (2) with the Geodesic Distance Matrix and Inner Distance Matrix, respectively. Again, the proposed DDMR descriptor achieves higher retrieval accuracy.

Finally, the combination of DDMR with BoALD, using the weighted sum of dissimilarities (5), is presented in Fig. 7 b). DDMR-BoALD clearly outperforms the rest of the descriptors, which confirms our assumption that the combination of a global feature (DDMR) with a local feature (BoALD) achieves higher retrieval accuracy than each descriptor separately.

Fig. 13 shows three morph deformations for each of the following macromolecules: a) Dehydroquinase, b) NHP6A and c) trp repressor. The molecule in the first column is given as query and the respective ones in the second and third columns are retrieved within the first ranking positions. Despite the changes in their global shape due to molecular flexibility, the morphs still demonstrate high similarity to the query.

6.4 Evaluation of Rigid Similarity Matching

In Fig. 8, the precision-recall curves for the second dataset (subset of FSSP) are depicted. Our DDMR-BoALD descriptor is compared with STT [16], which is a rigid shape matching method. DDMR-BoALD outperforms STT in a rigid-shape dataset as well.
This is mainly due to the fact that the combination of intrinsically different features (a global, a local and a hybrid local-global) increases the robustness of the resulting descriptor.

Fig. 8. Comparison of the proposed method with STT in the subset of FSSP database that was used in [16].

6.5 Comparison with structural alignment methods

In the experiments presented in the previous subsections, the proposed method is compared with descriptor-based approaches. A comparison of the DDMR-BoALD descriptor with two structural alignment methods, FATCAT [65] and TM-Align [64], is presented in Fig. 9, where both clearly outperform our descriptor in retrieval accuracy. In general, structural alignment methods achieve better performance than descriptor-based methods. However, for a thorough comparison between these intrinsically different approaches, additional parameters need to be taken into account. First of all, DDMR-BoALD relies on geometric information only, which makes it appropriate for use in a wide range of molecules, both large macromolecules and small ligands. This is not possible in the case of structural alignment methods, which look for correspondences between atoms/residues. As an example, FATCAT and TM-Align cannot be applied to the ligands of section 6.7.

Another important parameter is the efficiency of the method. In Table III, the times for comparing a pair of molecules using TM-Align, FATCAT and the proposed DDMR-BoALD descriptor are reported. DDMR-BoALD is roughly 12,500 times faster than TM-Align and 70,000 times faster than FATCAT (0.25 s and 1.4 s versus 0.02 ms). Based on the above, descriptor-based and structural alignment methods should be seen as complementary rather than competing: a descriptor-based method can be used for fast filtering at a first stage, and a structural alignment method can be used to refine a smaller subset of the results.

Table III: Average CPU times for comparing a pair of molecules using TM-Align, FATCAT and the proposed DDMR-BoALD descriptor.
Method        Average CPU Time for Pairwise Comparison
TM-Align      0.25 s
FATCAT        1.4 s
DDMR-BoALD    0.02 ms

In Table IV and Table V, the performance of combining the proposed DDMR-BoALD with the TM-Align method is demonstrated in MolMovDB and the subset of FSSP datasets, respectively. More specifically, each item of the dataset is used as query and DDMR-BoALD is applied to match the query with all items of the dataset (fast filtering stage). Then, TM-Align is applied only to the first ranked results for re-ranking. Different percentages of the

first ranked results are shown (from 20% of first ranked to 80%). Performance is measured in Nearest Neighbour, Tier-1 precision and Tier-2 precision. These evaluation measures share a similar idea, that is, to check the ratio of models in the query's class that also appear within the top K matches, where K = 1 for Nearest Neighbor, K = |C| − 1 for Tier-1, and K = 2(|C| − 1) for Tier-2, and |C| is the number of class members. The reported scores are averaged over all the objects in the database. It should be stressed that it was not possible to compute precision-recall diagrams, since, for some items (queries) of the dataset, recall of all items relevant to the query (100% recall) may require retrieval of more than 80% of the first ranked results.

Fig. 9. Comparison of the proposed method with a) FATCAT and TM-Align in MolMovDB and b) TM-Align in the subset of FSSP database.

From Table IV and Table V, it is clear that when we apply TM-Align as a re-ranking step to different percentages of the results initially ranked by the DDMR-BoALD fast filtering, we keep the accuracy high in terms of NN, Tier-1 and Tier-2, while we significantly speed up the matching time. In this set of experiments, the percentages of the initially ranked results vary from 20% to 100%, where in the case of 100% TM-Align is applied to all the ranked results without the DDMR-BoALD fast filtering. By increasing the percentage of the first ranked results to which TM-Align is applied from 20% to 100%, NN is almost not affected and the improvement in Tier-1 and Tier-2 is minor (less than 2%), whereas at 20% the matching can be up to 5 times faster.

Table IV: Performance (average Nearest Neighbour, Tier-1 precision and Tier-2 precision) and matching times of the proposed DDMR-BoALD method, the TM-Align method and their combination, in MolMovDB dataset. Percentages show the amount of first items ranked by DDMR-BoALD that are kept for re-ranking with TM-Align.
In the last column, only the TM-Align method has been used (no filtering step with the proposed method has been applied).

Columns: DDMR-BoALD; TM-Align (20%); TM-Align (40%); TM-Align (60%); TM-Align (80%); TM-Align (100%)
Rows: NN; Tier-1; Tier-2; All-to-all Matching Time (s)
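The two-stage scheme evaluated in these tables (fast descriptor filtering followed by re-ranking of the top fraction with a slower method) can be outlined as follows. This Python sketch is illustrative only: `fast_dist` and `slow_dist` are hypothetical callables standing in for the DDMR-BoALD dissimilarity and for a dissimilarity derived from a structural alignment score, and the way the tail of the list is kept in its original order is an assumption.

```python
def filter_then_rerank(query, database, fast_dist, slow_dist, keep_fraction):
    """Rank the database with a cheap metric, then re-rank only the head.

    query         : the query object
    database      : list of objects to search
    fast_dist     : cheap dissimilarity, called on every database item
    slow_dist     : expensive dissimilarity, called only on the kept head
    keep_fraction : share of the first-ranked items passed to slow_dist
    returns       : list of database indices, best match first
    """
    # Stage 1: fast filtering over the whole database.
    ranked = sorted(range(len(database)),
                    key=lambda i: fast_dist(query, database[i]))
    n_keep = max(1, int(round(keep_fraction * len(ranked))))
    head, tail = ranked[:n_keep], ranked[n_keep:]
    # Stage 2: expensive re-ranking restricted to the head.
    head = sorted(head, key=lambda i: slow_dist(query, database[i]))
    return head + tail
```

With `keep_fraction = 0.2` the expensive metric is evaluated on one fifth of the database, which is where a speed-up of up to five times over full re-ranking comes from.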

Table V: Performance (average Nearest Neighbour, Tier-1 precision and Tier-2 precision) and matching times of the proposed DDMR-BoALD method, the TM-Align method and their combination, in the subset of FSSP dataset. Percentages show the amount of first items ranked by DDMR-BoALD that are kept for re-ranking with TM-Align. In the last column, only the TM-Align method has been used (no filtering step with the proposed method has been applied). For 20%, average Tier-2 precision cannot be computed, since there are classes in the dataset where the number of items needed for Tier-2 is greater than 20% of the dataset.

Columns: DDMR-BoALD; TM-Align (20%); TM-Align (40%); TM-Align (60%); TM-Align (80%); TM-Align (100%)
Rows: NN; Tier-1; Tier-2; All-to-all Matching Time (s)

Virtual Screening of Ligands

The proposed method has also been evaluated in large-scale virtual screening of ligand molecules, where the investigation of an accurate algorithm for rapid shape matching is a major scientific challenge. Two benchmark datasets have been used in our tests. The first is called the Directory of Useful Decoys (DUD) [43]. DUD is derived from the ZINC database of commercially available compounds for virtual screening [44]. A subset of DUD was downloaded, which consists of 13 targets and has already been used in recent studies [24]. The dataset is presented in Table VI. More specifically, each of the 13 targets is used as query to retrieve similar molecules from its corresponding set of actives and decoys (e.g. ace is used as query in the set of 46 actives and 1796 decoys, and so on). The more actives are included among the first retrieved results, the better the accuracy of the search algorithm. The data in Table VI are adapted from [24]; we provide them here as well for a better overview of the dataset.
Table VI: The subset of DUD dataset [24] that was used in our experiments. Columns: Target; PDB; Actives; Decoys; Decoys per Active.

Target                                                      PDB
angiotensin-converting enzyme (ace)                         1o
acetylcholinesterase (ache)                                 1eve
cyclin-dependent kinase 2 (cdk2)                            1ckp
cyclooxygenase-2 (cox2)                                     1cx
epidermal growth factor receptor (egfr)                     1m
factor Xa (fxa)                                             1f0r
HIV reverse transcriptase (hivrt)                           1rt
enoyl ACP reductase (inha)                                  1p
P38 mitogen activated protein (p38)                         1kv
phosphodiesterase (pde5)                                    1xp
platelet derived growth factor receptor kinase (pdgfrb)     1t
tyrosine kinase SRC (src)                                   2src
vascular endothelial growth factor receptor (vegfr2)        1fg

The second benchmark is the anti-HIV dataset derived from the National Cancer Institute (NCI) and is employed to simulate a typical virtual screening experiment. Its compounds [45] are split into 423 confirmed actives, 1081 moderately actives and the remaining confirmed inactives. The structures are

available for download in SDF format. The objective of the virtual screening experiment in this dataset is to use the 1081 moderately actives as queries and search the database of actives and inactives. The more confirmed actives are retrieved among the first ranked results, the higher the accuracy of the algorithm.

Three different metrics have been used to evaluate the performance of the proposed method in these datasets. The first is the Enrichment Factor (EF) [46], which describes the ratio of actives retrieved relative to the percentage of the database scanned:

$$EF^{x\%} = \frac{N_a^{x\%} / N^{x\%}}{T_A / T_D} \qquad (9)$$

where $T_A$ is the total number of actives in the database of size $T_D$ and $N_a^{x\%}$ is the number of actives in the top x percent $N^{x\%}$ of the database. Another metric is the Boltzmann Enhanced Discrimination of Receiver Operating Characteristic (BEDROC) [47], calculated as:

$$BEDROC = \frac{\sum_{i=1}^{n} e^{-a r_i / N}}{\frac{n}{N}\left(\frac{1 - e^{-a}}{e^{a/N} - 1}\right)} \times \frac{R_a \sinh(a/2)}{\cosh(a/2) - \cosh(a/2 - a R_a)} + \frac{1}{1 - e^{a(1 - R_a)}} \qquad (10)$$

where n is the number of actives among N compounds, $R_a = n/N$, $r_i$ is the rank of the ith active and a is a weighting parameter. In our experiments, the value of a is chosen to correspond to a 5% relative rank. Similarly, x = 5% is also selected for the EF metric (9). Finally, the Area Under Curve for Receiver Operator Characteristic (AUCROC) [24] is computed by:

$$AUCROC = 1 - \frac{1}{N_a} \sum_{i=1}^{N_a} \frac{N^{i}_{decoys}}{N_d} \qquad (11)$$

where $N_a$ and $N_d$ are the numbers of actives and decoys, respectively, and $N^{i}_{decoys}$ is the number of decoys ranked above the ith active.

The proposed DDMR-BoALD descriptor is compared with two approaches for fast virtual screening, which are also based on shape similarity matching. The first one is the 3D Zernike Descriptor (3DZD) [24], which is based on a series expansion of a given 3D function.
The second one is the Ultrafast Shape Recognition (USR) scheme [11], which represents the molecular shape as a set of statistical moments generated from all-atom distance distributions that are calculated with respect to preselected reference locations. Both aforementioned methods are rotation-invariant, i.e. they are able to capture the shape information independent of orientation.
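The three screening metrics (9)-(11) are easy to misread from formulas alone, so a compact Python sketch may help. It is written under stated assumptions rather than taken from the paper: the input is one ranked list of 0/1 flags (1 for an active, best match first), the BEDROC expression follows the standard published form of the metric, the weighting parameter `alpha` must be supplied by the caller, and the function names are hypothetical.

```python
import math


def enrichment_factor(ranked_is_active, top_fraction):
    """EF as in (9): active rate in the top fraction over the overall rate."""
    total = len(ranked_is_active)
    n_top = max(1, int(round(top_fraction * total)))
    actives_top = sum(ranked_is_active[:n_top])
    actives_all = sum(ranked_is_active)
    return (actives_top / n_top) / (actives_all / total)


def roc_auc(ranked_is_active):
    """AUC as in (11): one minus the mean fraction of decoys ranked
    above each active."""
    n_act = sum(ranked_is_active)
    n_dec = len(ranked_is_active) - n_act
    decoys_seen = 0
    penalty = 0.0
    for is_active in ranked_is_active:
        if is_active:
            penalty += decoys_seen / n_dec
        else:
            decoys_seen += 1
    return 1.0 - penalty / n_act


def bedroc(ranked_is_active, alpha):
    """BEDROC as in (10); alpha controls how strongly early ranks count."""
    n_total = len(ranked_is_active)
    ranks = [r for r, a in enumerate(ranked_is_active, start=1) if a]
    ra = len(ranks) / n_total
    weighted = sum(math.exp(-alpha * r / n_total) for r in ranks)
    random_term = ra * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    scale = (ra * math.sinh(alpha / 2.0)
             / (math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra)))
    return weighted / random_term * scale + 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
```

EF and BEDROC reward actives that appear early in the list, whereas the AUC weighs all ranks equally, so the three numbers need not agree on which method is best.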

Fig. 10. Performance of the 3DZD, USR and the proposed method on the 13 targets of the DUD dataset, using a) the Enrichment Factor metric (EF 5%); b) the AUCROC metric.

In Fig. 10 a), Fig. 10 b) and Fig. 11, the performance of 3DZD, USR and DDMR-BoALD on the 13 targets of the DUD dataset is given for the metrics EF (x = 5%), AUCROC and BEDROC, respectively. For 3DZD, the descriptor of order-12 using Correlation Coefficient as distance metric is reported, while for USR, the descriptor of order-16 using Correlation Coefficient as distance metric is reported [24].

Fig. 11. Performance of the 3DZD, USR and the proposed method on the 13 targets of the DUD dataset, using the BEDROC metric.

Regarding the EF metric, the proposed method outperforms the other two in 4 out of 13 targets of the DUD Dataset, while 3DZD and USR are better in 5 and 4 targets, respectively. For the AUCROC metric, DDMR-BoALD is better in 5 targets, 3DZD in 3 and USR in 5. Finally, regarding the BEDROC metric, the proposed method outperforms the others in 6 targets, 3DZD in 6 and USR in 1 target. The average scores are given in Table VII. The results derived using the 3 different metrics are not fully consistent, since e.g. USR is better than 3DZD in EF and AUCROC but worse in BEDROC. On average, the proposed method is slightly better than the other two approaches in all metrics.

Table VII: EF, AUCROC and BEDROC (average) in DUD dataset for 3DZD, USR and DDMR-BoALD.
Columns: Descriptors; Metric; Order; EF 5%; AUCROC; BEDROC
Rows: 3DZD (Correlation coefficient); USR (Correlation coefficient); DDMR-BoALD

The performance of 3DZD, USR and DDMR-BoALD is also compared in the anti-HIV dataset. In Table VIII, the average values of EF (x = 5%), AUCROC and BEDROC for the three methods are presented. Several results are available for both 3DZD and USR, depending on the order of expansion of the descriptor and the distance metric used. Again, the proposed method outperforms the others in all three metrics.

A critical parameter that should be taken into account in virtual screening, especially in large databases, is the similarity matching time. In the anti-HIV dataset, both USR and 3DZD complete a search within seconds. These methods are significantly faster than non-shape-based approaches, which may take several hours for the same virtual screening task. The reason is that the shape-based descriptor vectors constitute a very compact representation of the molecular structure; thus, similarity matching using a common distance metric is rapid. The proposed DDMR-BoALD descriptor takes about 2.83 s for a one-to-all matching in the anti-HIV dataset, which is comparable to 3DZD. Consequently, since DDMR-BoALD outperforms 3DZD and USR in terms of retrieval accuracy, it can provide a better solution for rapid geometric virtual screening.

Table VIII: Average values of EF, AUCROC and BEDROC in the anti-HIV dataset for 3DZD, USR and the proposed method.

Columns: Descriptors; Metric; Order; EF 5%; AUCROC; BEDROC
Rows: 3DZD (Correlation coefficient; Euclidean (DE); Manhattan (DM)); USR (Correlation coefficient; Euclidean (DE); Manhattan (DM)); DDMR-BoALD

We have implemented an online tool for shape similarity search using the proposed method. Search is performed in the following datasets: a) the subset of the FSSP database; b) the subset of MolMovDB and c) the

anti-HIV dataset. A snapshot of the online tool is given in Fig. 12. In this example, a search task is performed in MolMovDB using the molecule ff3 as query. The first 14 retrieved results are presented. It is worth mentioning that the first 12 retrieved results belong to the same class as the query. These actually represent the same molecule but with conformational changes. Despite the flexibility that is observed in the lower left part of the molecules, the algorithm is robust in capturing their global shape similarity. The online tool allows visualization of the 3D molecular structures using Jmol, an open-source Java viewer for chemical structures in 3D. By clicking on the thumbnail image of a retrieved molecule, a pop-up window of the Jmol viewer appears.

Fig. 12. Example of similarity search in MolMovDB using the proposed method.

7 CONCLUSION AND DISCUSSION

We have presented a framework for similarity search of flexible molecules, which exploits both local and global geometric features. The global feature is based on pairwise computations of diffusion distances over the points of the surface and a singular value decomposition of the resulting diffusion distance matrix. The local feature is computed on each keypoint of the surface by accumulating pairwise relations among oriented surface points into a local histogram. Finally, the hybrid local-global feature is computed for each keypoint, taking into account the diffusion distances from the keypoint to all surface points, thus enhancing the local keypoint with spatial context. The local and the hybrid features are concatenated into a joint histogram per keypoint and the multiple histograms are integrated into a global descriptor using the bag-of-features approach. The global and local features are combined to produce a geometric descriptor that achieves higher retrieval accuracy than each feature does separately.

The proposed method achieves high retrieval accuracy in similarity search of flexible molecules. In the MolMovDB dataset, which consists of molecules with large conformational changes, the proposed framework clearly outperforms other existing approaches in terms of precision-recall. At the same time, the DDMR-BoALD descriptor achieves high retrieval performance in datasets of rigid molecules. Additionally, DDMR-BoALD provides a compact representation of the 3D molecular structure; therefore, it is appropriate for large-scale search tasks such as virtual screening in large ligand databases. DDMR-BoALD is appropriate for retrieving small ligands as well, since its accuracy is comparable to, or slightly better than, existing state-of-the-art approaches in two benchmarks for virtual screening.

Fig. 13. Morph deformations for the following macromolecules: a) Dehydroquinase, b) NHP6A and c) trp repressor. The molecule in the 1st column is given as query and the respective ones in the 2nd and 3rd columns are retrieved within the first ranking positions, demonstrating high similarity to the query.

Nevertheless, the retrieval accuracy, especially in virtual screening, can be further improved by enhancing the geometric features with non-geometric ones, such as physicochemical properties. At the moment, the latter are exploited by approaches that are extremely time-consuming, which, in combination with the rapid increase in size of the molecular databases, leads to prohibitively large search times. The effective integration of non-geometric information into a compact representation along with the shape-based features still remains a challenge for future research. Another important issue is that the number of mesh vertices sampled on the

molecular surface remains the same irrespective of the size of the molecule. The motivation behind this is the fact that a variable number of samples (proportional to the size of the molecule) would result in descriptor vectors that are not comparable to each other. On the other hand, a fixed number of sample vertices produces scale-invariant descriptor vectors, that is, molecules of similar shape but different size are regarded as similar. While the latter could be regarded as an advantage in the case of generic object retrieval, in molecular similarity comparison it introduces a limitation to the proposed descriptor. Thus, a challenge for future research is to investigate methods that are able to embed size information into the resulting descriptors.

REFERENCES

[1] A. Bender and R. C. Glen, "Molecular similarity: a key technique in molecular informatics", Organic & Biomolecular Chemistry 2, no. 22 (2004).
[2] V. Venkatraman, L. Sael and D. Kihara, "Potential for protein surface shape analysis using spherical harmonics and 3D Zernike descriptors", Cell Biochemistry and Biophysics (2009).
[3] D. Kihara and J. Skolnick, "The PDB is a covering set of small protein structures", Journal of Molecular Biology, 334.
[4] L. Holm and C. Sander, "Protein structure comparison by alignment of distance matrices", Journal of Molecular Biology (1993).
[5] K. Mizuguchi and N. Gö, "Comparison of spatial arrangements of secondary structural elements in proteins", Protein Engineering, 8.4 (1995).
[6] M. R. Betancourt and J. Skolnick, "Local propensities and statistical potentials of backbone dihedral angles in proteins", Journal of Molecular Biology (2004).
[7] K. Kinoshita and H. Nakamura, "Identification of protein biochemical functions by similarity search using the molecular surface database eF-site", Protein Science 12.8 (2003).
[8] M. E. Bock, G. M. Cortelazzo, C. Ferrari and C. Guerra, "Identifying similar surface patches on proteins using a spin-image surface representation", in Combinatorial Pattern Matching, Springer Berlin Heidelberg.
[9] M. Ankerst, G.
astenmüller, H. P. regel and T. edl, "3D shape hstograms for smlarty search and classfcaton n spatal databases." Advances n patal Databases. prnger Berln Hedelberg, [10] J.. Yeh, D.Y. Chen, B.Y. Chen, M. Ouhyoung, A web-based three-dmensonal proten retreval system by matchng vsual smlarty, Bonformatcs 2005, 21(13): [11] P. J. Ballester and W. G. Rchards, "Ultrafast shape recognton to search compound databases for smlar molecular shapes", Journal of Computatonal Chemstry (2007): [12] P. J. Ballester and W. G. Rchards, "Ultrafast shape recognton for smlarty search n molecular databases", Proceedngs of the Royal ocety A: Mathematcal, Physcal and Engneerng cence (2007): [13] M-. Hu, "sual pattern recognton by moment nvarants", IRE Transactons on Informaton Theory, 8.2 (1962): [14] M. R. Teague, "Image analyss va the general theory of moments", J. Opt. oc. Am 70.8 (1980): [15] C-H. Teh and R. T. Chn, "On mage analyss by the methods of moments", IEEE Transactons on Pattern Analyss and Machne Intellgence, 10.4 (1988): [16] P. Daras, D. Zarpalas, A. Axenopoulos, D. Tzovaras and M. G. trntzs, Three-dmensonal shape-structure comparson method for proten classfcaton, IEEE/ACM Transactons on Computatonal Bology and Bonformatcs (TCBB), 2006, 3(3), [17] W.Ca, J. Xu, X. hao,. Leroux, A. Beautrat and B. Magret, HEF: a vht geometrcal flter usng coeffcents of sphercal harmonc molecular surfaces, Journal of molecular modelng, 2008, 14(5), [18] R. J. Morrs, R. J. Najmanovch, A. ahraman and J. M. Thornton, Real sphercal harmonc expanson coeffcents as 3D shape descrptors for proten bndng pocket and lgand comparsons Bonformatcs, 2005, 21(10), [19] D. W. Rtche and G. J. emp. "Fast computaton, rotaton, and comparson of low resoluton sphercal harmonc molecular surfaces." Journal of Computatonal Chemstry 20.4: , [20] D. W. Rtche and G. J. emp, "Proten dockng usng sphercal polar Fourer correlatons." Protens: tructure, Functon, and Bonformatcs, 39.2 (2000): [21] L. Mak,. Grandson and R. J. 
Morrs, "An extenson of sphercal harmoncs to regon-based rotatonally nvarant descrptors for molecular shape descrpton and comparson", Journal of Molecular Graphcs and Modellng 26.7 (2008): [22] L. ael, B. L, D. La, Y. Fang,. Raman, R. Rustamov and D. hara, (2008). Fast proten tertary structure retreval based on global surface shape smlarty. Protens: tructure, Functon, and Bonformatcs, 72(4), [23]. enkatraman, Y. Yang, L. ael and D. hara, Proten-proten dockng usng regon-based 3D Zernke descrptors, Bmc Bonformatcs, 10(1), 407, [24]. enkatraman, P. R. Chakravarthy and D. hara, "Applcaton of 3D Zernke descrptors to shape-based lgand smlarty searchng", J Chemnform 17.1 (2009): 19. [25] Y. Fang, Y- Lu and. Raman, Three dmensonal shape comparson of flexble protens usng the local-dameter descrptor, BMC tructural Bology 2009, 9:29 do: / [26] Y- Lu, Y. Fang and. Raman, ID: deformaton nvarant sgnatures for molecular shape comparson, BMC Bonformatcs 2009, 10:157 do: / [27] Y- Lu,. Raman and M. Lu, Computng the Inner Dstances of olumetrc Models for Artculated hape Descrpton wth a sblty Graph, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 33, no. 12, December [28] Y- Lu, Q. L, G-Q. Zheng,. Raman, W. Benjamn, Usng dffuson dstances for flexble molecular shape comparson, BMC Bonformatcs 2010, 11:480. [29]. Yn, E. A. Proctor, A. A. Lugovskoy, and N.. Dokholyan, Fast screenng of proten surfaces usng geometrc nvarant fngerprnts, PNA 2009, vol. 106, no. 39, pp , eptember 29, 2009.

29 AXENOPOULO ET AL.: IMILARITY EARCH OF FLEXIBLE MOLECULE COMBINING LOCAL AND GLOBAL HAPE DECRIPTOR 29 [30] P. Heder, A. Perre-Perre, R. L and C. Grmm, "Local shape descrptors, a survey and evaluaton", In Proceedngs of the 4th Eurographcs conference on 3D Object Retreval, pp Eurographcs Assocaton, [31] Z. Lan, A. Godl, X. un, H. Zhang, Non-rgd 3D shape retreval usng multdmensonal scalng and bag-of-features, n Proceedngs of the Internatonal Conference on Image Processng (ICIP2010), 2010, pp [32] D. meets, T. Fabry, J. Hermans, D. andermeulen, P. uetens, Isometrc deformaton modelng for object recognton, n Proceedngs of the 13 th Internatonal Conference on Computer Analyss of Images and Patterns (CAIP 09), 2009, pp [33] G. Lavoue, Bag of words and local spectral descrptor for 3D partal shape retreval, n Proceedngs of the Eurographcs Workshop on 3D Object Retreval (3DOR 11), 2011, pp [34] R. Ohbuch,. Osada, T. Furuya, T. Banno, alent Local sual Features for hape-based 3D Model Retreval, Proc. IEEE Internatonal Conference on hape Modelng and Applcatons (MI 08), tony Brook Unversty, June 4-6, [35]. awamura,. Usu, T. Furuya, and R. Ohbuch. "Local goemetrcal feature wth spatal context for shape-based 3D model retreval." In Proceedngs of the 5th Eurographcs conference on 3D Object Retreval, pp Eurographcs Assocaton, [36] P. Daras, A. Axenopoulos, G. Ltos, "Investgatng the Effects of Multple Factors towards more Accurate 3D Object Retreval", IEEE Transactons on Multmeda, ol. 14, No. 2, Page(s): , Aprl [37] E. Wahl, U. Hllenbrand and G. Hrznger, "urflet-par-relaton hstograms: a statstcal 3D-shape representaton for rapd classfcaton", IEEE Fourth Internatonal Conference on 3-D Dgtal Imagng and Modelng, 3DIM [38] N. Echols, D. Mlburn and M. Gersten, MolMovDB: Analyss and vsualzaton of conformatonal change and structural flexblty, Nuclec Acds Research 2003, 31: [39] W. H. Press,. A. Teukolsky, W. T. etterlng and B. P. 
Flannery, Numercal recpes n C+: the art of scentfc computng, Cambrdge: Cambrdge Unversty Press, ol. 994, [40] R. Osada, T. Funkhouser, B. Chazelle and D. Dobkn, hape dstrbutons, ACM Transactons on Graphcs, 2002, 21(4): [41] D. Zhou, O. Bousquet, T. N. Lal, J. Weston and B. chölkopf, Learnng wth Local and Global Consstency, NIP [42] B. Nadler,. Lafon, R. R. Cofman, I. G. evrekds, Dffuson Maps, pectral Clusterng and Egenfunctons of Fokker-Planck Operators, n Advances n Neural Informaton Processng ystems 18, [43] N. Huang, B.. hochet, J. J. Irwn, Benchmarkng sets for molecular dockng, J. Med. Chem. 2006, 49: [44] J.J. Irwn, B.. hochet, ZINC a free database of commercally avalable compounds for vrtual screenng, J. Chem. Inf. Model 2005, 45: [45] O.. Weslow, R. ser, D. L. Fne, J. Bader, R. H. hoemaker, M. R. Boyd, New soluble-formazan assay for HI-1 cytopathc effects: applcaton to hghflux screenng of synthetc and natural products for AID-antvral actvty, J. Natl. Cancer. Inst. 1989, 81: [46] A. Bender, R. C. Glen, A dscusson of measures of enrchment n vrtual screenng: comparng the nformaton content of descrptors wth ncreasng levels of sophstcaton, J. Chem. Inf. Model 2005, 45: [47] J. F. Truchon, C. I. Bayly, Evaluatng vrtual screenng methods: good and bad metrcs for the early recognton problem, J. Chem. Inf. Model 2007, 47: [48] M. Hattor, Y. Okuno,. Goto, M. anehsa, Development of a chemcal structure comparson method for ntegrated analyss of chemcal and genomc nformaton n the metabolc pathways, J. Am. Chem. oc. 2003, 125: [49] L. Holm and C. ander, The FP Database: Fold Classfcaton Based on tructure-tructure Algnment of Protens, Nuclec Acds Research, vol. 24, pp , [50] L. Holm and C. ander, Tourng Proten Fold pace wth Dal/FP, Nuclec Acds Research, vol. 26, pp , [51] M.F. anner, A.J. Olson, and J.-C. pehner. Fast and robust computaton of molecular surfaces. In 11th ACM ymposum on Computatonal Geometry, [52] R. olodny, D. Petrey, and B. 
Hong, Proten structure comparson: mplcatons for the nature of fold space, and structure and functon predcton, Curr. Opn. truct. Bol. 2006;16: [53] I. N. hndyalov, P. E. Bourne, Proten structure algnment by ncremental combnatoral extenson (CE) of the optmal path, Proten Eng. 1998;11: [54] P. Daras, A. Axenopoulos, "A Compact Mult-ew Descrptor for 3D Object Retreval" IEEE 7th Internatonal Workshop on Content-Based Multmeda Indexng (CBMI 2009), Chana, Greece, Jun [55]. C. Flores, L. J. Lu, J. Yang, N. Carrero, and M. B. Gersten, Hnge Atlas: relatng proten sequence to stes of structural flexblty, BMC Bonformatcs 8: 167, [56] R. Gal, A. hamr and D. Cohen-Or. "Pose-oblvous shape sgnature." IEEE Transactons on sualzaton and Computer Graphcs, 13.2 (2007): [57] R. M. Rustamov, "Laplace-Beltram egenfunctons for deformaton nvarant shape representaton." Proceedngs of the ffth Eurographcs symposum on Geometry processng. Eurographcs Assocaton, [58] L. Nann, J. Y. h,. Brahnam and A. Lumn, Proten classfcaton usng texture descrptors extracted from the proten backbone mage Journal of theoretcal bology, 264(3), , [59] D.-Y. Chen, X. P. Tan, Y. T. hen and M. Ouhyoung, "On vsual smlarty based 3D model retreval" In Computer graphcs forum, vol. 22, no. 3, pp Blackwell Publshng, Inc, [60] T. Furuya and R. Ohbuch, Dense samplng and fast encodng for 3D model retreval usng bag-of-vsual features, CIR 2009, Artcle 26, do> / [61] Z. Lan, A. Godl, B. Bustos, M. Daoud, J. Hermans,. awamura, Y. urta, G. Lavoué, H. an Nguyen, R. Ohbuch, Y. Ohkta, Y. Ohsh, F. Porkl, M. Reuter, I. pran, D. meets, P. uetens, H. Taba, D. andermeulen, A comparson of methods for non-rgd 3D shape retreval, Pattern Recognton, olume 46 Issue 1, January, 2013, Pages [62] B. L, A. Godl, M. Aono, X. Ba, T. Furuya, L. L, R. López-astre, H. Johan, R. Ohbuch, C. Redondo-Cabrera, A. Tatsuma, T. Yanagmach, and. Zhang, HREC'12 Track: Generc 3D hape Retreval, Eurographcs Workshop on 3D Object Retreval [63] Y. Ohkta, Y. Ohsh, T. 
Furuya, R. Ohbuch, Non-rgd 3D Model Retreval Usng et of Local tatstcal Features, IEEE Internatonal Conference on Multmeda and Expo Workshops (ICMEW), pp , [64] Y. Zhang, J. kolnck, TM-algn: A proten structure algnment algorthm based on TM-score, Nuclec Acds Research, : [65] Y. Ye and A. Godzk, "Flexble structure algnment by channg algned fragment pars allowng twsts", Bonformatcs, 2003, 19(uppl 2):II246-II255. [66] I.N. hndyalov, P.E. Bourne, Proten structure algnment by ncremental combnatoral extenson (CE) of the optmal path, Proten Engneerng 11(9) , [67] M. hatsky, H.J. Wolfson, and R. Nussnov, "Flexble proten algnment and hnge detecton", Protens: tructure, Functon, and Genetcs, 48: , 2002.

30 30 IEEE/ACM TRANACTION ON COMPUTATIONAL BIOLOGY AND BIOINFORMATIC [68] Z. L, P. Natarajan, Y. Ye, T. Hrabe, A. Godzk, POA: a user-drven, nteractve multple proten structure algnment server, Nucl. Acds Res. (2014) do: /nar/gku394. [69] A.Mademls, P.Daras, D.Tzovaras and M.G.trntzs, "3D Object Retreval based on Resultng Felds", 29th Internatonal conference on EUROGRAPHIC 2008, workshop on 3D object retreval, Crete, Greece, Apr [70]. C. Flores and M. B. Gersten, "FlexOracle: predctng flexble hnges by dentfcaton of stable domans", BMC bonformatcs 8.1 (2007): 215. [71] Y.Y. Tseng, J. Dundas and J. Lang. "Predctng proten functon and bndng profle va matchng of local evolutonary and geometrc surface patterns." Journal of molecular bology (2009): [72] T. A. Bnkowsk, L. Adaman and J. Lang. "Inferrng functonal relatonshps of protens from local sequence and spatal surface patterns." Journal of molecular bology (2003): [73] J. onc and D. Janežč. "ProB-2012: web server and web servces for detecton of structurally smlar bndng stes n protens." Nuclec acds research 40.W1 (2012): W214-W221. [74] A. harma, A. Papankolaou and E.. Manolakos. "Acceleratng all-to-all proten structures comparson wth TMalgn usng a NoC many-cores processor archtecture." Parallel and Dstrbuted Processng ymposum Workshops & PhD Forum (IPDPW), 2013 IEEE 27th Internatonal. IEEE, [75] B. Y. Chen and B. Hong. "AP: a volumetrc analyss of surface propertes yelds nsghts nto proten-lgand bndng specfcty." PLo computatonal bology 6.8 (2010): e [76] B. Y. Chen, "AP-E: pecfcty Annotaton wth a olumetrc Analyss of Electrostatc Isopotentals." PLo computatonal bology 10.8 (2014): e [77]. R. Amn et al. "Predcton and expermental valdaton of enzyme substrate specfcty n proten structures." Proceedngs of the Natonal Academy of cences (2013): E4195-E4202. Apostolos Axenopoulos was born n Thessalonk, Greece, n He receved the Dploma degree n electrcal and computer engneerng and the M.. 
degree in advanced computing systems from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2003 and 2006, respectively, and the PhD in Electrical and Computer Engineering from the University of Thessaly. He is an Associate Researcher at the Information Technologies Institute, Thessaloniki. His main research interests include 3D object indexing, content-based search and retrieval and bioinformatics.

Dimitrios Rafailidis was born in Larissa, Greece. He received the Diploma in Informatics from the Computer Science Department, the M.Sc. degree in Information Systems and the Ph.D. degree in Information Retrieval from the Aristotle University of Thessaloniki, Greece, in 2005, 2007 and 2011, respectively. His main research interests include Machine Learning and Pattern Recognition, Multimedia Information Retrieval, Databases, Social Media and Artificial Intelligence Systems.

Petros Daras (M'07) was born in Athens, Greece. He received the Diploma degree in electrical and computer engineering, the M.Sc. degree in medical informatics, and the Ph.D. degree in electrical and computer engineering, all from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1999, 2002, and 2005, respectively. He is a Researcher Grade C at the Information Technologies Institute (ITI) of the Centre for Research and Technology Hellas (CERTH). His main research interests include search, retrieval and recognition of 3D objects, 3D object processing, medical informatics applications, medical image processing, 3D object watermarking and bioinformatics. He regularly serves as a reviewer/evaluator of European projects and he is a member of IEEE, a key member of the IEEE MMTC 3DRPC IG and chair of the IEEE Image, Video and Mesh Coding IG.
Georgios Papadopoulos received his diploma in Physics from the Aristotle University of Thessaloniki, Greece. He further studied theoretical Biophysics in the Department of Physics of the Freie Universität Berlin, Germany, and received his Ph.D. degree in Biophysics from the same department. He worked on short-term contracts as a guest scientist at the Hahn-Meitner-Institut, Berlin, Germany, and at the Forschungszentrum Jülich, Germany. Since 1994 he has been teaching Physics, Biostatistics, Bioinformatics, Physical Chemistry and Biophysics at the University of Thessaly, Greece, the Democritus University of Thrace, Greece, and the Aristotle University of Thessaloniki, Greece. Since 2009 he has been a lecturer of Biophysics in the Department of Biochemistry & Biotechnology (UTh). His research interests are focused on the study of the structure of biological macromolecules and of their interactions using theoretical and computational methods.

Elias N. Houstis is currently a full Professor in the Computer Engineering and Communications department at the University of Thessaly, Greece, Director of the Research Center of Thessaly (CE.RE.TE.TH.), and Emeritus Professor of Purdue University, USA. Most of his academic career is associated with Purdue University. He has been a Professor of Computer Science and Director of the Computational Science & Engineering Program of Purdue University. He is a member of the IFIP working group WG2.5 on mathematical software and of the European ICT Directors. Houstis' current research interests are in the areas of problem solving environments, networking and parallel computing, enterprise systems, computational intelligence and finance, and e-services.


More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Feature Selection for Target Detection in SAR Images

Feature Selection for Target Detection in SAR Images Feature Selecton for Detecton n SAR Images Br Bhanu, Yngqang Ln and Shqn Wang Center for Research n Intellgent Systems Unversty of Calforna, Rversde, CA 95, USA Abstract A genetc algorthm (GA) approach

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL Nader Safavan and Shohreh Kasae Department of Computer Engneerng Sharf Unversty of Technology Tehran, Iran skasae@sharf.edu

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

Correlative features for the classification of textural images

Correlative features for the classification of textural images Correlatve features for the classfcaton of textural mages M A Turkova 1 and A V Gadel 1, 1 Samara Natonal Research Unversty, Moskovskoe Shosse 34, Samara, Russa, 443086 Image Processng Systems Insttute

More information

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Accounting for the Use of Different Length Scale Factors in x, y and z Directions 1 Accountng for the Use of Dfferent Length Scale Factors n x, y and z Drectons Taha Soch (taha.soch@kcl.ac.uk) Imagng Scences & Bomedcal Engneerng, Kng s College London, The Rayne Insttute, St Thomas Hosptal,

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

IMAGE FUSION TECHNIQUES

IMAGE FUSION TECHNIQUES Int. J. Chem. Sc.: 14(S3), 2016, 812-816 ISSN 0972-768X www.sadgurupublcatons.com IMAGE FUSION TECHNIQUES A Short Note P. SUBRAMANIAN *, M. SOWNDARIYA, S. SWATHI and SAINTA MONICA ECE Department, Aarupada

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Image Matching Algorithm based on Feature-point and DAISY Descriptor

Image Matching Algorithm based on Feature-point and DAISY Descriptor JOURNAL OF MULTIMEDIA, VOL. 9, NO. 6, JUNE 2014 829 Image Matchng Algorthm based on Feature-pont and DAISY Descrptor L L School of Busness, Schuan Agrcultural Unversty, Schuan Dujanyan 611830, Chna Abstract

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

KIDS Lab at ImageCLEF 2012 Personal Photo Retrieval

KIDS Lab at ImageCLEF 2012 Personal Photo Retrieval KD Lab at mageclef 2012 Personal Photo Retreval Cha-We Ku, Been-Chan Chen, Guan-Bn Chen, L-J Gaou, Rong-ng Huang, and ao-en Wang Knowledge, nformaton, and Database ystem Laboratory Department of Computer

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Unsupervised Co-segmentation of 3D Shapes via Functional Maps

Unsupervised Co-segmentation of 3D Shapes via Functional Maps Unsupervsed Co-segmentaton of 3D Shapes va Functonal aps Jun Yang School of Electronc and Informaton Engneerng, Lanzhou Jaotong Unversty, Lanzhou 730070, P. R. Chna yangj@mal.lzjtu.cn Zhenhua Tan School

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information