Data-dependent Hashing Based on p-stable Distribution


Data-dependent Hashing Based on p-stable Distribution
Authors: Bai, Xiao; Yang, Haichuan; Zhou, Jun; Ren, Peng; Cheng, Jian
Published: 2014
Journal Title: IEEE Transactions on Image Processing
DOI:
Copyright Statement: © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Downloaded from Griffith Research Online

Data-dependent Hashing Based on p-stable Distribution

Xiao Bai, Haichuan Yang, Jun Zhou, Peng Ren and Jian Cheng

Abstract: The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on the p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this property, we develop a projection method which maps the original data to arbitrary dimensional vectors. Each projection vector is a linear combination of multiple random vectors subject to the p-stable distribution, in which the weights for the linear combination are learned based on the training data. An orthogonal matrix is then learned data-dependently for minimizing the thresholding error in quantization. Combining the projection method and the orthogonal matrix, we develop an unsupervised hashing scheme which preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into consideration and gives more accurate hashing results with compact hash codes. Different from many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted by the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This results in a supervised hashing scheme which preserves the semantic similarity of data. Experimental results show that our methods have outperformed several state-of-the-art hashing approaches in both effectiveness and efficiency.

I. INTRODUCTION

The volume of image data has been increasing dramatically every year. The big data era has created great challenges for many tasks such as content-based image retrieval (CBIR). One typical example is the nearest neighbor (NN) search, which finds the nearest sample for a query represented as a vectorized descriptor in $\mathbb{R}^d$. It requires a distance metric to be defined to measure the similarity between image descriptors, and the Euclidean distance is one of the most widely used metrics. In this scenario, the query time has a linear dependence on the data size, which is impractical for large scale databases. For data with relatively low dimensionality, the problem can be solved using tree based methods such as the binary search tree [1]. However, the dimensionality of most popular image descriptors, for example those constructed by Bag-of-Words [2] or GIST [3], is too large. It degrades the efficiency of these methods to that of exhaustive search [4].

X. Bai and H. Yang are with the School of Computer Science and Engineering, Beihang University, Beijing, China (e-mail: baixiao.buaa@googlemail.com). J. Zhou is with the School of Information and Communication Technology, Griffith University, Nathan, QLD, Australia. P. Ren is with the College of Information and Control Engineering, China University of Petroleum, Qingdao, China. J. Cheng is with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

Approximate nearest neighbor (ANN) techniques have been studied to break the bottleneck of NN search. The key idea is to find an approximate nearest neighbor rather than the exact one. Locality-sensitive hashing (LSH) has been introduced for this purpose [5] and has attracted much attention. Its objective is to map the original vector $v \in \mathbb{R}^d$ to a binary string $y \in \{0,1\}^r$ such that neighboring samples in the original feature space have similar binary codes in the Hamming space. However, simple feature similarity, such as that based on the Euclidean distance in the original feature space, usually cannot fully capture the semantic similarity, i.e., the real affinity between the contents of objects.
For example, in CBIR applications, if the images are represented as GIST descriptors, the Euclidean metric may produce some false positive instances for a given query. One possible solution to this problem is to introduce supervised learning based strategies into hashing, which have led to significant improvement of CBIR performance. Hashing methods which only aim at preserving feature similarity are called unsupervised hashing, and those based on supervised learning strategies are called supervised hashing. Alternatively, hashing based techniques can be classified into two categories, data-dependent hashing and data-independent hashing, depending on whether or not they employ a training set to learn the hash function. Data-independent hashing does not require training data. A typical example is the method presented in [6], which uses data-independent mathematical properties to guarantee that the probability of collision between hash codes reflects the Euclidean distance of samples. The performance of data-independent methods is robust to data variations because the hash functions are established subject to specific rules without a training process. The randomness property enables data-independent methods to generate an arbitrary number of hash functions, so one can construct multiple hash tables to boost the recall rate. However, such methods suffer from a high demand on the dimensionality of the binary representation, i.e., the length of codes $r$ has to be very large in order to reduce the false positive rate. This increases the storage costs and degrades the query efficiency. Data-dependent hashing methods, on the contrary, aim at learning hash functions from a training set. A common objective is to explicitly make the similarity measured in the original feature space be preserved in the Hamming space [7], [8], [9], [10]. Some methods, such as kernelized locality sensitive hashing (KLSH) [11], do not have an explicit objective function but still require a training set. Compared with their data-independent counterparts, data-dependent hashing methods allow compact coding, which is very desirable in practice. A typical kind

Fig. 1. The proposed method for extending p-stable distribution theory to data-dependent hashing. (Diagram: n images are randomly projected (c dimensions per bit, data-independent); MLSH-ITQ learns an orthogonal transformation and thresholds by sign(.), while MLSH-SLP assigns quasi hash bits, propagates them, and thresholds by probability; each branch is repeated r times to produce r-bit binary codes, data-dependently.)

of data-dependent method is supervised hashing, which not only considers the data distribution, but also incorporates prior information such as class labels for supervised learning. The disadvantages of data-dependent methods are that their performance may depend too heavily on the training set and that they usually support only a limited number of hash functions. We can see that both data-independent and data-dependent solutions have their pros and cons. An intuitive idea for overcoming their shortcomings is to develop an integrated strategy which combines both data-dependent and data-independent hashing, making them complementary to each other. To achieve this goal, we propose a hashing method based on the p-stable distribution. The p-stable distribution [12] is traditionally used in data-independent hashing methods [6]. It has special mathematical properties that guarantee that the distance under the $l_p$ norm can be recovered from the projections on specific random vectors. In our work, we extend the p-stable distribution to the data-dependent setting. An overview of the proposed method is illustrated in Figure 1. Firstly, we project each original feature vector through multiple random vectors, and learn a single projection vector that approximates the multiple random vectors according to the data distribution. The same procedure is repeated $r$ times, giving $r$ projection vectors. This is different from LSH, which directly uses a single random vector as each projection vector, and we thus refer to our method as multiple locality sensitive hashing (MLSH). Based on MLSH, we then apply an orthogonal transformation [13] to the obtained projection vectors for preserving the Euclidean distance with binary codes. Conveniently, we refer to this process as MLSH-ITQ (MLSH with iterative quantization). Furthermore, we use the projection result of MLSH to assign quasi hash bits to some training samples and perform a label propagation [14] like process with respect to the semantic similarity to generate hash bits for the rest. We refer to this supervised hashing method as MLSH-SLP (MLSH with supervised label propagation). In [15], we introduced the p-stable distribution theory into data-dependent hashing. That method consists of two stages. In the first stage, a Gaussian random vector is directly used to assign initial binary labels to a part of the data. In the second stage, the labels of the remaining data are induced according to the unsupervised similarity. In this paper, the proposed MLSH method follows a similar two-stage framework, but with completely different strategies in both stages. In the first stage, it uses a refined projection vector based on a deeper analysis of the p-stable property. In the second stage, MLSH incorporates different procedures for two different scenarios. For the unsupervised scenario, iterative quantization is incorporated to refine the hash functions for retrieving Euclidean neighbors. For the supervised scenario, a supervised label propagation procedure is used to learn the hash functions for retrieving semantically similar instances. The contributions of this paper are summarized as follows. Firstly, based on p-stable distribution theory, we show how to view the Euclidean distance preserving problem as estimating the variance of a p-stable distribution. This observation leads to a novel projection method which maps the samples in the original feature space to arbitrary dimensional real-valued vectors.
For each dimension, rather than directly using one single random vector, we generate the projection vector by approximating multiple random vectors for recovering the Euclidean distance within the dataset. Secondly, based on this mapping, we show how the iterative quantization method [13] can be used for minimizing the loss of thresholding. This leads to the development of the unsupervised hashing method MLSH-ITQ. Finally, we construct an objective function which is similar to [7] but characterizes semantic similarity, and compute its approximate solution by combining the proposed projection method with a coordinate descent algorithm. This results in a novel supervised hashing scheme for the purpose of preserving the semantic similarity, which to a certain extent eliminates the inconsistency between feature similarity and semantic similarity in hashing. In the rest of the paper, a review of relevant hashing methods is given in Section II. The proposed unsupervised hashing is described in Section III, followed by the introduction of a novel supervised hashing in Section IV. We present the experimental results in Section V, then draw conclusions and

discuss the future work in Section VI.

II. RELATED WORK

Compared against their data-dependent counterparts, data-independent methods are usually considered to be more adaptive to heterogeneous data distributions, but at the cost of efficiency in practice [7]. Locality sensitive hashing based on the p-stable distribution (LSH) [6] is one of the most representative methods in the data-independent hashing category. Based on the p-stable distribution, hash functions can be generated directly without any training data, and the mathematical properties of the p-stable distribution [6] guarantee that vectors close to each other in the original feature space have a high probability of generating the same output for each hash function. Each hash function is a random linear projection, and the hash functions are independent of each other because of the randomness of the projection vectors. Some other data-independent hashing schemes have been proposed besides LSH. For example, in [17], a data-independent hashing scheme has been reported which utilizes random Fourier features to make the Hamming distance related to the shift-invariant kernel (e.g., the Gaussian kernel) between the vectors. Recently, kernelized locality sensitive hashing (KLSH) [11] has been proposed. It constructs random projection vectors by using a weighted sum of data instances in a training set to approximate a Gaussian random hyperplane in a highly implicit kernel space. In many applications, the data distribution is not very complex and can be learned well from a training set. In this scenario, data-dependent approaches become very appealing. A representative data-dependent hashing scheme is spectral hashing (SH) [7]. It transforms the problem of finding a similarity preserving code for a given dataset into an NP-hard graph partitioning problem that is similar to Laplacian eigenmaps [18]. SH relaxes this problem and solves it by a spectral method [7], [18]. For a novel data point, SH uses the Laplace-Beltrami eigenfunctions to obtain binary codes under the hypothesis that the data is uniformly distributed. To address the problem when data do not meet this hypothesis, anchor graph hashing (AGH) [9] has been proposed. AGH uses an anchor graph to obtain a low-rank adjacency matrix which is computationally feasible to approximate the similarity matrix, and then processes it in constant time based on the Nyström method [9]. Zhang et al. proposed a self-taught hashing [20] method that first performs Laplacian eigenmaps and then thresholds the eigenvectors to get binary codes for the training set. After that, it trains an SVM classifier as the hash function for each bit. Recently, more extensions of the above methods have been developed. For instance, multidimensional spectral hashing [21] is guaranteed to maintain the affinities when the number of bits increases. Li et al. extended spectral hashing with a semantically consistent graph in [22], which incorporates prior information into SH in a supervised manner. Furthermore, Shen et al. [23] have developed a group of hashing techniques based on a wide variety of manifold learning approaches such as Laplacian eigenmaps. Dimensionality reduction methods have been widely applied to hashing problems. Several data-dependent hashing methods have been developed based on Principal Component Analysis (PCA) [24], including PCA-Direct [13], which directly thresholds the results after performing PCA; PCA-RR [25], which applies a random orthogonal transformation before thresholding; PCA-ITQ [13], which refines an orthogonal transformation to reduce the quantization error; and Isotropic Hashing [26], which learns an orthogonal transformation that makes the projected dimensions have equal variance. In [13], Gong et al.
also presented a supervised hashing method, CCA-ITQ, based on Canonical Correlation Analysis (CCA) and the same iterative quantization method. LDAHash [27] introduces Linear Discriminant Analysis (LDA) [28] into hashing for local descriptor matching. Binary Reconstructive Embedding (BRE) [8] and Minimal Loss Hashing (MLH) [10] optimize objective functions directly with respect to the binary code. BRE aims to reconstruct the Euclidean distance in the Hamming space, and MLH has a hinge-like loss function. Various learning settings have been explored in data-dependent hashing. Semi-supervised hashing (SSH) [29] has been introduced to search for semantically similar instances when only part of the data are labelled. It minimizes the empirical error over the labeled data, and applies an information theoretic regularizer over both labeled and unlabeled data. A projection learning hashing method [30] has been proposed in a similar form to SSH, containing a semi-supervised method and an unsupervised method. Besides SSH, weakly-supervised hashing [31] and kernel-based supervised hashing (KSH) [32] are two other supervised hashing schemes that have kernel based hash functions. Kulis et al. have extended LSH functions to a learned metric [33], which can also be considered a supervised method. Besides these methods, several other hashing methods have been proposed to address different aspects of modelling and computation, including semantic hashing [34], random maximum margin hashing [35], Manhattan hashing [36], dual-bit quantization hashing [37], spherical hashing [38] and k-means hashing [39].

III. UNSUPERVISED HASHING FOR PRESERVING EUCLIDEAN DISTANCE

In this section, we present our unsupervised hashing scheme MLSH-ITQ based on the p-stable distribution. As illustrated in Figure 1, there are two major parts within our scheme, with one being data-independent and the other being data-dependent. The core idea is to use multiple random vectors to generate one hash function.

A. Euclidean Distance Preserving as Variance Estimation

We commence by reviewing the basics of the p-stable distribution, and then describe how it can be used to preserve the original distance between data points. This process can be thought of as estimating the variance of a specific distribution. A random variable has a stable distribution if a linear combination of independent copies of the variable follows a similar distribution. For a p-stable distribution $D$, given $t$ real numbers $b_1, \ldots, b_t$ and random variables $X_1, \ldots, X_t$ which are independently and identically drawn from distribution $D$, $\sum_i b_i X_i$ follows the same distribution as $\left(\sum_i |b_i|^p\right)^{1/p} X$,

where $X$ is a random variable with distribution $D$ and $p \geq 0$ is a parameter [6]. It has been proved that stable distributions exist for $p \in (0, 2]$ [12]. In particular, when $p = 1$ and $p = 2$, the corresponding p-stable distributions are the Cauchy distribution and the Gaussian distribution, respectively. Let $w$ denote a $d$-dimensional random vector whose entries are generated independently from a standard Gaussian distribution $D_s$ (with zero mean and unit standard deviation). Let $v_i$ and $v_j$ be two data vectors with dimensionality $d$; then $w^T v_i - w^T v_j = w^T(v_i - v_j)$ follows a Gaussian distribution $D_g$ which has zero mean and variance $\|v_i - v_j\|^2$. Let $W$ denote a $d \times r$ matrix, each column of which is a random vector behaving like $w$. The entries of the vector $W^T(v_i - v_j)$ are independent of each other and follow $D_g$. This implies that for an arbitrary $W^T(v_i - v_j)$, $\frac{1}{r}\|W^T(v_i - v_j)\|^2$ is an estimator of the variance of $D_g$. We can get the expectation of the random variable $\frac{1}{r}\|W^T(v_i - v_j)\|^2$:

$$E\left[\tfrac{1}{r}\|W^T(v_i - v_j)\|^2\right] = \|v_i - v_j\|^2 \qquad (1)$$

where $\|\cdot\|$ is the $l_2$ norm. Equation (1) also shows that this is an unbiased estimate. Furthermore, using the probability density function of the Gaussian distribution, we can get the variance of this estimator:

$$\mathrm{Var}\left[\tfrac{1}{r}\|W^T(v_i - v_j)\|^2\right] = \tfrac{2}{r}\|v_i - v_j\|^4 \qquad (2)$$

We observe that a larger $r$ leads to a smaller variance and gives a more precise estimation. In LSH, $r$ corresponds to the length of the hash code. Therefore, equation (2) also explains why LSH performs better with longer hash codes.

B. Learning Projection Vectors

The LSH scheme uses one random vector to generate one hash function (hash bit). Precise characterization of $D_g$ requires a large number of random vector samples, which leads to long hash codes. However, long hash codes are less preferred in practice because they lead to low recall, sparse hash tables and decreased efficiency. An intrinsic solution to overcome this disadvantage is to change the one-to-one correspondence between random vectors and hash bits. Different from LSH, we propose multiple locality sensitive hashing (MLSH), which uses $c$ different Gaussian random vectors to generate one bit. By using $c \cdot r$ random vectors, our MLSH generates $r$ hash bits. In contrast, by using the same number of random vectors, LSH results in a longer code with $c \cdot r$ hash bits, which is less efficient. For a hashing scheme with $r$ hash bits, our method can be implemented through estimating the variance of the Gaussian distribution $D_g$ based on $c \cdot r$ random samples, which is motivated by the principles described in Section III-A. Let $Q$ be a $d \times c$ matrix, each column of which is a Gaussian random vector. If our hash function is constrained to be in a linear form, then for each hash function, our objective is to find a $d$-dimensional projection vector $u$:

$$\arg\min_u \sum_{i,j}^n \left( \|Q^T v_i - Q^T v_j\|^2 - (u^T v_i - u^T v_j)^2 \right)^2 \qquad (3)$$

By discarding the magnitude factor, we can assume that $u = Ql$ where $l$ is a $c$-dimensional unit vector, i.e., $\|l\|_2 = 1$. Then the term $\|Q^T v_i - Q^T v_j\|^2 - (l^T Q^T v_i - l^T Q^T v_j)^2$ is always non-negative, and our objective becomes:

$$\min_l \sum_{i,j}^n \left( \|Q^T v_i - Q^T v_j\|^2 - (l^T Q^T v_i - l^T Q^T v_j)^2 \right) \qquad (4)$$

Proposition 1. Finding the optimal solution of problem (4) is equivalent to the maximization problem:

$$\max_l \; l^T Q^T V V^T Q\, l \quad \text{subject to} \quad \|l\|_2 = 1 \qquad (5)$$

where $V$ is a matrix with the $i$th column being $v_i$.

Proof. The minimization problem in (4) can be transformed into the following maximization problem:

$$\arg\max_l \sum_{i,j}^n (l^T Q^T v_i - l^T Q^T v_j)^2 \qquad (6)$$

The sum of the squared pairwise differences is proportional to the variance. With $V$ the matrix whose $i$th column is $v_i$, we have:

$$\sum_{i,j}^n (l^T Q^T v_i - l^T Q^T v_j)^2 \propto \mathrm{Var}(l^T Q^T V) \qquad (7)$$

where $\mathrm{Var}(\cdot)$ is the sample variance of the elements in the vector. For zero-mean data, $\mathrm{Var}(l^T Q^T V) = \frac{1}{n} l^T Q^T V V^T Q\, l$.
Finally, we have transformed the initial objective (4) into the optimization problem (5). The optimal $l$ is obtained by the eigen-decomposition of the matrix $Q^T V V^T Q$: $l$ is the eigenvector associated with the largest eigenvalue of $Q^T V V^T Q$. According to Proposition 1, this is also the optimal solution of objective (4). Therefore, the approximate solution of $u$ for equation (3) is obtained by $u = Ql$. A $d \times r$ matrix $\hat{U}$ is then established, with its columns being the vectors resulting from equation (5) using $r$ different random matrices $Q$ separately. We take $U = \frac{1}{\sqrt{cr}}\hat{U}$, and $\|U^T(v_i - v_j)\|^2$ is an approximation of the estimator with variance $\frac{2}{cr}\|v_i - v_j\|^4$ according to equation (2).

C. Minimizing the Error of Thresholding

For the $d \times r$ matrix $U$ obtained in Section III-B, let $U_k$ denote its $k$th column. The binary code for a feature vector $v_i$ can be obtained by applying the sign function to $U_k^T v_i$. However, directly using $\mathrm{sgn}(\cdot)$ leads to a considerable loss of accuracy in the binary code. The quantization error of thresholding can be estimated as:

$$\sum_{i}^{n}\sum_{k}^{r} \left( \mathrm{sgn}(U_k^T v_i) - U_k^T v_i \right)^2 \qquad (8)$$

The desired $U$ should have a small quantization error. Note that in [6], Datar et al. quantized the real-valued output to discrete integers to maintain accuracy. Nonetheless, binary codes are more convenient for retrieval, and are therefore adopted in this paper.
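To make this construction concrete, the following is a minimal sketch (our illustration, not the authors' code) of the projection learning of Sections III-A and III-B, assuming NumPy, synthetic zero-mean data, and toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, c, r = 64, 1000, 3, 32            # toy sizes: feature dim, samples, vectors per bit, bits

V = rng.standard_normal((d, n))
V -= V.mean(axis=1, keepdims=True)       # zero-mean data, as assumed in the proof of Prop. 1

U_hat = np.empty((d, r))
for m in range(r):
    Q = rng.standard_normal((d, c))      # c Gaussian random vectors for this bit
    M = Q.T @ V @ (V.T @ Q)              # the c x c matrix Q^T V V^T Q
    eigvals, eigvecs = np.linalg.eigh(M)
    l = eigvecs[:, -1]                   # unit eigenvector of the largest eigenvalue (Eq. (5))
    U_hat[:, m] = Q @ l                  # u = Q l, the learned projection vector

U = U_hat / np.sqrt(c * r)               # scaling so ||U^T(v_i - v_j)||^2 estimates ||v_i - v_j||^2

# Spot-check the distance-preserving property of Eq. (1) on one pair:
i, j = 0, 1
est = np.sum((U.T @ (V[:, i] - V[:, j])) ** 2)
true = np.sum((V[:, i] - V[:, j]) ** 2)
print(f"estimated {est:.2f} vs true {true:.2f}")
```

Note that the eigen-decomposition involves only a $c \times c$ matrix, so learning each projection vector is cheap even for high dimensional features.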

Algorithm 1: MLSH-ITQ
Data: A $d \times n$ matrix $V$ with each column being a feature vector in the training set; the length of hash codes $r$.
Result: A $d \times r$ projection matrix $U$.
for $m = 1$ to $r$ do
    Generate a $d \times c$ matrix $Q$ with each column being a Gaussian random vector;
    Perform the eigen-decomposition of the matrix $Q^T V V^T Q$ and let $l$ equal the eigenvector associated with the largest eigenvalue;
    $u \leftarrow Ql$;
    $U_m \leftarrow u$;
$U \leftarrow \frac{1}{\sqrt{cr}}[U_1, U_2, \ldots, U_r]$;
Solve the $r \times r$ orthogonal matrix $R$ in (10) by the iterative Procrustes method in [13];
$U \leftarrow UR$.

Proposition 2. Given a projection matrix $U$ and an arbitrary orthogonal $r \times r$ matrix $R$, $U$ and $UR$ have the same power for reconstructing the Euclidean distance.

Proof. For an arbitrary pair of feature vectors $v_i$ and $v_j$, we have:

$$\|(UR)^T v_i - (UR)^T v_j\|^2 = \|(UR)^T(v_i - v_j)\|^2 = (v_i - v_j)^T (UR)(UR)^T (v_i - v_j) = (v_i - v_j)^T U U^T (v_i - v_j) = \|U^T v_i - U^T v_j\|^2 \qquad (9)$$

So the pairwise Euclidean distance of the projection results under $U$ and $UR$ is the same.

According to Proposition 2, $UR$ behaves the same as $U$. In the light of this observation, we aim to obtain an optimal solution $R$ that achieves the least quantization loss of thresholding:

$$R = \arg\min_R \left\| \mathrm{sgn}\left((UR)^T V\right) - (UR)^T V \right\|_F^2 \qquad (10)$$

where $\|\cdot\|_F$ denotes the Frobenius norm. We follow the iterative method described in [13] to solve objective function (10). In each iteration, it uses the classic Orthogonal Procrustes problem solution [40] to find an orthogonal rotation $R^{(1)}$ (an $r \times r$ orthogonal matrix) to align the vector set $U^T V$ with $\mathrm{sgn}(U^T V)$. After updating $U$ by $UR^{(1)}$, it starts a new iteration. After $t$ iterations, the final $r \times r$ orthogonal matrix $R = R^{(1)} R^{(2)} R^{(3)} \cdots R^{(t)}$ is obtained. The proposed method is summarized in Algorithm 1.

D. Constructing Multiple Hash Tables

For most data-dependent hashing methods, the limitation on the number of hash functions leads to the incapability of constructing multiple hash tables. Since the matrix $Q$ is random, our method can construct multiple hash tables in the same way as LSH. In this setting, the Hamming distance between the binary codes of $v_i$ and $v_j$ is:

$$\mathrm{dist}(v_i, v_j) = \min_{t = 1 \ldots L} d_{\mathrm{Hamming}}\left(Y_t(v_i), Y_t(v_j)\right) \qquad (11)$$

where $Y_t(v_i)$ is the binary code of $v_i$ in the $t$th hash table. In [41], [42], methods are presented to build hash tables with a data-dependent strategy. However, these methods concentrate on the hash table construction process. The idea is to train the hash functions of a data-dependent method with different data or parameters, which leads to the generation of different hash tables. Our method, on the other hand, focuses on the hashing method itself and generates multiple hash tables from random vectors subject to the p-stable distribution.

IV. SUPERVISED HASHING BY INCORPORATING SEMANTIC SIMILARITY

The method presented in Section III reconstructs the Euclidean distance in the original space and learns the hash functions in an unsupervised manner. In many situations, the Euclidean distance between feature vectors $v_i$ and $v_j$ does not reflect the real semantic similarity of objects. In this section, we present a supervised hashing method, MLSH-SLP, which exploits supervised pairwise similarity. The whole procedure of this method and its relation to MLSH is shown in Figure 1.

A. Hashing Objective for Semantic Similarity

Let $L(i)$ denote the class label of the object $v_i$ and $S$ denote a matrix whose $(i,j)$th entry $S_{ij}$ represents the supervised semantic similarity between two objects $i$ and $j$. $S_{ij}$ is defined as:

$$S_{ij} = \begin{cases} 1, & L(i) = L(j); \\ 0, & \text{otherwise}. \end{cases} \qquad (12)$$

Our goal is to learn binary codes subject to the requirement that neighboring samples in the same class are mapped to similar codes in the Hamming space. The neighborhood of object samples is measured in terms of semantic similarity.
In this scenario, our method seeks an $r$-bit Hamming embedding $Y \in \{0,1\}^{r \times n}$ for the $n$ samples in the original space, and learns $r$ hash functions $h_{1,2,\ldots,r}: \mathbb{R}^d \rightarrow \{0,1\}$. Let $y_i$ denote the $i$th column of $Y$, i.e., the binary code for object $v_i$. We have the intuitive objective:

$$\arg\min_Y \sum_{i,j} S_{ij} \|y_i - y_j\|^2 \qquad (13)$$

subject to: $y_i \in \{0,1\}^r$ and $\sum_{i=1}^n y_i = \frac{n}{2}\mathbf{1}_r$, where the rows of $Y$ should be independent of each other, and $\mathbf{1}_r$ is an $r$-dimensional vector of ones. The constraint $\sum_i y_i = \frac{n}{2}\mathbf{1}_r$ enables the data to be mapped into the hash table uniformly. Minimizing (13) associates small $S_{ij}$ with large Hamming distance $\|y_i - y_j\|^2$, and vice versa. Though slightly different, this definition is equivalent to the previous binary output $\{-1, 1\}$.
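As an illustration of this objective, the sketch below (a hypothetical helper of ours, NumPy assumed) builds the similarity matrix of equation (12) from class labels and evaluates the objective of equation (13) for a candidate code matrix:

```python
import numpy as np

def semantic_objective(labels, Y):
    """labels: (n,) class labels; Y: (r, n) codes in {0,1}."""
    S = (labels[:, None] == labels[None, :]).astype(float)      # Eq. (12)
    D = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)    # ||y_i - y_j||^2 for all pairs
    return float(np.sum(S * D))                                 # Eq. (13)

labels = np.array([0, 0, 1, 1])
Y_good = np.array([[0, 0, 1, 1],
                   [1, 1, 0, 0]])        # same-class samples share codes
Y_bad  = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0]])
print(semantic_objective(labels, Y_good))   # 0.0
print(semantic_objective(labels, Y_bad))    # 8.0: same-class pairs end up far apart
```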

Weiss et al. [7] have shown that a problem similar to Equation (13) is NP-hard. Their solution is a relaxation of the problem to an eigen-decomposition, which is based on the similarity measured in the original feature space. In contrast, we exploit semantic similarity and approximate a solution using the p-stable distribution theory and a coordinate descent method. Let $y^{(m)}$ be an $n$-dimensional row vector denoting the $m$th row of $Y$. We transform the original problem of learning $Y \in \{0,1\}^{r \times n}$ into $r$ subproblems of learning $y^{(m)} \in \{0,1\}^n$ ($m = 1, 2, \ldots, r$). Each row vector $y^{(m)}$ can then be learned separately for $m = 1, 2, \ldots, r$ through the same learning strategy. Let $y_i^{(m)}$ denote the $i$th element of $y^{(m)}$; we relax the original problem into a probabilistic form. Let $p^{(m)}$ be an $n$-dimensional vector whose $i$th component $p_i^{(m)}$ is the probability for $y_i^{(m)} = 1$, i.e., the probability of $v_i$ having binary output 1 with respect to the $m$th hash function. The expectation of $y_i^{(m)}$ is $E[y_i^{(m)}] = 1 \cdot p_i^{(m)} + 0 \cdot (1 - p_i^{(m)}) = p_i^{(m)}$. We formulate the objective function for the $m$th of the $r$ subproblems as follows:

$$\arg\min_{p^{(m)}} \sum_{i,j} S_{ij} \left( p_i^{(m)} - p_j^{(m)} \right)^2 \quad \text{subject to} \quad p_i^{(m)} \in [0,1] \qquad (14)$$

The method for obtaining the optimal $p^{(m)}$ satisfying (14) is described in the following two subsections.

B. Quasi Hash Bits

In this subsection, we present a strategy for initializing the hash probability $p_i^{(m)}$, i.e., the probability for the $m$th hash function to map the feature vector $v_i$ to 1. We commence by generating a $d$-dimensional vector $u$ using the MLSH scheme presented in Section III. Then, for samples $i = 1, \ldots, n$ in the training set, $p_i^{(m)}$ is initialized as follows:

$$p_i^{(m)} = \begin{cases} 1, & u^T v_i > \alpha_+; \\ 0, & u^T v_i < \alpha_-; \\ 0.5, & \text{otherwise}. \end{cases} \qquad (15)$$

Here $\alpha_+$ and $\alpha_-$, which represent the positive and negative threshold parameters respectively, are set empirically. The initialization strategy is developed in the light of the intuition that if the Euclidean distance between the feature vectors of two objects is very large, then it is nearly impossible for them to be semantic neighbors. Note that we have already shown that the difference of the projections on $u$ reflects the Euclidean distance between the original vectors. If $(p_i - p_j)^2 > 0$, which means $\mathrm{sgn}(u^T v_i) \neq \mathrm{sgn}(u^T v_j)$, then $|u^T v_i - u^T v_j| > \alpha_+ - \alpha_-$. Provided $\alpha_+ - \alpha_-$ is large enough, $S_{ij} = 0$ will hold with high probability. When $(p_i - p_j)^2 = 0$, $S_{ij}$ does not influence the sum. Furthermore, $u^T v_i$ has zero mean, which approximately satisfies $\sum_i y_i^{(m)} = n/2$, and the randomness of $u$ makes the $y^{(m)}$ ($m = 1, 2, \ldots, r$) independent of each other. Therefore, this partial solution satisfies the constraints in equation (13). For the time being, we set the hash bit $y_i^{(m)}$ for $v_i$ to be 1 if $p_i^{(m)} = 1$, and set it to be 0 if $p_i^{(m)} = 0$. We refer to the hash bits thus obtained as quasi hash bits. Furthermore, the remaining feature vectors, associated with hash probability 0.5, tend to be less distinctive in terms of the projection on $u$, and we do not assign quasi hash bits to them.

C. Coordinate Descent

In this subsection, we use the coordinate descent method to iteratively update the hash probabilities which are not associated with quasi hash bits. In each iteration, we minimize the objective function (14) by setting the derivative with respect to $p_i^{(m)}$ to zero. Specifically, we treat one $p_i^{(m)}$ with the initial value 0.5 as a unique variable, hold all the other hash probabilities fixed, and update $p_i^{(m)}$ as follows:

$$p_i^{(m)} = \frac{\sum_{j=1, j \neq i}^n S_{ij}\, p_j^{(m)}}{\sum_{k=1, k \neq i}^n S_{ik}} \qquad (16)$$

Suppose that some number $n_0$ of hash probabilities are not associated with quasi hash bits. The coordinate descent method behaves in such a way that one loop of iterations enumerates all the $n_0$ hash probabilities and then starts another loop of iterations. Since $S_{ij}$ is always non-negative and $(p_i^{(m)} - p_j^{(m)})^2$ is convex, problem (14) is a convex optimization problem, which means it has a global optimal solution. Furthermore, each subproblem of the coordinate descent method is also convex, so the objective value $\sum_{i,j} S_{ij} (p_i^{(m)} - p_j^{(m)})^2$ decreases after each iteration. Figure 2 shows the convergence of the optimization method for solving problem (14) on the CIFAR-10 dataset. Details on this dataset are presented in Section V.

Fig. 2. Numerical result of the convergence of the optimization process on the CIFAR-10 dataset (objective value versus the number of outer loops).

After $p^{(m)}$ converges, we get the refined hash probabilities. Then, for a sample $v_i$ which is not assigned a quasi hash bit, we generate its binary code with respect to the $m$th hash function as follows:

$$y_i^{(m)} = \begin{cases} 1, & p_i^{(m)} > 0.5; \\ 0, & \text{otherwise}. \end{cases} \qquad (17)$$

Repeating this procedure $r$ times, $r$ $n$-dimensional row vectors are generated. Finally, the $\{0,1\}^{r \times n}$ matrix $Y$ is established by concatenating the $r$ $n$-dimensional row vectors.
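A minimal sketch of one such subproblem is given below; it assumes NumPy, a projection vector u obtained as in Section III-B, and empirically chosen thresholds alpha_plus and alpha_minus, and it replaces the convergence test of Figure 2 with a fixed loop count:

```python
import numpy as np

def solve_bit(u, V, S, alpha_plus, alpha_minus, n_loops=50):
    """u: (d,) projection vector; V: (d, n) features; S: (n, n) similarity."""
    proj = u @ V                              # u^T v_i for every sample
    p = np.full(V.shape[1], 0.5)
    p[proj > alpha_plus] = 1.0                # quasi hash bits, Eq. (15)
    p[proj < alpha_minus] = 0.0
    free = np.flatnonzero(p == 0.5)           # probabilities still to be propagated
    for _ in range(n_loops):                  # coordinate descent loops, Eq. (16)
        for i in free:
            w = np.delete(S[i], i)            # S_ij for j != i
            q = np.delete(p, i)
            if w.sum() > 0:
                p[i] = (w @ q) / w.sum()
    return (p > 0.5).astype(int)              # threshold, Eq. (17)
```

Only the probabilities initialized to 0.5 are updated; the quasi hash bits stay fixed and anchor the propagation.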
Snce S j s always non-negatve, and p (m) p (m) j 2 s convex, problem (4) s a convex optmzaton problem whch means t has global optmal soluton. Furthermore, the subproblem of the coordnate descent method s also convex, p (m) j 2 decreases after so the objectve value,j S j p (m) each teraton. Fgure 2 shows the convergence process of the optmzaton method for solvng the problem (4) on the CIFAR- dataset. Detals on ths dataset are presented n Secton V. Objectve value x # Outer loops Fg. 2. Numercal result of the convergency of the optmzaton process on the CIFAR- dataset. After p (m) converges, we get the refned hash probabltes. Then for one sample v whch s not assgned a quas hash bt, we generate ts bnary code wth respect to the mth hash functon as follows: y (m) = {, p (m) >.5;, otherwse. (7) Repeat ths procedure r tmes, r n-dmensonal row vectors can be generated. Fnally, the {,} r n matrx Y can be establshed by concatenatng the r n-dmensonal row vectors.

D. Binary Codes for Queries

The scheme presented in Section IV-C only generates the binary representations for samples in the training set. In this subsection, we investigate how to generate the codes of a query. According to the definition of hashing, one hash function $h_m$ maps a sample $v$ in the original feature space to a binary value $y^{(m)} \in \{0,1\}$. In this scenario, one hash function can be considered a binary classifier. Therefore, generating the binary code for a query can be treated as a binary classification problem. We use the training dataset consisting of $v_i$ for $i = 1, \ldots, n$ and the corresponding $r$-bit Hamming embedding $Y$ obtained in Section IV-C to train $r$ binary classifiers. The $m$th binary classifier categorizes a query into the class with label 1 or 0, which is accordingly the $m$th binary code for the query. Therefore, it is reasonable to refer to the $m$th binary classifier as the $m$th hash function $h_m$.

E. A Label Propagation View of the Proposed Framework

We consider one hash function as a binary classifier and the hash bits as the labels of samples. For normal classification problems, the labels of training samples are usually obtained through human annotation. On the other hand, for a hash function, a sample is assigned a hash bit. Specifically, the criterion for this assignment is based on equation (13), which is intrinsically similar to that of label propagation. Different from general label propagation, which uses feature similarity to propagate the labels, our method uses semantic similarity to propagate the hash bits. However, our method and label propagation share the common underlying principle that one classifier should assign the same class labels to neighboring samples with high probability. Therefore, we refer to the method described in this section as multiple locality sensitive hashing with supervised label propagation (MLSH-SLP). The whole procedure is described in Algorithm 2.

V. EXPERIMENTS

A. Datasets and Experiment Setup

We evaluated the performance of the proposed methods on three popular image datasets. These datasets vary in content, image sizes, class labels, and human annotations. The CIFAR-10 dataset [43] consists of 60,000 color images in 10 classes, with 6,000 images per class. The classes include airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Figure 3 shows some sample images randomly selected from each class. The MNIST database consists of 70,000 handwritten digit images, including 60,000 examples in the training set and 10,000 examples in the test set. It is a subset extracted from a larger set available from NIST. The images are grey scale. This dataset has 10 classes corresponding to the digits 0-9, with all images being labeled. NUS-WIDE is a web image dataset created by the Lab for Media Search at the National University of Singapore, which contains 269,648 images downloaded from Flickr. The ground truth of these images is provided with multiple labels, such that each image is labeled with a vector of zeros and ones representing whether it belongs to each of the 81 defined concepts. Each image can be assigned multiple concepts.

Algorithm 2: MLSH-SLP
Data: A $d \times n$ matrix $V$ with each column being a feature vector in the training set; a similarity matrix $S$; the length of hash codes $r$.
Result: Binary codes $Y$; a set of $r$ hash functions $h_m(\cdot)$ for $m = 1, 2, \ldots, r$.
for $m = 1$ to $r$ do
    Generate a vector $u$ as described in Section III-B;
    Initialize $p^{(m)}$ according to (15);
    for $i = 1$ to $n$ do
        if $p_i^{(m)} = 1$ then assign quasi hash bit 1 to $v_i$;
        if $p_i^{(m)} = 0$ then assign quasi hash bit 0 to $v_i$;
    while $p^{(m)}$ has not converged do
        for $i = 1$ to $n$ do
            if $v_i$ does not have a quasi hash bit then
                $p_i^{(m)} \leftarrow \sum_{j \neq i} S_{ij}\, p_j^{(m)} / \sum_{k \neq i} S_{ik}$;
    Assign $y^{(m)}$ according to (17);
    $h_m \leftarrow \mathrm{Classifier}(V, y^{(m)})$;
$Y = [y^{(1)T}, y^{(2)T}, \ldots, y^{(r)T}]^T$.
We extracted different image features for each dataset due to the different properties of the corresponding images. For CIFAR-10, the images are too small to extract good scale invariant local features such as SIFT [44]. Considering that the images are all of the same size, we used a 512-dimensional GIST descriptor [3] to represent each image. In MNIST, the digit in each image is well aligned, so the grey values of each image can be treated as a 784-dimensional feature vector. Since a major portion of the pixels are clean background pixels, each feature vector has a sparse form. Images in NUS-WIDE are larger and contain a lot of detail. In the experiments, we used a 500-dimensional Bag-of-Words [2] feature vector built from SIFT descriptors for image retrieval. The proposed MLSH-SLP method can work with various classifiers. In the experiments, we chose a linear SVM as the model of the hash function in order to meet the efficiency requirements of image retrieval. The linear model is efficient in the prediction phase, which is very important for the indexing time. In the implementation, we employed LIBLINEAR [45], which has low time complexity and good classification accuracy. The main parameters are set to the default values provided by LIBLINEAR, i.e., cost C = 1 and dual maximal violation tolerance ε = 0.1.
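The per-bit classifier training of Section IV-D can be sketched as follows, using scikit-learn's LinearSVC (which wraps LIBLINEAR) as a stand-in for the paper's MATLAB setup; the function names are illustrative, and each bit is assumed to take both values on the training set:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_hash_functions(V, Y, C=1.0):
    """V: (d, n) training features; Y: (r, n) binary codes from MLSH-SLP.
    Returns one linear classifier per bit, each acting as a hash function h_m."""
    return [LinearSVC(C=C).fit(V.T, Y[m]) for m in range(Y.shape[0])]

def encode(classifiers, X):
    """X: (d, q) query features -> (r, q) binary codes, one classifier per bit."""
    return np.vstack([clf.predict(X.T) for clf in classifiers])
```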

Fig. 3. Random samples from the CIFAR-10 dataset. Each row contains images of one class.

B. Evaluation Protocols and Baseline Methods

In the experiments, we evaluated the proposed MLSH-ITQ and MLSH-SLP methods in both unsupervised and supervised settings. In the unsupervised setting, we used the Euclidean neighbors as the ground truth. Similar to [7], we used the average distance of all query samples to the 50th nearest neighbor as a threshold to determine whether a point in the dataset should be considered a true positive for a query. In the supervised setting, we used class labels as the ground truth. In the CIFAR-10 dataset and the MNIST dataset, each image has a single class label, so images in the same class are considered true neighbors of each other. On the NUS-WIDE dataset, we followed the protocol in [29], such that the ground truth is determined by whether two samples share at least one semantic label. We randomly chose query images from each dataset, and used the rest of the dataset as the target of the search. For MLSH-ITQ, we used all samples but the query images in the dataset as the training set. We randomly selected 2,000 samples from each dataset for training MLSH-SLP because of its relatively high computational complexity. We used the same training set sizes as described in their original papers for all alternative methods. We adopted the precision-recall curve to compare the overall performance of all methods. In our experiments, it was computed by:

$$\text{precision} = \frac{\text{Number of retrieved relevant pairs}}{\text{Total number of retrieved pairs}} \qquad (18)$$

$$\text{recall} = \frac{\text{Number of retrieved relevant pairs}}{\text{Total number of relevant pairs}} \qquad (19)$$

For the given queries, we increase the Hamming radius from 0 to $r$ to generate $r + 1$ pairs of precision and recall values, and the precision-recall curve is then plotted. As a complement, we also calculated the mean average precision (mAP), which is the area under the precision-recall curve. In practice, there are two major applications for the resulting binary hash codes, i.e., Hamming ranking and hash lookup. Hamming ranking compares the binary code of the query with those of all samples in the database; this has linear complexity but can be efficient thanks to the efficiency of the comparison operator for binary codes. Hamming ranking is usually used with longer code lengths. Hash lookup constructs a lookup table for the database. With the binary code of a query, it retrieves samples that fall within a bucket of Hamming radius δ. To guarantee the efficiency of retrieval, the lookup table should not be too sparse and the binary code should be compact. In our experiments, we also compute the mean precision under different hash code lengths for Hamming radius δ and for the top k returned samples of Hamming ranking:

$$\text{mean precision} = \frac{1}{\text{Number of test samples}} \sum_{\text{queries}} \frac{\text{Number of retrieved relevant samples for the query}}{\text{Total number of retrieved samples for the query}} \qquad (20)$$

If there is nothing in the buckets (i.e., no retrieved samples) for a certain Hamming radius δ and query sample, we count it as zero precision. We compared our methods with several state-of-the-art unsupervised hashing methods, including iterative quantization based on PCA (PCA-ITQ) [13], k-means hashing (KMH) [39], spherical hashing (SpH) [38], unsupervised sequential learning hashing (USPLH) [30], spectral hashing (SH) [7], and locality sensitive hashing (LSH) [6]. For supervised or semi-supervised hashing methods, we evaluated iterative quantization based on CCA (CCA-ITQ) [13], semi-supervised hashing (SSH) [29] and semi-supervised sequential projection learning hashing (S3PLH) [30]. Among these methods, LSH and our method MLSH-ITQ can directly construct multiple hash tables; we denote these variants as LSH-m and MLSH-ITQ-m, respectively.
A summary of the properties of these methods is given in Table I.

TABLE I
SUMMARY OF PROPERTIES OF HASHING METHODS UNDER COMPARISON.

Method          Hash Function   Learning Paradigm
PCA-ITQ [13]    linear          unsupervised
KMH [39]        nonlinear       unsupervised
SpH [38]        nonlinear       unsupervised
USPLH [30]      linear          unsupervised
SH [7]          nonlinear       unsupervised
LSH [6]         linear          (data-independent)
CCA-ITQ [13]    linear          supervised
SSH [29]        linear          semi-supervised
S3PLH [30]      linear          semi-supervised

From Table I, we observe that the hash functions in PCA-ITQ, CCA-ITQ, USPLH, SSH, and S3PLH have the linear form, the same as our methods. On the other hand, KMH, SpH and SH use nonlinear hash functions, but still achieve constant time complexity for computing binary codes.
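For the multi-table variants LSH-m and MLSH-ITQ-m, retrieval follows equation (11). A minimal sketch, assuming each table t stores a code matrix codes[t] generated from independently drawn random matrices:

```python
import numpy as np

def multitable_distance(query_codes, codes):
    """query_codes: list of L (r,) arrays; codes: list of L (r, n) arrays.
    Returns, for each database item, the minimum Hamming distance over tables (Eq. (11))."""
    per_table = [np.sum(codes[t] != query_codes[t][:, None], axis=0)
                 for t in range(len(codes))]
    return np.min(np.vstack(per_table), axis=0)

# Items falling within Hamming radius delta in at least one table are returned:
# hits = np.flatnonzero(multitable_distance(qc, codes) <= delta)
```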

Fig. 4. The mAP of different parameter settings (the number of hash tables L and the number of random vectors per bit c) for MLSH-ITQ-m on (a) CIFAR-10 and (b) MNIST.

C. Evaluation of Unsupervised Hashing Methods

Unsupervised hashing methods aim at finding the nearest neighbors of the query according to the Euclidean distance. They were originally developed for improving the time efficiency of nearest neighbor search. Because class labels are not available, the results depend on the distribution of the data. We chose two datasets, CIFAR-10 and MNIST, which have distinctive data distributions. GIST descriptors were extracted from CIFAR-10, and they usually consist of nonzero real numbers. On the other hand, the features extracted from MNIST are sparse vectors, with most entries being zeros. After setting the ground truth by Euclidean distance as in Section V-B, we compared LSH, SH, USPLH, KMH, SpH, PCA-ITQ and the proposed method MLSH-ITQ. There are two parameters in our method: the number of random vectors for one bit, $c$, and the number of constructed hash tables, $L$. We set them to different values ranging from 1 to 9, and computed the mAP for each parameter setting. Figure 4 shows the mAP of the different parameter settings on CIFAR-10 and MNIST. When using multiple hash tables, we returned samples within a certain Hamming distance δ in all $L$ hash tables. Therefore, the recall always goes up as $L$ increases, but the precision does not. Because mAP is computed as the area under the precision-recall curve, too large an $L$ will decrease it. A large $c$ can decrease the variance of the Euclidean distance estimator according to Section III-A, but may increase the approximation error in equation (4). Because we use the eigen-decomposition based method to solve for $l$, too large a $c$ will dilute the information in the eigenvector corresponding to the largest eigenvalue. We can see that values too large or too small for both $L$ and $c$ do not lead to good performance. Therefore, in the following experiments, we set $L = 7$ and $c = 3$ for both CIFAR-10 and MNIST. Figures 5 and 6 show the precision-recall curves for Euclidean neighbor retrieval on CIFAR-10 and MNIST, respectively. On CIFAR-10, our method with multiple hash tables (MLSH-ITQ-m) outperforms all alternative methods when the code length is 32. When the code length equals 64 or 128, the performances of MLSH-ITQ-m and PCA-ITQ are very close. Our method with a single hash table (MLSH-ITQ) outperforms all alternatives except PCA-ITQ, and its performance is very close to PCA-ITQ when the code length is greater than 64. Some of the alternative methods improve significantly when the code length increases, while others do not work well on CIFAR-10. On MNIST, several alternatives perform better, while LSH performs the worst. Although both LSH and our method are based on the p-stable distribution, our method outperforms LSH significantly because of the data-dependent component. This superiority is more obvious at short code lengths because our method takes the data distribution into consideration. To take the quantitative evaluation of the hashing techniques one step further, we used the mean precision and recall at Hamming radius δ to evaluate the different methods for hash lookup. Similar to many other hashing methods, we set the Hamming radius δ < 2, and computed the recall according to (19) and the mean precision according to (20). Figures 7 and 8 illustrate these two measurements with respect to the length of the hash codes. When the hash code length $r$ grows too large, the hash table becomes too sparse. For a given query, the buckets within Hamming radius δ may contain nothing, so the precision is counted as zero. Therefore, the performance at Hamming radius δ may degrade when the code length $r$ increases.
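The hash lookup measurement just described, i.e., mean precision within Hamming radius δ per equation (20) with empty buckets counted as zero, can be sketched as follows (an illustrative helper, NumPy assumed):

```python
import numpy as np

def mean_precision_at_radius(query_codes, db_codes, relevant, delta=2):
    """query_codes: (r, q); db_codes: (r, n); relevant: (q, n) boolean ground truth."""
    precisions = []
    for k in range(query_codes.shape[1]):
        dist = np.sum(db_codes != query_codes[:, [k]], axis=0)
        hits = dist < delta                    # bucket of Hamming radius delta (here delta < 2)
        if not hits.any():
            precisions.append(0.0)             # empty bucket counts as zero precision
        else:
            precisions.append(float(relevant[k, hits].mean()))
    return float(np.mean(precisions))
```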
It is clear that our single hash table method MLSH-ITQ outperforms the alternative single table methods. We also observe that MLSH-ITQ benefits from using multiple hash tables for hash lookup, as MLSH-ITQ-m demonstrates significant advantages over the alternative methods. Furthermore, we make an empirical comparison between MLSH-ITQ-m and LSH-m under different numbers of hash tables. Figure 9 shows the mAP for LSH-m and MLSH-ITQ-m with the fixed code length r = 48.

Fig. 5. Precision-recall curves on CIFAR-10 using Euclidean ground truth: (a) 32 bits, (b) 64 bits, (c) 128 bits.

Fig. 6. Precision-recall curves on MNIST using Euclidean ground truth: (a) 32 bits, (b) 64 bits, (c) 128 bits.

Fig. 7. Mean precision at Hamming radius δ < 2 on (a) CIFAR-10 and (b) MNIST, using Euclidean ground truth.

Fig. 8. Recall at Hamming radius δ < 2 on (a) CIFAR-10 and (b) MNIST, using Euclidean ground truth.

Fig. 9. The mAP for LSH-m and MLSH-ITQ-m with code length 48, as a function of the number of hash tables L, on (a) CIFAR-10 and (b) MNIST.

It is clear that our proposed method outperforms LSH-m. In most cases, both methods achieve better performance with more hash tables, and the mAP of MLSH-ITQ-m changes only slightly when L ≥ 5. It is also clear from the experimental results that the data-independent methods perform better with longer code lengths. The reason is that data-independent methods rely on random projections: the larger the number of projections they use, the more precisely they can recover the original distance. On the other hand, data-dependent methods capture the distribution of the data, so they usually achieve good performance with relatively short code lengths.

D. Evaluation of Supervised Hashing Methods

In some practical applications, the neighborhood of a given query is not based on a simple metric such as the Euclidean distance, but relies on semantic similarity, such as whether two samples belong to the same class. Therefore, we used the class labels of the image samples as the ground truth. We compared the proposed MLSH-SLP method with several alternative hashing methods, including LSH [6], SH [7], S3PLH [30], SSH [29], and CCA-ITQ [13]. Figure 10 shows the mean average precision under different code lengths on each dataset. The proposed MLSH-SLP method achieves the best results on all three datasets, and performs better with longer codes. S3PLH ranks second best on both CIFAR-10 and MNIST. LSH performs second best on NUS-WIDE. Among the other methods, SH and SSH yield poor mAP, though SSH sometimes performs better than SH on NUS-WIDE. The mAP of CCA-ITQ degrades when the code length increases. The reason may be that CCA-ITQ is based on Canonical Correlation Analysis, which usually performs well with low dimensional output; if the dimensionality of the output becomes higher, useless output dimensions may be introduced that tarnish the useful ones. It should be noted that some methods do not show consistent performance across datasets. On the CIFAR-10 and MNIST datasets, LSH has lower mean average precision than S3PLH, but on NUS-WIDE, LSH exceeds S3PLH. We can find that LSH

Fig. 10. The mAP on (a) CIFAR-10, (b) MNIST, and (c) NUS-WIDE, using class label ground truth.

Fig. 11. Mean precision of the top 500 Hamming neighbors on (a) CIFAR-10, (b) MNIST, and (c) NUS-WIDE, using class label ground truth.

performs better than many supervised methods on NUS-WIDE. This is because the Bag-of-Words features used on NUS-WIDE represent the content well, i.e., the Euclidean distance between the features can already give a good retrieval result. We also show in Figure 11 the mean precision of the top 500 Hamming neighbors. The code length r is in the range [32, 128]. It is clear that MLSH-SLP outperforms the alternative methods by a large margin on all datasets. S3PLH performs well on CIFAR-10, but has a low precision on NUS-WIDE. The mean precision of CCA-ITQ is lower with longer codes. In general, CCA-ITQ and SSH perform better with compact hash codes than the other methods, which do not use the supervised information. Finally, we show samples of retrieved images from the CIFAR-10 dataset in Figure 12, with false positives labeled by red rectangles. This figure gives a qualitative evaluation of the retrieval performance of the different methods.

E. Computational Cost

Table II shows the training and indexing time on CIFAR-10 for each method. All experiments were implemented in MATLAB and run on a PC with a Core-i7 3.4 GHz CPU and 16 GB memory. LSH does not have a training phase because it is a data-independent method. We find that MLSH-SLP is among the methods that take the highest training time. In the training procedure of MLSH-SLP, performing the propagation and training the SVM classifiers cost the majority of the time. Although the training phase of MLSH-SLP is time consuming, it can be accelerated with parallel computing because each hash function is trained independently. Almost all methods require only a short indexing time, except those with more complex nonlinear hash functions, which take longer to produce the binary code. When multiple hash tables are used, the training and indexing times are L times those of the single hash table version. This can also be reduced if we generate the hash functions and binary codes in parallel.

VI. CONCLUSION

In this paper, we have reviewed the properties of the p-stable distribution and shown how to incorporate it with training data in a data-dependent setting. We have presented MLSH-ITQ, which takes the distribution of the data into consideration. It combines multiple random projections to minimize the difference between the pairwise distances of the binary codes and those of the original vectors. Repeating the same procedure $r$ times, we can generate a vector in $\mathbb{R}^r$. We have also used an orthogonal transformation to minimize the thresholding error, making the binary codes accurately preserve the Euclidean distance. Compared with data-independent hashing such as LSH, this method improves the performance under compact binary codes. In practice, we can build multiple hash tables to improve the precision and recall rate, while most data-dependent hashing methods can only use a single hash table. For ANN search based on semantic similarity, we extend our method with supervised information. We have proposed a supervised hashing method (MLSH-SLP), whose training procedure is similar to label propagation. For each bit, we use the p-stable properties to

assign quasi bits to a portion of the samples in the training set, and then optimize the assignment of hash bits to the remaining samples according to the semantic similarity. We have evaluated these two hashing methods on three public image datasets. Compared with several state-of-the-art hashing approaches, the proposed methods have shown their superiority. MLSH-ITQ with multiple hash tables has achieved the best results in the unsupervised case, and MLSH-SLP has produced the best performance in the supervised setting. In the future, we will extend this idea to other problems such as clustering and dimensionality reduction.

Fig. 12. Qualitative results on CIFAR-10. We retrieved 25 Hamming neighbors of some query examples under 48-bit hash codes using each method, and show the false positives in red rectangles.

TABLE II
TRAINING AND INDEXING TIME (SECONDS) ON CIFAR-10.
(Training and indexing time under 32, 64, 128 and 256 bits for MLSH-SLP, MLSH-ITQ, S3PLH, SSH, PCA-ITQ, CCA-ITQ and the remaining baselines; numeric entries omitted.)

REFERENCES

[1] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, vol. 18, no. 9, pp. 509-517, 1975.
[2] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169-2178.
[3] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[4] R. Weber, H.-J. Schek, and S. Blott, "A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces," in Proceedings of the International Conference on Very Large Data Bases, 1998, pp. 194-205.
[5] P. Indyk and R. Motwani, "Approximate nearest neighbors: towards removing the curse of dimensionality," in Proceedings of the Annual ACM Symposium on Theory of Computing, 1998, pp. 604-613.
[6] M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proceedings of the Annual Symposium on Computational Geometry, 2004, pp. 253-262.
[7] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proceedings of the Neural Information Processing Systems Conference, 2008, pp. 1753-1760.
[8] B. Kulis and T. Darrell, "Learning to hash with binary reconstructive embeddings," in Proceedings of the Neural Information Processing Systems Conference, vol. 22, 2009, pp. 1042-1050.
[9] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, "Hashing with graphs," in Proceedings of the International Conference on Machine Learning, 2011, pp. 1-8.
[10] M. Norouzi and D. J. Fleet, "Minimal loss hashing for compact binary codes," in Proceedings of the International Conference on Machine Learning, 2011, pp. 353-360.
[11] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing," IEEE


More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Laplacian Eigenmap for Image Retrieval

Laplacian Eigenmap for Image Retrieval Laplacan Egenmap for Image Retreval Xaofe He Partha Nyog Department of Computer Scence The Unversty of Chcago, 1100 E 58 th Street, Chcago, IL 60637 ABSTRACT Dmensonalty reducton has been receved much

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Linear Cross-Modal Hashing for Efficient Multimedia Search

Linear Cross-Modal Hashing for Efficient Multimedia Search Lnear Cross-Modal Hashng for Effcent Multmeda Search Xaofeng Zhu Z Huang Heng Tao Shen Xn Zhao College of CSIT, Guangx Normal Unversty, Guangx, 544,P.R.Chna School of ITEE, The Unversty of Queensland,

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Recognizing Faces. Outline

Recognizing Faces. Outline Recognzng Faces Drk Colbry Outlne Introducton and Motvaton Defnng a feature vector Prncpal Component Analyss Lnear Dscrmnate Analyss !"" #$""% http://www.nfotech.oulu.f/annual/2004 + &'()*) '+)* 2 ! &

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Learning an Image Manifold for Retrieval

Learning an Image Manifold for Retrieval Learnng an Image Manfold for Retreval Xaofe He*, We-Yng Ma, and Hong-Jang Zhang Mcrosoft Research Asa Bejng, Chna, 100080 {wyma,hjzhang}@mcrosoft.com *Department of Computer Scence, The Unversty of Chcago

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Support Vector Machines. CS534 - Machine Learning

Support Vector Machines. CS534 - Machine Learning Support Vector Machnes CS534 - Machne Learnng Perceptron Revsted: Lnear Separators Bnar classfcaton can be veed as the task of separatng classes n feature space: b > 0 b 0 b < 0 f() sgn( b) Lnear Separators

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

The Discriminate Analysis and Dimension Reduction Methods of High Dimension

The Discriminate Analysis and Dimension Reduction Methods of High Dimension Open Journal of Socal Scences, 015, 3, 7-13 Publshed Onlne March 015 n ScRes. http://www.scrp.org/journal/jss http://dx.do.org/10.436/jss.015.3300 The Dscrmnate Analyss and Dmenson Reducton Methods of

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

An efficient method to build panoramic image mosaics

An efficient method to build panoramic image mosaics An effcent method to buld panoramc mage mosacs Pattern Recognton Letters vol. 4 003 Dae-Hyun Km Yong-In Yoon Jong-Soo Cho School of Electrcal Engneerng and Computer Scence Kyungpook Natonal Unv. Abstract

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

A Posteriori Multi-Probe Locality Sensitive Hashing

A Posteriori Multi-Probe Locality Sensitive Hashing A Posteror Mult-Probe Localty Senstve Hashng Alexs Joly INRIA Rocquencourt Le Chesnay, 78153, France alexs.joly@nra.fr Olver Busson INA, France Bry-sur-Marne, 94360 obusson@na.fr ABSTRACT Effcent hgh-dmensonal

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity ISSN(Onlne): 2320-9801 ISSN (Prnt): 2320-9798 Internatonal Journal of Innovatve Research n Computer and Communcaton Engneerng (An ISO 3297: 2007 Certfed Organzaton) Vol.2, Specal Issue 1, March 2014 Proceedngs

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information