Data-dependent Hashing Based on p-stable Distribution


Data-dependent Hashing Based on p-stable Distribution
Authors: Bai, Xiao; Yang, Haichuan; Zhou, Jun; Ren, Peng; Cheng, Jian
Published: 2014
Journal Title: IEEE Transactions on Image Processing
DOI:
Copyright Statement: © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Downloaded from Griffith Research Online

Data-dependent Hashing Based on p-stable Distribution

Xiao Bai, Haichuan Yang, Jun Zhou, Peng Ren and Jian Cheng

Abstract: The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on the p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this property, we develop a projection method which maps the original data to arbitrary dimensional vectors. Each projection vector is a linear combination of multiple random vectors subject to the p-stable distribution, in which the weights for the linear combination are learned based on the training data. An orthogonal matrix is then learned data-dependently for minimizing the thresholding error in quantization. Combining the projection method and the orthogonal matrix, we develop an unsupervised hashing scheme which preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into consideration and gives more accurate hashing results with compact hash codes. Different from many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted by the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This results in a supervised hashing scheme which preserves the semantic similarity of data. Experimental results show that our methods have outperformed several state-of-the-art hashing approaches in both effectiveness and efficiency.

I. INTRODUCTION

The volume of image data has been increasing dramatically every year. The big data era has created great challenges for many tasks such as content-based image retrieval (CBIR). One typical example is the nearest neighbor (NN) search, which finds the nearest sample for a query represented as a vectorized descriptor in $\mathbb{R}^d$. It requires a distance metric to be defined to measure the similarity between image descriptors, and the Euclidean distance is one of the most widely used metrics. In this scenario, the query time has a linear dependence on the data size, which is impractical for large scale databases. For data with relatively low dimensionality, the problem can be solved using tree based methods such as the binary search tree [1]. However, the dimensionality of most popular image descriptors, for example those constructed by Bag-of-Words [2] or GIST [3], is too large. It degrades the efficiency of these methods to that of exhaustive search [4].

X. Bai and H. Yang are with the School of Computer Science and Engineering, Beihang University, Beijing, China (e-mail: baixiao.buaa@googlemail.com). J. Zhou is with the School of Information and Communication Technology, Griffith University, Nathan, QLD, Australia. P. Ren is with the College of Information and Control Engineering, China University of Petroleum, Qingdao, China. J. Cheng is with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

Approximate nearest neighbor (ANN) techniques have been studied to break the bottleneck of NN search. The key idea is to find an approximate nearest neighbor rather than the exact one. Locality-sensitive hashing (LSH) has been introduced for this purpose [5] and has attracted much attention. Its objective is to map the original vector $v \in \mathbb{R}^d$ to a binary string $y \in \{0,1\}^r$ such that neighboring samples in the original feature space have similar binary codes in the Hamming space. However, simple feature similarity, such as that based on the Euclidean distance in the original feature space, usually cannot fully capture the semantic similarity, i.e., the real affinity between the contents of objects.
For example, in CBIR applications, if the images are represented as GIST descriptors, the Euclidean metric may produce some false positive instances for a given query. One possible solution to this problem is to introduce supervised learning based strategies into hashing, which have led to significant improvement of CBIR performance. Hashing methods which only aim at preserving feature similarity are called unsupervised hashing, and those based on supervised learning strategies are called supervised hashing. Alternatively, hashing based techniques can be classified into two categories, data-dependent hashing and data-independent hashing, depending on whether or not they employ a training set to learn the hash function. Data-independent hashing does not require training data. A typical example is the method presented in [6], which uses data-independent mathematical properties to guarantee that the probability of collision between hash codes reflects the Euclidean distance of samples. The performance of data-independent methods is robust to data variations because the hash functions are established subject to specific rules without a training process. The randomness property enables data-independent methods to generate an arbitrary number of hash functions, so one can construct multiple hash tables to boost the recall rate. However, such methods suffer from a high demand on the dimensionality of the binary representation, i.e., the length of codes $r$ has to be very large in order to reduce the false positive rate. This increases the storage costs and degrades the query efficiency. Data-dependent hashing methods, on the contrary, aim at learning hash functions from a training set. A common objective is to explicitly make the similarity measured in the original feature space be preserved in the Hamming space [7], [8], [9], [10]. Some methods, such as kernelized locality sensitive hashing (KLSH) [11], do not have an explicit objective function but still require a training set. Compared with their data-independent counterparts, data-dependent hashing methods allow compact coding, which is very desirable in practice. A typical kind

Fig. 1. The proposed method for extending p-stable distribution theory to data-dependent hashing. (Diagram: n images are randomly projected (c dimensions per bit, data-independent); MLSH-ITQ learns an orthogonal transformation and thresholds by sign(.), while MLSH-SLP assigns quasi hash bits, propagates them, and thresholds by probability; each branch is repeated r times to produce r-bit binary codes, data-dependently.)

of data-dependent method is supervised hashing, which not only considers the data distribution, but also incorporates prior information such as class labels for supervised learning. The disadvantages of data-dependent methods are that their performance may depend too heavily on the training set and that they usually support only a limited number of hash functions. We can see that both data-independent and data-dependent solutions have their pros and cons. An intuitive idea for overcoming their shortcomings is to develop an integrated strategy which combines both data-dependent and data-independent hashing, making them complementary to each other. To achieve this goal, we propose a hashing method based on the p-stable distribution. The p-stable distribution [12] is traditionally used in data-independent hashing methods [6]. It has special mathematical properties that guarantee that the distance under the $l_p$ norm can be recovered from the projections on specific random vectors. In our work, we extend the p-stable distribution to the data-dependent setting. An overview of the proposed method is illustrated in Figure 1. Firstly, we project each original feature vector through multiple random vectors, and learn a single projection vector that approximates the multiple random vectors according to the data distribution. The same procedure is repeated $r$ times, giving $r$ projection vectors. This is different from LSH, which directly uses a single random vector as each projection vector, and we thus refer to our method as multiple locality sensitive hashing (MLSH). Based on MLSH, we then apply an orthogonal transformation [13] to the obtained projection vectors for preserving the Euclidean distance with binary codes. Conveniently, we refer to this process as MLSH-ITQ (MLSH with iterative quantization). Furthermore, we use the projection result of MLSH to assign quasi hash bits to some training samples and perform a label propagation [14] like process with respect to the semantic similarity to generate hash bits for the rest. We refer to this supervised hashing method as MLSH-SLP (MLSH with supervised label propagation). In [15], we introduced the p-stable distribution theory into data-dependent hashing. That method consists of two stages. In the first stage, a Gaussian random vector is directly used to assign initial binary labels to a part of the data. In the second stage, the labels of the remaining data are induced according to the unsupervised similarity. In this paper, the proposed MLSH method follows a similar two-stage framework, but with completely different strategies in both stages. In the first stage, it uses a refined projection vector based on a deeper analysis of the p-stable property. In the second stage, MLSH incorporates different procedures for two different scenarios. For the unsupervised scenario, iterative quantization is incorporated to refine the hash functions for retrieving Euclidean neighbors. For the supervised scenario, a supervised label propagation procedure is used to learn the hash functions for retrieving semantically similar instances. The contributions of this paper are summarized as follows. Firstly, based on p-stable distribution theory, we show how to view the Euclidean distance preserving problem as estimating the variance of a p-stable distribution. This observation leads to a novel projection method which maps the samples in the original feature space to arbitrary dimensional real-valued vectors.
For each dimension, rather than directly using one single random vector, we generate the projection vector by approximating multiple random vectors for recovering the Euclidean distance within the dataset. Secondly, based on this mapping, we show how the iterative quantization method [13] can be used for minimizing the loss of thresholding. This leads to the development of the unsupervised hashing method MLSH-ITQ. Finally, we construct an objective function which is similar to [7] but characterizes semantic similarity, and compute its approximate solution by combining the proposed projection method with a coordinate descent algorithm. This results in a novel supervised hashing scheme for the purpose of preserving the semantic similarity, which to a certain extent eliminates the inconsistency between feature similarity and semantic similarity in hashing. In the rest of the paper, a review of relevant hashing methods is given in Section II. The proposed unsupervised hashing is described in Section III, followed by the introduction of a novel supervised hashing in Section IV. We present the experimental results in Section V, then draw conclusions and

discuss the future work in Section VI.

II. RELATED WORK

Compared against their data-dependent counterparts, data-independent methods are usually considered to be more adaptive to heterogeneous data distributions, but at the cost of efficiency in practice [7]. Locality sensitive hashing based on the p-stable distribution (LSH) [6] is one of the most representative methods in the data-independent hashing category. Based on the p-stable distribution, hash functions can be generated directly without any training data, and the mathematical properties of the p-stable distribution [6] guarantee that vectors close to each other in the original feature space have a high probability of generating the same output for each hash function. Each hash function is a random linear projection, and the hash functions are independent of each other because of the randomness of the projection vectors. Some other data-independent hashing schemes have been proposed besides LSH. For example, in [17], a data-independent hashing scheme has been reported which utilizes random Fourier features to make the Hamming distance related to the shift-invariant kernel (e.g., the Gaussian kernel) between the vectors. Recently, kernelized locality sensitive hashing (KLSH) [11] has been proposed. It constructs random projection vectors by using a weighted sum of data instances in a training set to approximate a Gaussian random hyperplane in a highly implicit kernel space. In many applications, the data distribution is not very complex and can be learned well from a training set. In this scenario, data-dependent approaches become very appealing. A representative data-dependent hashing scheme is spectral hashing (SH) [7]. It transforms the problem of finding a similarity preserving code for a given dataset into an NP-hard graph partitioning problem that is similar to Laplacian eigenmaps [18]. SH relaxes this problem and solves it by a spectral method [7], [18]. For a novel data point, SH uses the Laplace-Beltrami eigenfunctions to obtain binary codes under the hypothesis that the data is uniformly distributed. To address the problem when data do not meet this hypothesis, anchor graph hashing (AGH) [9] has been proposed. AGH uses an anchor graph to obtain a low-rank adjacency matrix which is computationally feasible to approximate the similarity matrix, and then processes it in constant time based on the Nyström method [9]. Zhang et al. proposed a self-taught hashing [20] method that first performs Laplacian eigenmaps and then thresholds the eigenvectors to get binary codes for the training set. After that, it trains an SVM classifier as the hash function for each bit. Recently, more extensions of the above methods have been developed. For instance, multidimensional spectral hashing [21] is guaranteed to maintain the affinities when the number of bits increases. Li et al. extended spectral hashing with a semantically consistent graph in [22], which incorporates prior information into SH in a supervised manner. Furthermore, Shen et al. [23] have developed a group of hashing techniques based on a wide variety of manifold learning approaches such as Laplacian eigenmaps. Dimensionality reduction methods have been widely applied to hashing problems. Several data-dependent hashing methods have been developed based on Principal Component Analysis (PCA) [24], including PCA-Direct [13], which directly thresholds the results after performing PCA; PCA-RR [25], which applies a random orthogonal transformation before thresholding; PCA-ITQ [13], which refines an orthogonal transformation to reduce the quantization error; and Isotropic Hashing [26], which learns an orthogonal transformation that makes the projected dimensions have equal variance. In [13], Gong et al.
also presented a supervised hashing method, CCA-ITQ, based on Canonical Correlation Analysis (CCA) and the same iterative quantization method. LDAHash [27] introduces Linear Discriminant Analysis (LDA) [28] into hashing for local descriptor matching. Binary Reconstructive Embedding (BRE) [8] and Minimal Loss Hashing (MLH) [10] optimize objective functions directly with respect to the binary code. BRE aims to reconstruct the Euclidean distance in the Hamming space, and MLH has a hinge-like loss function. Various learning settings have been explored in data-dependent hashing. Semi-supervised hashing (SSH) [29] has been introduced to search for semantically similar instances when only part of the data are labelled. It minimizes the empirical error over the labeled data, and applies an information theoretic regularizer over both labeled and unlabeled data. A projection learning hashing method [30] has been proposed in a similar form to SSH, containing a semi-supervised method and an unsupervised method. Besides SSH, weakly-supervised hashing [31] and kernel-based supervised hashing (KSH) [32] are two other supervised hashing schemes that have kernel based hash functions. Kulis et al. have extended LSH functions to a learned metric [33], which can also be considered a supervised method. Besides these methods, several other hashing methods have been proposed to address different aspects of modelling and computation, including semantic hashing [34], random maximum margin hashing [35], Manhattan hashing [36], dual-bit quantization hashing [37], spherical hashing [38] and k-means hashing [39].

III. UNSUPERVISED HASHING FOR PRESERVING EUCLIDEAN DISTANCE

In this section, we present our unsupervised hashing scheme MLSH-ITQ based on the p-stable distribution. As illustrated in Figure 1, there are two major parts within our scheme, with one being data-independent and the other being data-dependent. The core idea is to use multiple random vectors to generate one hash function.

A. Euclidean Distance Preserving as Variance Estimation

We commence by reviewing the basics of the p-stable distribution, and then describe how it can be used to preserve the original distance between data points. This process can be thought of as estimating the variance of a specific distribution. A random variable has a stable distribution if a linear combination of independent copies of the variable follows a similar distribution. For a p-stable distribution $D$, given $t$ real numbers $b_1, \ldots, b_t$ and random variables $X_1, \ldots, X_t$ which are independently and identically drawn from distribution $D$, $\sum_i b_i X_i$ follows the same distribution as $\left(\sum_i |b_i|^p\right)^{1/p} X$,

where $X$ is a random variable with distribution $D$ and $p \geq 0$ is a parameter [6]. It has been proved that stable distributions exist for $p \in (0, 2]$ [12]. In particular, when $p = 1$ and $p = 2$, the corresponding p-stable distributions are the Cauchy distribution and the Gaussian distribution, respectively. Let $w$ denote a $d$-dimensional random vector whose entries are generated independently from a standard Gaussian distribution $D_s$ (with zero mean and unit standard deviation). Let $v_i$ and $v_j$ be two data vectors with dimensionality $d$; then $w^T v_i - w^T v_j = w^T(v_i - v_j)$ follows a Gaussian distribution $D_g$ which has zero mean and variance $\|v_i - v_j\|^2$. Let $W$ denote a $d \times r$ matrix, each column of which is a random vector behaving like $w$. The entries of the vector $W^T(v_i - v_j)$ are independent of each other and follow $D_g$. This implies that for an arbitrary $W^T(v_i - v_j)$, $\frac{1}{r}\|W^T(v_i - v_j)\|^2$ is an estimator of the variance of $D_g$. We can get the expectation of the random variable $\frac{1}{r}\|W^T(v_i - v_j)\|^2$:

$$E\left[\tfrac{1}{r}\|W^T(v_i - v_j)\|^2\right] = \|v_i - v_j\|^2 \qquad (1)$$

where $\|\cdot\|$ is the $l_2$ norm. Equation (1) also shows that this is an unbiased estimate. Furthermore, using the probability density function of the Gaussian distribution, we can get the variance of this estimator:

$$\mathrm{Var}\left[\tfrac{1}{r}\|W^T(v_i - v_j)\|^2\right] = \tfrac{2}{r}\|v_i - v_j\|^4 \qquad (2)$$

We observe that a larger $r$ leads to a smaller variance and gives a more precise estimation. In LSH, $r$ corresponds to the length of the hash code. Therefore, equation (2) also explains why LSH performs better with longer hash codes.

B. Learning Projection Vectors

The LSH scheme uses one random vector to generate one hash function (hash bit). Precise characterization of $D_g$ requires a large number of random vector samples, which leads to long hash codes. However, long hash codes are less preferred in practice because they lead to low recall, sparse hash tables and decreased efficiency. An intrinsic solution to overcome this disadvantage is to change the one-to-one correspondence between random vectors and hash bits. Different from LSH, we propose multiple locality sensitive hashing (MLSH), which uses $c$ different Gaussian random vectors to generate one bit. By using $c \cdot r$ random vectors, our MLSH generates $r$ hash bits. In contrast, by using the same number of random vectors, LSH results in a longer code with $c \cdot r$ hash bits, which is less efficient. For a hashing scheme with $r$ hash bits, our method can be implemented through estimating the variance of the Gaussian distribution $D_g$ based on $c \cdot r$ random samples, which is motivated by the principles described in Section III-A. Let $Q$ be a $d \times c$ matrix, each column of which is a Gaussian random vector. If our hash function is constrained to be in a linear form, then for each hash function, our objective is to find a $d$-dimensional projection vector $u$:

$$\arg\min_u \sum_{i,j}^n \left( \|Q^T v_i - Q^T v_j\|^2 - (u^T v_i - u^T v_j)^2 \right)^2 \qquad (3)$$

By discarding the magnitude factor, we can assume that $u = Ql$ where $l$ is a $c$-dimensional unit vector, i.e., $\|l\|_2 = 1$. Then the term $\|Q^T v_i - Q^T v_j\|^2 - (l^T Q^T v_i - l^T Q^T v_j)^2$ is always non-negative, and our objective becomes:

$$\min_l \sum_{i,j}^n \left( \|Q^T v_i - Q^T v_j\|^2 - (l^T Q^T v_i - l^T Q^T v_j)^2 \right) \qquad (4)$$

Proposition 1. Finding the optimal solution of problem (4) is equivalent to the maximization problem:

$$\max_l \; l^T Q^T V V^T Q\, l \quad \text{subject to} \quad \|l\|_2 = 1 \qquad (5)$$

where $V$ is a matrix with the $i$th column being $v_i$.

Proof. The minimization problem in (4) can be transformed into the following maximization problem:

$$\arg\max_l \sum_{i,j}^n (l^T Q^T v_i - l^T Q^T v_j)^2 \qquad (6)$$

The sum of the squared pairwise differences is proportional to the variance. With $V$ the matrix whose $i$th column is $v_i$, we have:

$$\sum_{i,j}^n (l^T Q^T v_i - l^T Q^T v_j)^2 \propto \mathrm{Var}(l^T Q^T V) \qquad (7)$$

where $\mathrm{Var}(\cdot)$ is the sample variance of the elements in the vector. For zero-mean data, $\mathrm{Var}(l^T Q^T V) = \frac{1}{n} l^T Q^T V V^T Q\, l$.
Finally, we have transformed the initial objective (4) into the optimization problem (5). The optimal $l$ is obtained by the eigen-decomposition of the matrix $Q^T V V^T Q$: $l$ is the eigenvector associated with the largest eigenvalue of $Q^T V V^T Q$. According to Proposition 1, this is also the optimal solution of objective (4). Therefore, the approximate solution of $u$ for equation (3) is obtained by $u = Ql$. A $d \times r$ matrix $\hat{U}$ is then established, with its columns being the vectors resulting from equation (5) using $r$ different random matrices $Q$ separately. We take $U = \frac{1}{\sqrt{cr}}\hat{U}$, and $\|U^T(v_i - v_j)\|^2$ is an approximation of the estimator with variance $\frac{2}{cr}\|v_i - v_j\|^4$ according to equation (2).

C. Minimizing the Error of Thresholding

For the $d \times r$ matrix $U$ obtained in Section III-B, let $U_k$ denote its $k$th column. The binary code for a feature vector $v_i$ can be obtained by applying the sign function to $U_k^T v_i$. However, directly using $\mathrm{sgn}(\cdot)$ leads to a considerable loss of accuracy in the binary code. The quantization error of thresholding can be estimated as:

$$\sum_{i}^{n}\sum_{k}^{r} \left( \mathrm{sgn}(U_k^T v_i) - U_k^T v_i \right)^2 \qquad (8)$$

The desired $U$ should have a small quantization error. Note that in [6], Datar et al. quantized the real-valued output to discrete integers to maintain accuracy. Nonetheless, binary codes are more convenient for retrieval, and are therefore adopted in this paper.
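To make this construction concrete, the following is a minimal sketch (our illustration, not the authors' code) of the projection learning of Sections III-A and III-B, assuming NumPy, synthetic zero-mean data, and toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, c, r = 64, 1000, 3, 32            # toy sizes: feature dim, samples, vectors per bit, bits

V = rng.standard_normal((d, n))
V -= V.mean(axis=1, keepdims=True)       # zero-mean data, as assumed in the proof of Prop. 1

U_hat = np.empty((d, r))
for m in range(r):
    Q = rng.standard_normal((d, c))      # c Gaussian random vectors for this bit
    M = Q.T @ V @ (V.T @ Q)              # the c x c matrix Q^T V V^T Q
    eigvals, eigvecs = np.linalg.eigh(M)
    l = eigvecs[:, -1]                   # unit eigenvector of the largest eigenvalue (Eq. (5))
    U_hat[:, m] = Q @ l                  # u = Q l, the learned projection vector

U = U_hat / np.sqrt(c * r)               # scaling so ||U^T(v_i - v_j)||^2 estimates ||v_i - v_j||^2

# Spot-check the distance-preserving property of Eq. (1) on one pair:
i, j = 0, 1
est = np.sum((U.T @ (V[:, i] - V[:, j])) ** 2)
true = np.sum((V[:, i] - V[:, j]) ** 2)
print(f"estimated {est:.2f} vs true {true:.2f}")
```

Note that the eigen-decomposition involves only a $c \times c$ matrix, so learning each projection vector is cheap even for high dimensional features.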

Algorithm 1: MLSH-ITQ
Data: A $d \times n$ matrix $V$ with each column being a feature vector in the training set; the length of hash codes $r$.
Result: A $d \times r$ projection matrix $U$.
for $m = 1$ to $r$ do
    Generate a $d \times c$ matrix $Q$ with each column being a Gaussian random vector;
    Perform the eigen-decomposition of the matrix $Q^T V V^T Q$ and let $l$ equal the eigenvector associated with the largest eigenvalue;
    $u \leftarrow Ql$;
    $U_m \leftarrow u$;
$U \leftarrow \frac{1}{\sqrt{cr}}[U_1, U_2, \ldots, U_r]$;
Solve the $r \times r$ orthogonal matrix $R$ in (10) by the iterative Procrustes method in [13];
$U \leftarrow UR$.

Proposition 2. Given a projection matrix $U$ and an arbitrary orthogonal $r \times r$ matrix $R$, $U$ and $UR$ have the same power for reconstructing the Euclidean distance.

Proof. For an arbitrary pair of feature vectors $v_i$ and $v_j$, we have:

$$\|(UR)^T v_i - (UR)^T v_j\|^2 = \|(UR)^T(v_i - v_j)\|^2 = (v_i - v_j)^T (UR)(UR)^T (v_i - v_j) = (v_i - v_j)^T U U^T (v_i - v_j) = \|U^T v_i - U^T v_j\|^2 \qquad (9)$$

So the pairwise Euclidean distance of the projection results under $U$ and $UR$ is the same.

According to Proposition 2, $UR$ behaves the same as $U$. In the light of this observation, we aim to obtain an optimal solution $R$ that achieves the least quantization loss of thresholding:

$$R = \arg\min_R \left\| \mathrm{sgn}\left((UR)^T V\right) - (UR)^T V \right\|_F^2 \qquad (10)$$

where $\|\cdot\|_F$ denotes the Frobenius norm. We follow the iterative method described in [13] to solve objective function (10). In each iteration, it uses the classic Orthogonal Procrustes problem solution [40] to find an orthogonal rotation $R^{(1)}$ (an $r \times r$ orthogonal matrix) to align the vector set $U^T V$ with $\mathrm{sgn}(U^T V)$. After updating $U$ by $UR^{(1)}$, it starts a new iteration. After $t$ iterations, the final $r \times r$ orthogonal matrix $R = R^{(1)} R^{(2)} R^{(3)} \cdots R^{(t)}$ is obtained. The proposed method is summarized in Algorithm 1.

D. Constructing Multiple Hash Tables

For most data-dependent hashing methods, the limitation on the number of hash functions leads to the incapability of constructing multiple hash tables. Since the matrix $Q$ is random, our method can construct multiple hash tables in the same way as LSH. In this setting, the Hamming distance between the binary codes of $v_i$ and $v_j$ is:

$$\mathrm{dist}(v_i, v_j) = \min_{t = 1 \ldots L} d_{\mathrm{Hamming}}\left(Y_t(v_i), Y_t(v_j)\right) \qquad (11)$$

where $Y_t(v_i)$ is the binary code of $v_i$ in the $t$th hash table. In [41], [42], methods are presented to build hash tables with a data-dependent strategy. However, these methods concentrate on the hash table construction process. The idea is to train the hash functions of a data-dependent method with different data or parameters, which leads to the generation of different hash tables. Our method, on the other hand, focuses on the hashing method itself and generates multiple hash tables from random vectors subject to the p-stable distribution.

IV. SUPERVISED HASHING BY INCORPORATING SEMANTIC SIMILARITY

The method presented in Section III reconstructs the Euclidean distance in the original space and learns the hash functions in an unsupervised manner. In many situations, the Euclidean distance between feature vectors $v_i$ and $v_j$ does not reflect the real semantic similarity of objects. In this section, we present a supervised hashing method, MLSH-SLP, which exploits supervised pairwise similarity. The whole procedure of this method and its relation to MLSH is shown in Figure 1.

A. Hashing Objective for Semantic Similarity

Let $L(i)$ denote the class label of the object $v_i$ and $S$ denote a matrix whose $(i,j)$th entry $S_{ij}$ represents the supervised semantic similarity between two objects $i$ and $j$. $S_{ij}$ is defined as:

$$S_{ij} = \begin{cases} 1, & L(i) = L(j); \\ 0, & \text{otherwise}. \end{cases} \qquad (12)$$

Our goal is to learn binary codes subject to the requirement that neighboring samples in the same class are mapped to similar codes in the Hamming space. The neighborhood of object samples is measured in terms of semantic similarity.
In this scenario, our method seeks an $r$-bit Hamming embedding $Y \in \{0,1\}^{r \times n}$ for the $n$ samples in the original space, and learns $r$ hash functions $h_{1,2,\ldots,r}: \mathbb{R}^d \rightarrow \{0,1\}$. Let $y_i$ denote the $i$th column of $Y$, i.e., the binary code for object $v_i$. We have the intuitive objective:

$$\arg\min_Y \sum_{i,j} S_{ij} \|y_i - y_j\|^2 \qquad (13)$$

subject to: $y_i \in \{0,1\}^r$ and $\sum_{i=1}^n y_i = \frac{n}{2}\mathbf{1}_r$, where the rows of $Y$ should be independent of each other, and $\mathbf{1}_r$ is an $r$-dimensional vector of ones. The constraint $\sum_i y_i = \frac{n}{2}\mathbf{1}_r$ enables the data to be mapped into the hash table uniformly. Minimizing (13) associates small $S_{ij}$ with large Hamming distance $\|y_i - y_j\|^2$, and vice versa. Though slightly different, this definition is equivalent to the previous binary output $\{-1, 1\}$.
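As an illustration of this objective, the sketch below (a hypothetical helper of ours, NumPy assumed) builds the similarity matrix of equation (12) from class labels and evaluates the objective of equation (13) for a candidate code matrix:

```python
import numpy as np

def semantic_objective(labels, Y):
    """labels: (n,) class labels; Y: (r, n) codes in {0,1}."""
    S = (labels[:, None] == labels[None, :]).astype(float)      # Eq. (12)
    D = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)    # ||y_i - y_j||^2 for all pairs
    return float(np.sum(S * D))                                 # Eq. (13)

labels = np.array([0, 0, 1, 1])
Y_good = np.array([[0, 0, 1, 1],
                   [1, 1, 0, 0]])        # same-class samples share codes
Y_bad  = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0]])
print(semantic_objective(labels, Y_good))   # 0.0
print(semantic_objective(labels, Y_bad))    # 8.0: same-class pairs end up far apart
```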

Weiss et al. [7] have shown that a problem similar to Equation (13) is NP-hard. Their solution is a relaxation of the problem to an eigen-decomposition, which is based on the similarity measured in the original feature space. In contrast, we exploit semantic similarity and approximate a solution using the p-stable distribution theory and a coordinate descent method. Let $y^{(m)}$ be an $n$-dimensional row vector denoting the $m$th row of $Y$. We transform the original problem of learning $Y \in \{0,1\}^{r \times n}$ into $r$ subproblems of learning $y^{(m)} \in \{0,1\}^n$ ($m = 1, 2, \ldots, r$). Each row vector $y^{(m)}$ can then be learned separately for $m = 1, 2, \ldots, r$ through the same learning strategy. Let $y_i^{(m)}$ denote the $i$th element of $y^{(m)}$; we relax the original problem into a probabilistic form. Let $p^{(m)}$ be an $n$-dimensional vector whose $i$th component $p_i^{(m)}$ is the probability for $y_i^{(m)} = 1$, i.e., the probability of $v_i$ having binary output 1 with respect to the $m$th hash function. The expectation of $y_i^{(m)}$ is $E[y_i^{(m)}] = 1 \cdot p_i^{(m)} + 0 \cdot (1 - p_i^{(m)}) = p_i^{(m)}$. We formulate the objective function for the $m$th of the $r$ subproblems as follows:

$$\arg\min_{p^{(m)}} \sum_{i,j} S_{ij} \left( p_i^{(m)} - p_j^{(m)} \right)^2 \quad \text{subject to} \quad p_i^{(m)} \in [0,1] \qquad (14)$$

The method for obtaining the optimal $p^{(m)}$ satisfying (14) is described in the following two subsections.

B. Quasi Hash Bits

In this subsection, we present a strategy for initializing the hash probability $p_i^{(m)}$, i.e., the probability for the $m$th hash function to map the feature vector $v_i$ to 1. We commence by generating a $d$-dimensional vector $u$ using the MLSH scheme presented in Section III. Then, for samples $i = 1, \ldots, n$ in the training set, $p_i^{(m)}$ is initialized as follows:

$$p_i^{(m)} = \begin{cases} 1, & u^T v_i > \alpha_+; \\ 0, & u^T v_i < \alpha_-; \\ 0.5, & \text{otherwise}. \end{cases} \qquad (15)$$

Here $\alpha_+$ and $\alpha_-$, which represent the positive and negative threshold parameters respectively, are set empirically. The initialization strategy is developed in the light of the intuition that if the Euclidean distance between the feature vectors of two objects is very large, then it is nearly impossible for them to be semantic neighbors. Note that we have already shown that the difference of the projections on $u$ reflects the Euclidean distance between the original vectors. If $(p_i - p_j)^2 > 0$, which means $\mathrm{sgn}(u^T v_i) \neq \mathrm{sgn}(u^T v_j)$, then $|u^T v_i - u^T v_j| > \alpha_+ - \alpha_-$. Provided $\alpha_+ - \alpha_-$ is large enough, $S_{ij} = 0$ will hold with high probability. When $(p_i - p_j)^2 = 0$, $S_{ij}$ does not influence the sum. Furthermore, $u^T v_i$ has zero mean, which approximately satisfies $\sum_i y_i^{(m)} = n/2$, and the randomness of $u$ makes the $y^{(m)}$ ($m = 1, 2, \ldots, r$) independent of each other. Therefore, this partial solution satisfies the constraints in equation (13). For the time being, we set the hash bit $y_i^{(m)}$ for $v_i$ to be 1 if $p_i^{(m)} = 1$, and set it to be 0 if $p_i^{(m)} = 0$. We refer to the hash bits thus obtained as quasi hash bits. Furthermore, the remaining feature vectors, associated with hash probability 0.5, tend to be less distinctive in terms of the projection on $u$, and we do not assign quasi hash bits to them.

C. Coordinate Descent

In this subsection, we use the coordinate descent method to iteratively update the hash probabilities which are not associated with quasi hash bits. In each iteration, we minimize the objective function (14) by setting the derivative with respect to $p_i^{(m)}$ to zero. Specifically, we treat one $p_i^{(m)}$ with the initial value 0.5 as a unique variable, hold all the other hash probabilities fixed, and update $p_i^{(m)}$ as follows:

$$p_i^{(m)} = \frac{\sum_{j=1, j \neq i}^n S_{ij}\, p_j^{(m)}}{\sum_{k=1, k \neq i}^n S_{ik}} \qquad (16)$$

Suppose that some number $n_0$ of hash probabilities are not associated with quasi hash bits. The coordinate descent method behaves in such a way that one loop of iterations enumerates all the $n_0$ hash probabilities and then starts another loop of iterations. Since $S_{ij}$ is always non-negative and $(p_i^{(m)} - p_j^{(m)})^2$ is convex, problem (14) is a convex optimization problem, which means it has a global optimal solution. Furthermore, each subproblem of the coordinate descent method is also convex, so the objective value $\sum_{i,j} S_{ij} (p_i^{(m)} - p_j^{(m)})^2$ decreases after each iteration. Figure 2 shows the convergence of the optimization method for solving problem (14) on the CIFAR-10 dataset. Details on this dataset are presented in Section V.

Fig. 2. Numerical result of the convergence of the optimization process on the CIFAR-10 dataset (objective value versus the number of outer loops).

After $p^{(m)}$ converges, we get the refined hash probabilities. Then, for a sample $v_i$ which is not assigned a quasi hash bit, we generate its binary code with respect to the $m$th hash function as follows:

$$y_i^{(m)} = \begin{cases} 1, & p_i^{(m)} > 0.5; \\ 0, & \text{otherwise}. \end{cases} \qquad (17)$$

Repeating this procedure $r$ times, $r$ $n$-dimensional row vectors are generated. Finally, the $\{0,1\}^{r \times n}$ matrix $Y$ is established by concatenating the $r$ $n$-dimensional row vectors.
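A minimal sketch of one such subproblem is given below; it assumes NumPy, a projection vector u obtained as in Section III-B, and empirically chosen thresholds alpha_plus and alpha_minus, and it replaces the convergence test of Figure 2 with a fixed loop count:

```python
import numpy as np

def solve_bit(u, V, S, alpha_plus, alpha_minus, n_loops=50):
    """u: (d,) projection vector; V: (d, n) features; S: (n, n) similarity."""
    proj = u @ V                              # u^T v_i for every sample
    p = np.full(V.shape[1], 0.5)
    p[proj > alpha_plus] = 1.0                # quasi hash bits, Eq. (15)
    p[proj < alpha_minus] = 0.0
    free = np.flatnonzero(p == 0.5)           # probabilities still to be propagated
    for _ in range(n_loops):                  # coordinate descent loops, Eq. (16)
        for i in free:
            w = np.delete(S[i], i)            # S_ij for j != i
            q = np.delete(p, i)
            if w.sum() > 0:
                p[i] = (w @ q) / w.sum()
    return (p > 0.5).astype(int)              # threshold, Eq. (17)
```

Only the probabilities initialized to 0.5 are updated; the quasi hash bits stay fixed and anchor the propagation.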
Snce S j s always non-negatve, and p (m) p (m) j 2 s convex, problem (4) s a convex optmzaton problem whch means t has global optmal soluton. Furthermore, the subproblem of the coordnate descent method s also convex, p (m) j 2 decreases after so the objectve value,j S j p (m) each teraton. Fgure 2 shows the convergence process of the optmzaton method for solvng the problem (4) on the CIFAR- dataset. Detals on ths dataset are presented n Secton V. Objectve value x # Outer loops Fg. 2. Numercal result of the convergency of the optmzaton process on the CIFAR- dataset. After p (m) converges, we get the refned hash probabltes. Then for one sample v whch s not assgned a quas hash bt, we generate ts bnary code wth respect to the mth hash functon as follows: y (m) = {, p (m) >.5;, otherwse. (7) Repeat ths procedure r tmes, r n-dmensonal row vectors can be generated. Fnally, the {,} r n matrx Y can be establshed by concatenatng the r n-dmensonal row vectors.

D. Binary Codes for Queries

The scheme presented in Section IV-C only generates the binary representations for samples in the training set. In this subsection, we investigate how to generate the codes of a query. According to the definition of hashing, one hash function $h_m$ maps a sample $v$ in the original feature space to a binary value $y^{(m)} \in \{0,1\}$. In this scenario, one hash function can be considered a binary classifier. Therefore, generating the binary code for a query can be treated as a binary classification problem. We use the training dataset consisting of $v_i$ for $i = 1, \ldots, n$ and the corresponding $r$-bit Hamming embedding $Y$ obtained in Section IV-C to train $r$ binary classifiers. The $m$th binary classifier categorizes a query into the class with label 1 or 0, which is accordingly the $m$th binary code for the query. Therefore, it is reasonable to refer to the $m$th binary classifier as the $m$th hash function $h_m$.

E. A Label Propagation View of the Proposed Framework

We consider one hash function as a binary classifier and the hash bits as the labels of samples. For normal classification problems, the labels of training samples are usually obtained through human annotation. On the other hand, for a hash function, a sample is assigned a hash bit. Specifically, the criterion for this assignment is based on equation (13), which is intrinsically similar to that of label propagation. Different from general label propagation, which uses feature similarity to propagate the labels, our method uses semantic similarity to propagate the hash bits. However, our method and label propagation share the common underlying principle that one classifier should assign the same class labels to neighboring samples with high probability. Therefore, we refer to the method described in this section as multiple locality sensitive hashing with supervised label propagation (MLSH-SLP). The whole procedure is described in Algorithm 2.

V. EXPERIMENTS

A. Datasets and Experiment Setup

We evaluated the performance of the proposed methods on three popular image datasets. These datasets vary in content, image sizes, class labels, and human annotations. The CIFAR-10 dataset [43] consists of 60,000 color images in 10 classes, with 6,000 images per class. The classes include airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Figure 3 shows some sample images randomly selected from each class. The MNIST database consists of 70,000 handwritten digit images, including 60,000 examples in the training set and 10,000 examples in the test set. It is a subset extracted from a larger set available from NIST. The images are grey scale. This dataset has 10 classes corresponding to the digits 0-9, with all images being labeled. NUS-WIDE is a web image dataset created by the Lab for Media Search at the National University of Singapore, which contains 269,648 images downloaded from Flickr. The ground truth of these images is provided with multiple labels, such that each image is labeled with a vector of zeros and ones representing whether it belongs to each of the 81 defined concepts. Each image can be assigned multiple concepts.

Algorithm 2: MLSH-SLP
Data: A $d \times n$ matrix $V$ with each column being a feature vector in the training set; a similarity matrix $S$; the length of hash codes $r$.
Result: Binary codes $Y$; a set of $r$ hash functions $h_m(\cdot)$ for $m = 1, 2, \ldots, r$.
for $m = 1$ to $r$ do
    Generate a vector $u$ as described in Section III-B;
    Initialize $p^{(m)}$ according to (15);
    for $i = 1$ to $n$ do
        if $p_i^{(m)} = 1$ then assign quasi hash bit 1 to $v_i$;
        if $p_i^{(m)} = 0$ then assign quasi hash bit 0 to $v_i$;
    while $p^{(m)}$ has not converged do
        for $i = 1$ to $n$ do
            if $v_i$ does not have a quasi hash bit then
                $p_i^{(m)} \leftarrow \sum_{j \neq i} S_{ij}\, p_j^{(m)} / \sum_{k \neq i} S_{ik}$;
    Assign $y^{(m)}$ according to (17);
    $h_m \leftarrow \mathrm{Classifier}(V, y^{(m)})$;
$Y = [y^{(1)T}, y^{(2)T}, \ldots, y^{(r)T}]^T$.
We extracted different image features for each dataset due to the different properties of the corresponding images. For CIFAR-10, the images are too small to extract good scale invariant local features such as SIFT [44]. Considering that the images are all of the same size, we used a 512-dimensional GIST descriptor [3] to represent each image. In MNIST, the digit in each image is well aligned, so the grey values of each image can be treated as a 784-dimensional feature vector. Since a major portion of the pixels are clean background pixels, each feature vector has a sparse form. Images in NUS-WIDE are larger and contain a lot of detail. In the experiments, we used a 500-dimensional Bag-of-Words [2] feature vector built from SIFT descriptors for image retrieval. The proposed MLSH-SLP method can work with various classifiers. In the experiments, we chose a linear SVM as the model of the hash function in order to meet the efficiency requirements of image retrieval. The linear model is efficient in the prediction phase, which is very important for the indexing time. In the implementation, we employed LIBLINEAR [45], which has low time complexity and good classification accuracy. The main parameters are set to the default values provided by LIBLINEAR, i.e., cost C = 1 and dual maximal violation tolerance ε = 0.1.
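The per-bit classifier training of Section IV-D can be sketched as follows, using scikit-learn's LinearSVC (which wraps LIBLINEAR) as a stand-in for the paper's MATLAB setup; the function names are illustrative, and each bit is assumed to take both values on the training set:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_hash_functions(V, Y, C=1.0):
    """V: (d, n) training features; Y: (r, n) binary codes from MLSH-SLP.
    Returns one linear classifier per bit, each acting as a hash function h_m."""
    return [LinearSVC(C=C).fit(V.T, Y[m]) for m in range(Y.shape[0])]

def encode(classifiers, X):
    """X: (d, q) query features -> (r, q) binary codes, one classifier per bit."""
    return np.vstack([clf.predict(X.T) for clf in classifiers])
```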

Fig. 3. Random samples from the CIFAR-10 dataset. Each row contains images of one class.

B. Evaluation Protocols and Baseline Methods

In the experiments, we evaluated the proposed MLSH-ITQ and MLSH-SLP methods in both unsupervised and supervised settings. In the unsupervised setting, we used the Euclidean neighbors as the ground truth. Similar to [7], we used the average distance of all query samples to the 50th nearest neighbor as a threshold to determine whether a point in the dataset should be considered a true positive for a query. In the supervised setting, we used class labels as the ground truth. In the CIFAR-10 dataset and the MNIST dataset, each image has a single class label, so images in the same class are considered true neighbors of each other. On the NUS-WIDE dataset, we followed the protocol in [29], such that the ground truth is determined by whether two samples share at least one semantic label. We randomly chose query images from each dataset, and used the rest of the dataset as the target of the search. For MLSH-ITQ, we used all samples but the query images in the dataset as the training set. We randomly selected 2,000 samples from each dataset for training MLSH-SLP because of its relatively high computational complexity. We used the same training set sizes as described in their original papers for all alternative methods. We adopted the precision-recall curve to compare the overall performance of all methods. In our experiments, it was computed by:

$$\text{precision} = \frac{\text{Number of retrieved relevant pairs}}{\text{Total number of retrieved pairs}} \qquad (18)$$

$$\text{recall} = \frac{\text{Number of retrieved relevant pairs}}{\text{Total number of relevant pairs}} \qquad (19)$$

For the given queries, we increase the Hamming radius from 0 to $r$ to generate $r + 1$ pairs of precision and recall values, and the precision-recall curve is then plotted. As a complement, we also calculated the mean average precision (mAP), which is the area under the precision-recall curve. In practice, there are two major applications for the resulting binary hash codes, i.e., Hamming ranking and hash lookup. Hamming ranking compares the binary code of the query with those of all samples in the database; this has linear complexity but can be efficient thanks to the efficiency of the comparison operator for binary codes. Hamming ranking is usually used with longer code lengths. Hash lookup constructs a lookup table for the database. With the binary code of a query, it retrieves samples that fall within a bucket of Hamming radius δ. To guarantee the efficiency of retrieval, the lookup table should not be too sparse and the binary code should be compact. In our experiments, we also compute the mean precision under different hash code lengths for Hamming radius δ and for the top k returned samples of Hamming ranking:

$$\text{mean precision} = \frac{1}{\text{Number of test samples}} \sum_{\text{queries}} \frac{\text{Number of retrieved relevant samples for the query}}{\text{Total number of retrieved samples for the query}} \qquad (20)$$

If there is nothing in the buckets (i.e., no retrieved samples) for a certain Hamming radius δ and query sample, we count it as zero precision. We compared our methods with several state-of-the-art unsupervised hashing methods, including iterative quantization based on PCA (PCA-ITQ) [13], k-means hashing (KMH) [39], spherical hashing (SpH) [38], unsupervised sequential learning hashing (USPLH) [30], spectral hashing (SH) [7], and locality sensitive hashing (LSH) [6]. For supervised or semi-supervised hashing methods, we evaluated iterative quantization based on CCA (CCA-ITQ) [13], semi-supervised hashing (SSH) [29] and semi-supervised sequential projection learning hashing (S3PLH) [30]. Among these methods, LSH and our method MLSH-ITQ can directly construct multiple hash tables; we denote these variants as LSH-m and MLSH-ITQ-m, respectively.
A summary of the properties of these methods is given in Table I.

TABLE I
SUMMARY OF PROPERTIES OF HASHING METHODS UNDER COMPARISON.

Method          Hash Function   Learning Paradigm
PCA-ITQ [13]    linear          unsupervised
KMH [39]        nonlinear       unsupervised
SpH [38]        nonlinear       unsupervised
USPLH [30]      linear          unsupervised
SH [7]          nonlinear       unsupervised
LSH [6]         linear          (data-independent)
CCA-ITQ [13]    linear          supervised
SSH [29]        linear          semi-supervised
S3PLH [30]      linear          semi-supervised

From Table I, we observe that the hash functions in PCA-ITQ, CCA-ITQ, USPLH, SSH, and S3PLH have the linear form, the same as our methods. On the other hand, KMH, SpH and SH use nonlinear hash functions, but still achieve constant time complexity for computing binary codes.
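For the multi-table variants LSH-m and MLSH-ITQ-m, retrieval follows equation (11). A minimal sketch, assuming each table t stores a code matrix codes[t] generated from independently drawn random matrices:

```python
import numpy as np

def multitable_distance(query_codes, codes):
    """query_codes: list of L (r,) arrays; codes: list of L (r, n) arrays.
    Returns, for each database item, the minimum Hamming distance over tables (Eq. (11))."""
    per_table = [np.sum(codes[t] != query_codes[t][:, None], axis=0)
                 for t in range(len(codes))]
    return np.min(np.vstack(per_table), axis=0)

# Items falling within Hamming radius delta in at least one table are returned:
# hits = np.flatnonzero(multitable_distance(qc, codes) <= delta)
```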

Fig. 4. The mAP of different parameter settings (the number of hash tables L and the number of random vectors per bit c) for MLSH-ITQ-m on (a) CIFAR-10 and (b) MNIST.

C. Evaluation of Unsupervised Hashing Methods

Unsupervised hashing methods aim at finding the nearest neighbors of the query according to the Euclidean distance. They were originally developed for improving the time efficiency of nearest neighbor search. Because class labels are not available, the results depend on the distribution of the data. We chose two datasets, CIFAR-10 and MNIST, which have distinctive data distributions. GIST descriptors were extracted from CIFAR-10, and they usually consist of nonzero real numbers. On the other hand, the features extracted from MNIST are sparse vectors, with most entries being zeros. After setting the ground truth by Euclidean distance as in Section V-B, we compared LSH, SH, USPLH, KMH, SpH, PCA-ITQ and the proposed method MLSH-ITQ. There are two parameters in our method: the number of random vectors for one bit, $c$, and the number of constructed hash tables, $L$. We set them to different values ranging from 1 to 9, and computed the mAP for each parameter setting. Figure 4 shows the mAP of the different parameter settings on CIFAR-10 and MNIST. When using multiple hash tables, we returned samples within a certain Hamming distance δ in all $L$ hash tables. Therefore, the recall always goes up as $L$ increases, but the precision does not. Because mAP is computed as the area under the precision-recall curve, too large an $L$ will decrease it. A large $c$ can decrease the variance of the Euclidean distance estimator according to Section III-A, but may increase the approximation error in equation (4). Because we use the eigen-decomposition based method to solve for $l$, too large a $c$ will dilute the information in the eigenvector corresponding to the largest eigenvalue. We can see that values too large or too small for both $L$ and $c$ do not lead to good performance. Therefore, in the following experiments, we set $L = 7$ and $c = 3$ for both CIFAR-10 and MNIST. Figures 5 and 6 show the precision-recall curves for Euclidean neighbor retrieval on CIFAR-10 and MNIST, respectively. On CIFAR-10, our method with multiple hash tables (MLSH-ITQ-m) outperforms all alternative methods when the code length is 32. When the code length equals 64 or 128, the performances of MLSH-ITQ-m and PCA-ITQ are very close. Our method with a single hash table (MLSH-ITQ) outperforms all alternatives except PCA-ITQ, and its performance is very close to PCA-ITQ when the code length is greater than 64. Some of the alternative methods improve significantly when the code length increases, while others do not work well on CIFAR-10. On MNIST, several alternatives perform better, while LSH performs the worst. Although both LSH and our method are based on the p-stable distribution, our method outperforms LSH significantly because of the data-dependent component. This superiority is more obvious at short code lengths because our method takes the data distribution into consideration. To take the quantitative evaluation of the hashing techniques one step further, we used the mean precision and recall at Hamming radius δ to evaluate the different methods for hash lookup. Similar to many other hashing methods, we set the Hamming radius δ < 2, and computed the recall according to (19) and the mean precision according to (20). Figures 7 and 8 illustrate these two measurements with respect to the length of the hash codes. When the hash code length $r$ grows too large, the hash table becomes too sparse. For a given query, the buckets within Hamming radius δ may contain nothing, so the precision is counted as zero. Therefore, the performance at Hamming radius δ may degrade when the code length $r$ increases.
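The hash lookup measurement just described, i.e., mean precision within Hamming radius δ per equation (20) with empty buckets counted as zero, can be sketched as follows (an illustrative helper, NumPy assumed):

```python
import numpy as np

def mean_precision_at_radius(query_codes, db_codes, relevant, delta=2):
    """query_codes: (r, q); db_codes: (r, n); relevant: (q, n) boolean ground truth."""
    precisions = []
    for k in range(query_codes.shape[1]):
        dist = np.sum(db_codes != query_codes[:, [k]], axis=0)
        hits = dist < delta                    # bucket of Hamming radius delta (here delta < 2)
        if not hits.any():
            precisions.append(0.0)             # empty bucket counts as zero precision
        else:
            precisions.append(float(relevant[k, hits].mean()))
    return float(np.mean(precisions))
```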
It is clear that our single hash table method MLSH-ITQ outperforms the alternative single table methods. We also observe that MLSH-ITQ benefits from using multiple hash tables for hash lookup, as MLSH-ITQ-m demonstrates significant advantages over the alternative methods. Furthermore, we make an empirical comparison between MLSH-ITQ-m and LSH-m under different numbers of hash tables. Figure 9 shows the mAP for LSH-m and MLSH-ITQ-m with the fixed code length r = 48.

Fig. 5. Precision-recall curves on CIFAR-10 using Euclidean ground truth: (a) 32 bits, (b) 64 bits, (c) 128 bits.

Fig. 6. Precision-recall curves on MNIST using Euclidean ground truth: (a) 32 bits, (b) 64 bits, (c) 128 bits.

Fig. 7. Mean precision at Hamming radius δ < 2 on (a) CIFAR-10 and (b) MNIST, using Euclidean ground truth.

Fig. 8. Recall at Hamming radius δ < 2 on (a) CIFAR-10 and (b) MNIST, using Euclidean ground truth.

Fig. 9. The mAP for LSH-m and MLSH-ITQ-m with code length 48, as a function of the number of hash tables L, on (a) CIFAR-10 and (b) MNIST.

It is clear that our proposed method outperforms LSH-m. In most cases, both methods achieve better performance with more hash tables, and the mAP of MLSH-ITQ-m changes only slightly when L ≥ 5. It is also clear from the experimental results that the data-independent methods perform better with longer code lengths. The reason is that data-independent methods rely on random projections: the larger the number of projections they use, the more precisely they can recover the original distance. On the other hand, data-dependent methods capture the distribution of the data, so they usually achieve good performance with relatively short code lengths.

D. Evaluation of Supervised Hashing Methods

In some practical applications, the neighborhood of a given query is not based on a simple metric such as the Euclidean distance, but relies on semantic similarity, such as whether two samples belong to the same class. Therefore, we used the class labels of the image samples as the ground truth. We compared the proposed MLSH-SLP method with several alternative hashing methods, including LSH [6], SH [7], S3PLH [30], SSH [29], and CCA-ITQ [13]. Figure 10 shows the mean average precision under different code lengths on each dataset. The proposed MLSH-SLP method achieves the best results on all three datasets, and performs better with longer codes. S3PLH ranks second best on both CIFAR-10 and MNIST. LSH performs second best on NUS-WIDE. Among the other methods, SH and SSH yield poor mAP, though SSH sometimes performs better than SH on NUS-WIDE. The mAP of CCA-ITQ degrades when the code length increases. The reason may be that CCA-ITQ is based on Canonical Correlation Analysis, which usually performs well with low dimensional output; if the dimensionality of the output becomes higher, useless output dimensions may be introduced that tarnish the useful ones. It should be noted that some methods do not show consistent performance across datasets. On the CIFAR-10 and MNIST datasets, LSH has lower mean average precision than S3PLH, but on NUS-WIDE, LSH exceeds S3PLH. We can find that LSH

Fig. 10. The mAP on (a) CIFAR-10, (b) MNIST, and (c) NUS-WIDE, using class label ground truth.

Fig. 11. Mean precision of the top 500 Hamming neighbors on (a) CIFAR-10, (b) MNIST, and (c) NUS-WIDE, using class label ground truth.

performs better than many supervised methods on NUS-WIDE. This is because the Bag-of-Words features used on NUS-WIDE represent the content well, i.e., the Euclidean distance between the features can already give a good retrieval result. We also show in Figure 11 the mean precision of the top 500 Hamming neighbors. The code length r is in the range [32, 128]. It is clear that MLSH-SLP outperforms the alternative methods by a large margin on all datasets. S3PLH performs well on CIFAR-10, but has a low precision on NUS-WIDE. The mean precision of CCA-ITQ is lower with longer codes. In general, CCA-ITQ and SSH perform better with compact hash codes than the other methods, which do not use the supervised information. Finally, we show samples of retrieved images from the CIFAR-10 dataset in Figure 12, with false positives labeled by red rectangles. This figure gives a qualitative evaluation of the retrieval performance of the different methods.

E. Computational Cost

Table II shows the training and indexing time on CIFAR-10 for each method. All experiments were implemented in MATLAB and run on a PC with a Core-i7 3.4 GHz CPU and 16 GB memory. LSH does not have a training phase because it is a data-independent method. We find that MLSH-SLP is among the methods that take the highest training time. In the training procedure of MLSH-SLP, performing the propagation and training the SVM classifiers cost the majority of the time. Although the training phase of MLSH-SLP is time consuming, it can be accelerated with parallel computing because each hash function is trained independently. Almost all methods require only a short indexing time, except those with more complex nonlinear hash functions, which take longer to produce the binary code. When multiple hash tables are used, the training and indexing times are L times those of the single hash table version. This can also be reduced if we generate the hash functions and binary codes in parallel.

VI. CONCLUSION

In this paper, we have reviewed the properties of the p-stable distribution and shown how to incorporate it with training data in a data-dependent setting. We have presented MLSH-ITQ, which takes the distribution of the data into consideration. It combines multiple random projections to minimize the difference between the pairwise distances of the binary codes and those of the original vectors. Repeating the same procedure $r$ times, we can generate a vector in $\mathbb{R}^r$. We have also used an orthogonal transformation to minimize the thresholding error, making the binary codes accurately preserve the Euclidean distance. Compared with data-independent hashing such as LSH, this method improves the performance under compact binary codes. In practice, we can build multiple hash tables to improve the precision and recall rate, while most data-dependent hashing methods can only use a single hash table. For ANN search based on semantic similarity, we extend our method with supervised information. We have proposed a supervised hashing method (MLSH-SLP), whose training procedure is similar to label propagation. For each bit, we use the p-stable properties to

assign quasi bits to a portion of the samples in the training set, and then optimize the assignment of hash bits to the remaining samples according to the semantic similarity. We have evaluated these two hashing methods on three public image datasets. Compared with several state-of-the-art hashing approaches, the proposed methods have shown their superiority. MLSH-ITQ with multiple hash tables has achieved the best results in the unsupervised case, and MLSH-SLP has produced the best performance in the supervised setting. In the future, we will extend this idea to other problems such as clustering and dimensionality reduction.

Fig. 12. Qualitative results on CIFAR-10. We retrieved 25 Hamming neighbors of some query examples under 48-bit hash codes using each method, and show the false positives in red rectangles.

TABLE II
TRAINING AND INDEXING TIME (SECONDS) ON CIFAR-10.
(Training and indexing time under 32, 64, 128 and 256 bits for MLSH-SLP, MLSH-ITQ, S3PLH, SSH, PCA-ITQ, CCA-ITQ and the remaining baselines; numeric entries omitted.)

REFERENCES

[1] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, vol. 18, no. 9, pp. 509-517, 1975.
[2] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169-2178.
[3] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[4] R. Weber, H.-J. Schek, and S. Blott, "A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces," in Proceedings of the International Conference on Very Large Data Bases, 1998, pp. 194-205.
[5] P. Indyk and R. Motwani, "Approximate nearest neighbors: towards removing the curse of dimensionality," in Proceedings of the Annual ACM Symposium on Theory of Computing, 1998, pp. 604-613.
[6] M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proceedings of the Annual Symposium on Computational Geometry, 2004, pp. 253-262.
[7] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proceedings of the Neural Information Processing Systems Conference, 2008, pp. 1753-1760.
[8] B. Kulis and T. Darrell, "Learning to hash with binary reconstructive embeddings," in Proceedings of the Neural Information Processing Systems Conference, vol. 22, 2009, pp. 1042-1050.
[9] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, "Hashing with graphs," in Proceedings of the International Conference on Machine Learning, 2011, pp. 1-8.
[10] M. Norouzi and D. J. Fleet, "Minimal loss hashing for compact binary codes," in Proceedings of the International Conference on Machine Learning, 2011, pp. 353-360.
[11] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing," IEEE


More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Laplacian Eigenmap for Image Retrieval

Laplacian Eigenmap for Image Retrieval Laplacan Egenmap for Image Retreval Xaofe He Partha Nyog Department of Computer Scence The Unversty of Chcago, 1100 E 58 th Street, Chcago, IL 60637 ABSTRACT Dmensonalty reducton has been receved much

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Linear Cross-Modal Hashing for Efficient Multimedia Search

Linear Cross-Modal Hashing for Efficient Multimedia Search Lnear Cross-Modal Hashng for Effcent Multmeda Search Xaofeng Zhu Z Huang Heng Tao Shen Xn Zhao College of CSIT, Guangx Normal Unversty, Guangx, 544,P.R.Chna School of ITEE, The Unversty of Queensland,

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Recognizing Faces. Outline

Recognizing Faces. Outline Recognzng Faces Drk Colbry Outlne Introducton and Motvaton Defnng a feature vector Prncpal Component Analyss Lnear Dscrmnate Analyss !"" #$""% http://www.nfotech.oulu.f/annual/2004 + &'()*) '+)* 2 ! &

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Learning an Image Manifold for Retrieval

Learning an Image Manifold for Retrieval Learnng an Image Manfold for Retreval Xaofe He*, We-Yng Ma, and Hong-Jang Zhang Mcrosoft Research Asa Bejng, Chna, 100080 {wyma,hjzhang}@mcrosoft.com *Department of Computer Scence, The Unversty of Chcago

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Support Vector Machines. CS534 - Machine Learning

Support Vector Machines. CS534 - Machine Learning Support Vector Machnes CS534 - Machne Learnng Perceptron Revsted: Lnear Separators Bnar classfcaton can be veed as the task of separatng classes n feature space: b > 0 b 0 b < 0 f() sgn( b) Lnear Separators

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

The Discriminate Analysis and Dimension Reduction Methods of High Dimension

The Discriminate Analysis and Dimension Reduction Methods of High Dimension Open Journal of Socal Scences, 015, 3, 7-13 Publshed Onlne March 015 n ScRes. http://www.scrp.org/journal/jss http://dx.do.org/10.436/jss.015.3300 The Dscrmnate Analyss and Dmenson Reducton Methods of

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Gender Classification using Interlaced Derivative Patterns

Gender Classification using Interlaced Derivative Patterns Gender Classfcaton usng Interlaced Dervatve Patterns Author Shobernejad, Ameneh, Gao, Yongsheng Publshed 2 Conference Ttle Proceedngs of the 2th Internatonal Conference on Pattern Recognton (ICPR 2) DOI

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

An efficient method to build panoramic image mosaics

An efficient method to build panoramic image mosaics An effcent method to buld panoramc mage mosacs Pattern Recognton Letters vol. 4 003 Dae-Hyun Km Yong-In Yoon Jong-Soo Cho School of Electrcal Engneerng and Computer Scence Kyungpook Natonal Unv. Abstract

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

A Posteriori Multi-Probe Locality Sensitive Hashing

A Posteriori Multi-Probe Locality Sensitive Hashing A Posteror Mult-Probe Localty Senstve Hashng Alexs Joly INRIA Rocquencourt Le Chesnay, 78153, France alexs.joly@nra.fr Olver Busson INA, France Bry-sur-Marne, 94360 obusson@na.fr ABSTRACT Effcent hgh-dmensonal

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity ISSN(Onlne): 2320-9801 ISSN (Prnt): 2320-9798 Internatonal Journal of Innovatve Research n Computer and Communcaton Engneerng (An ISO 3297: 2007 Certfed Organzaton) Vol.2, Specal Issue 1, March 2014 Proceedngs

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information