
Knowledge-Based Systems 54 (2013) 137-146

Multi-view classification with cross-view must-link and cannot-link side information

Qiang Qian (a), Songcan Chen (a,*), Xudong Zhou (b)

(a) Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, P.R. China
(b) Information Engineering College, Yangzhou University, Yangzhou 225009, P.R. China

Article history: Received 22 October 2012. Received in revised form 1 September 2013. Accepted 5 September 2013. Available online 9 October 2013.

Keywords: Classification; Multi-view learning; Without correspondence; Unpaired multi-view data; Cross-view side information

Abstract: Side information, like must-link (ML) and cannot-link (CL), has been widely used in single-view classification tasks. However, so far such information has never been applied in multi-view classification tasks. In many real-world situations, data with multiple representations or views are frequently encountered, and most algorithms proposed for such learning situations require that all the multi-view data be paired. Yet this requirement is difficult to satisfy in some settings, and the multi-view data could be totally unpaired. In this paper, we propose a learning framework to design multi-view classifiers by employing only the weak side information of cross-view must-links (CvML) and cross-view cannot-links (CvCL). The CvML and the CvCL generalize the traditional single-view must-link (SvML) and single-view cannot-link (SvCL) and, to the best of our knowledge, are here first explicitly introduced and applied to multi-view classification. Finally, we demonstrate the effectiveness of our method in our experiments. © 2013 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail addresses: qian.qiang.yx@gmail.com (Q. Qian), s.chen@nuaa.edu.cn (S. Chen), xdzhou@nuaa.edu.cn (X. Zhou).

1. Introduction

Traditional learning involves only data with a single view. However, in many real-world circumstances, data with multiple natural feature representations or views are frequently encountered. For example, web pages can be represented both by their own content and by the hyperlinks pointing to them. To handle this data type, multi-view learning has been developed since the pioneering works [7,31]. So far many approaches [7,8,24,20] have been proposed and have achieved empirical and theoretical successes. All of those approaches rely on two common assumptions, compatibility and independence between views [7]. However, for the two assumptions to hold, the multi-view data should be paired: for the representation of a sample in one view, there is a corresponding paired representation in the other view. This requirement is over-rigorous in some circumstances. For instance, in a wireless sensor network, collected data could be lost or polluted during transmission due to device malfunction or malicious attacks, so only part of the data are paired while the rest are unpaired [15]. In [19,15,18,25], methods have been proposed to deal with this scenario. More extremely, in some circumstances all the data are unpaired; for example, web pages from English Reuters and French Reuters are unpaired, and we cannot easily know which English web page corresponds to which French one. This paper focuses on the most difficult, totally-unpaired extreme. Since no pairing information between the views exists, we introduce a new type of side information, called cross-view must-link and cross-view cannot-link, to help learning.
Must-link and cannot-link side information is usually used in classification [30,34] and clustering [33,28] on single-view data (called SvML and SvCL in this paper). Two samples in the SvML set share the same label, while two samples in the SvCL set possess different labels. Compared with commonly-used supervised labels, such SvMLs and SvCLs are weaker in characterizing supervision information: we can infer the SvML and SvCL relations between samples from their labels, but not the reverse. The SvML and the SvCL provide label relations only between samples within a view, and thus cannot help totally-unpaired multi-view learning. To achieve this goal, cross-view relations are needed. Consequently, in this paper we introduce the cross-view must-link (CvML) and the cross-view cannot-link (CvCL). Two representations from different views in the CvML set have the same label, while two in the CvCL set have different labels. Unlike the SvML and SvCL, the CvML and CvCL build implicit label relations across different views. As a result, we can mutually transfer the learning information between different views through these CvMLs and CvCLs. Compared with explicit label information, the CvML and CvCL are likewise weaker in supervision, like the SvML and SvCL, for the same reason.

Moreover, paired information belongs to the CvML, because paired representations must own the same label; but a CvML does not imply pairing, because two representations linked by a CvML could come from different samples. Intuitively, by forcing the outputs of the target classification functions in each view to obey the CvML and CvCL constraints, the outputs learned in one view can be transferred to the other view and aid the classification learning there. To the best of our knowledge, such side information has never been explicitly introduced and applied in multi-view classifier design. The proposed framework is based on the classical regularization frameworks [27,2] plus new regularization terms which encode the CvML and CvCL side information, i.e., forcing the outputs of the representations in the CvMLs to be the same and the outputs of the representations in the CvCLs to be different. Since the true (strongly-supervised) labels are unknown, the classical regularization framework has to be modified by introducing probabilistic indicators which indicate how likely a sample belongs to a given class. The modified framework leads to a block-wise convex optimization problem which can be solved iteratively and effectively by the classic block coordinate descent method, with guaranteed convergence to a stationary point [5]. Our experiments demonstrate its effectiveness as well.

We summarize our contributions as follows:
- We introduce and develop the concepts of the CvML and the CvCL, which extend the SvML and the SvCL, to aid joint classification learning in different views.
- We develop a classification learning framework which utilizes this cross-view side information to learn classifiers in the tough unpaired multi-view setting.

The rest of the paper is organized as follows. In Section 2, we review related work. We then introduce our framework in Section 3 and report our experimental results in Section 4. Finally, we conclude the paper and present future work in Section 5.

2. Related work

Our work is related to both classification learning with SvML and SvCL side information and multi-view classification, so we review the two parts respectively. Since ML and CL side information has never been used in multi-view classification, we mainly review related work in the single-view setting.

2.1. SvMLs and SvCLs for classification

The ML and CL side information in a single view has demonstrated its value in classification tasks. Yan et al. [30] formulated both MLs and CLs into a convex pairwise loss function and integrated it into the traditional margin-based learning framework, so the proposed framework can handle labels and MLs/CLs together. Nguyen and Caruana [22] incorporated both MLs and CLs into the margin-based learning framework and proposed the PCSVM algorithm. Zhang and Yan [34] first transformed the ML and CL pairs of samples into a new space and learned an estimator there, then transformed the estimator back into the original sample space. They proved that the final estimator is sign-insensitively consistent with the optimal decision boundary and derived its asymptotic variance. Rather than directly incorporating the MLs and CLs into classification models, metric learning follows another line: it first learns a Mahalanobis metric which obeys the ML and CL constraints, then uses distance-based classifiers like the k-nearest neighbor to classify the test data. Typical works include [10,29,13,23].
The ML and CL side information is also used to learn proper kernel matrices for later kernel-machine algorithms. Li et al. [21] forced the entries of the kernel matrix corresponding to MLs and CLs to be 1 and 0 respectively and developed the kernel learning algorithm PCP. PCP is computationally intensive because it is solved by semidefinite programming (SDP). Hu et al. [17] proposed a kernel propagation method to avoid solving an SDP on the full kernel matrix; the main idea is to first learn a small kernel matrix and then propagate it to the full kernel matrix.

The SvMLs and the SvCLs are also applied in other tasks like clustering and image segmentation. For the reader's reference we name a few works in typical application domains: image segmentation [33], video surveillance [16], and clustering [28,3,32]. Despite the many works on SvMLs and SvCLs, the CvMLs and the CvCLs have, to the best of our knowledge, remained almost untouched.

2.2. Multi-view learning

Multi-view learning is a very natural learning setting. It was first touched on in Yarowsky's [31] and Blum et al.'s [7] works. Blum et al. proposed the renowned co-training algorithm, which alternately trains the predictor in one view and uses the predicted labels to aid the training in the other view. Dasgupta et al. [9] proved a PAC-style generalization bound for co-training. Sindhwani et al. [24] introduced the co-regularization algorithm, which directly models the cross-view agreement and incorporates it into a regularization framework. They introduced a family of algorithms with different regularization frameworks (the classical regularization framework and the manifold regularization framework). The formulation is a convex optimization problem rather than the style of alternate learning on each view as in co-training, and it is related to our framework to some extent; we compare with it in Section 3.4.

Since fully paired multi-view data are over-rigorous to expect in some applications, some methods have been proposed for the partially paired circumstance. Kimura et al. [18] considered the situation where additional unpaired data are provided and developed the Semi Canonical Correlation Analysis (SemiCCA) algorithm, which uses both the paired and unpaired data to regularize CCA through a PCA-type penalty. Lampert and Krömer [19] proposed a modified Maximum Covariance Analysis algorithm for weakly-paired multimodal data: they guess the pairing between data views and optimize it along with the dimension-reduction parameter matrix. Blaschko et al. [6] modified Kernel Canonical Correlation Analysis with Laplacian regularization by using unpaired data and proposed the SemiLRKCCA algorithm; however, they only embedded the local structure in the constraints but not in the objective, and had too many model parameters. The PPLCA algorithm proposed by Gu et al. [15] overcomes the shortcomings of SemiLRKCCA: it simultaneously embeds the local structure into both the objective and the constraints and has fewer model parameters. Sun et al. [25] developed discriminative canonical correlation analysis for partially paired situations, proposing DCCAM by estimating the within-class and between-class correlations on both the paired and unpaired data.

3. Multi-view learning under cross-view MLs and CLs

In this section, we introduce our framework and, at the end, compare it with the co-regularization framework [24]. Our training process is formally conducted in two steps. In the first step, we employ the available CvMLs and CvCLs to design a (sign) classifier which is used to decide whether any given two samples are from the same class or not. This step does not involve any label, so the classifier does not decide the real class of a sample but just returns a sign label of either +1 or -1 to indicate whether the sample shares the same real class with some training sample. To determine the real label, we need the second step, in which we use a few provided labeled samples to determine which real label each sign label corresponds to. Fig. 1 shows the two steps.

Fig. 1. Formal two-step training process.

3.1. Learning classifiers determining sign labels

Our framework is based on the classical regularization framework [27,2] for supervised learning, which solves the following optimization problem:

$$\min_{f\in\mathcal{H}_k}\ \frac{1}{N}\sum_{i=1}^{N} V(x_i, c_i, f) + \lambda\|f\|^2_{\mathcal{H}_k} \qquad (1)$$

where $\mathcal{H}_k$ is a Reproducing Kernel Hilbert Space (RKHS) induced by a kernel $k$, $c_i$ is the label of sample $x_i$, and $V$ is a loss function, such as the squared loss or the hinge loss. In our setting, only the CvML and CvCL side information is at hand and the sample labels are unavailable, so we are not certain which classes the samples belong to. The resulting uncertainty is handled by introducing probabilistic indicators, just as in the fuzzy c-means clustering algorithm [11], and Eq. (1) is modified as follows:

$$\min_{f\in\mathcal{H}_k,\,u}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{r\in\{+,-\}} u_{ir}^2\, V(x_i, c_r, f) + \lambda\|f\|^2_{\mathcal{H}_k} \qquad (2)$$

where $u_i$ for each $x_i$ is the probabilistic indicator vector, subject to non-negativity and unit-summation constraints; $+$ and $-$ denote the positive and negative labels, and we let $c_+ = 1$ and $c_- = -1$. To avoid hard assignment of labels, the exponents of $u_{ir}$ are set to 2 rather than 1 [11]. Eq. (2) looks like the fuzzy c-means (FCM) formulation, but there is an essential difference: in FCM the clustering is performed in the sample space, so the cluster centers are a set of vector prototypes to be optimized, with the same dimensionality as the sample space; our algorithm operates in the (label) output space, whose dimensionality differs from the sample space, and the centers are fixed to 1 and -1.

The CvMLs and the CvCLs are used to regularize the learning in both views. The underlying principle is to force the classifier outputs in the separate views to obey the CvML and CvCL constraints. For the CvMLs, we simply use the squared difference of the corresponding outputs, which gives the convex formulation

$$\sum_{(i,j)\in\mathcal{M}} (f_x(x_i) - f_y(y_j))^2 \qquad (3)$$

where $\mathcal{M}$ denotes the CvML set; large output differences incur large penalties. The CvCLs are not so easy to formulate [14,26]. Here we adopt Goldberg et al.'s [14] method and formulate the CvCLs into a convex penalty:

$$\sum_{(i,j)\in\mathcal{C}} (f_x(x_i) + f_y(y_j))^2 \qquad (4)$$

where $\mathcal{C}$ denotes the CvCL set. Note that the minus in Eq. (3) is replaced by a plus in Eq. (4). The penalty is zero if $f_x(x_i)$ and $f_y(y_j)$ have the same absolute value but opposite signs, so minimizing the penalty implies different output labels. The trivial case $f_x(x_i) = f_y(y_j) = 0$ is avoided because it would raise the classification error. This idea could also be applied in multi-view clustering and dimension-reduction tasks if we can properly penalize outputs which violate the CvML and CvCL constraints. For example, if the outputs of a clustering algorithm are multinomial random variables, we can penalize large (small) Kullback-Leibler divergences of the representations in $\mathcal{M}$ ($\mathcal{C}$).
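To make the roles of Eqs. (3) and (4) concrete, here is a minimal Python sketch (ours, not part of the original paper; the function names are hypothetical) that evaluates both penalties for given classifier outputs on the two views:

```python
import numpy as np

def cvml_penalty(fx_out, fy_out, M):
    # Eq. (3): squared difference of outputs linked by a cross-view must-link;
    # small when the two views predict the same label.
    return sum((fx_out[i] - fy_out[j]) ** 2 for i, j in M)

def cvcl_penalty(fx_out, fy_out, C):
    # Eq. (4): squared *sum* of outputs linked by a cross-view cannot-link;
    # it vanishes when the outputs have equal magnitude but opposite signs.
    return sum((fx_out[i] + fy_out[j]) ** 2 for i, j in C)

fx_out = np.array([0.9, -0.8])        # f_x evaluated on two view-1 points
fy_out = np.array([1.0, 0.7, -1.0])   # f_y evaluated on three view-2 points
print(cvml_penalty(fx_out, fy_out, [(0, 0)]))  # small: outputs agree
print(cvcl_penalty(fx_out, fy_out, [(0, 2)]))  # small: outputs are opposite
```

Minimizing the cannot-link penalty drives linked outputs toward opposite signs without fixing which side is positive, while the all-zero solution is discouraged by the classification loss.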
Integrating these terms together yields the following optimization problem:

$$\begin{aligned} \min_{f_x, f_y, u^x, u^y} J = \ & \frac{1}{2}\sum_{i=1}^{N_x}\sum_{r\in\{+,-\}} (u^x_{ir})^2\, V(x_i, c_r, f_x) + \frac{\lambda_1}{2}\|f_x\|^2_{\mathcal{H}_{k_x}} \\ + \ & \frac{1}{2}\sum_{j=1}^{N_y}\sum_{r\in\{+,-\}} (u^y_{jr})^2\, V(y_j, c_r, f_y) + \frac{\lambda_2}{2}\|f_y\|^2_{\mathcal{H}_{k_y}} \\ + \ & \frac{\lambda_3 (N_x+N_y)}{2|\mathcal{M}|} \sum_{(i,j)\in\mathcal{M}} (f_x(x_i) - f_y(y_j))^2 + \frac{\lambda_3 (N_x+N_y)}{2|\mathcal{C}|} \sum_{(i,j)\in\mathcal{C}} (f_x(x_i) + f_y(y_j))^2 \\ \text{s.t.}\ & u^x_{ir} \ge 0,\ u^y_{jr} \ge 0 \ \text{for}\ r \in \{+,-\}; \quad u^x_{i+} + u^x_{i-} = 1,\ u^y_{j+} + u^y_{j-} = 1 \end{aligned} \qquad (5)$$

where $x_1,\ldots,x_{N_x}$ and $y_1,\ldots,y_{N_y}$ are the training representations in the two views and $N_x$, $N_y$ are their numbers; $|\mathcal{M}|$ and $|\mathcal{C}|$ are the cardinalities of $\mathcal{M}$ and $\mathcal{C}$. We divide the CvML and CvCL penalties by $|\mathcal{M}|$ and $|\mathcal{C}|$ to balance the different numbers of CvMLs and CvCLs. In this formulation, the first two lines are the probabilistic classification regularization framework in each view, and the last line contains the CvML and CvCL penalties. Note that besides the CvMLs and CvCLs, no label information is used here, so the learned classifiers can only give out +1 and -1 sign labels. Later, a few labeled samples are used to determine which real label each sign label corresponds to.

3.2. Optimization

Without loss of generality, we use the square loss and propose the Regularized Least Squares under Cross-View MLs and CLs (RLSCVMC) algorithm. It is easy to see that the representer theorem holds (see the appendix for the proof), so the minimizers $f_x^*$, $f_y^*$ have the forms

$$f_x^*(x) = \sum_{i=1}^{N_x} \alpha_i k_x(x, x_i), \qquad f_y^*(y) = \sum_{j=1}^{N_y} \beta_j k_y(y, y_j) \qquad (6)$$

By substituting them for $f_x$, $f_y$ in Eq. (5), we get the following optimization problem:

$$\begin{aligned} \min_{\alpha, \beta, u^x, u^y} J = \ & \frac{1}{2}\sum_{i=1}^{N_x}\sum_{r\in\{+,-\}} (u^x_{ir})^2 (\alpha^T \mathbf{k}_{x_i} - c_r)^2 + \frac{\lambda_1}{2}\alpha^T K_x \alpha \\ + \ & \frac{1}{2}\sum_{j=1}^{N_y}\sum_{r\in\{+,-\}} (u^y_{jr})^2 (\beta^T \mathbf{k}_{y_j} - c_r)^2 + \frac{\lambda_2}{2}\beta^T K_y \beta \\ + \ & \frac{\lambda_3 (N_x+N_y)}{2|\mathcal{M}|} \sum_{(i,j)\in\mathcal{M}} (\alpha^T \mathbf{k}_{x_i} - \beta^T \mathbf{k}_{y_j})^2 + \frac{\lambda_3 (N_x+N_y)}{2|\mathcal{C}|} \sum_{(i,j)\in\mathcal{C}} (\alpha^T \mathbf{k}_{x_i} + \beta^T \mathbf{k}_{y_j})^2 \\ \text{s.t.}\ & u^x_{ir} \ge 0,\ u^y_{jr} \ge 0 \ \text{for}\ r \in \{+,-\}; \quad u^x_{i+} + u^x_{i-} = 1,\ u^y_{j+} + u^y_{j-} = 1 \end{aligned} \qquad (7)$$

where $K_x$, $K_y$ are the kernel matrices with entries $k_x(x_i, x_j)$ and $k_y(y_i, y_j)$ respectively. For notational convenience we let $\mathbf{k}_{x_i} = [k_x(x_i, x_1), \ldots, k_x(x_i, x_{N_x})]^T$ and $\mathbf{k}_{y_j} = [k_y(y_j, y_1), \ldots, k_y(y_j, y_{N_y})]^T$.

Note that the objective is convex with respect to each variable block, though nonconvex per se. The classic block coordinate descent method can be used to solve this problem [5]. The basic idea is to optimize one variable block while keeping the other blocks fixed, and to repeat this step until some stopping criterion is met. Because the objective value decreases monotonically after each step, the procedure is guaranteed to converge to a stationary point [5]. We optimize $(u^x, u^y)$, $\alpha$ and $\beta$ iteratively while keeping the remaining blocks fixed.

When $\alpha$ and $\beta$ are fixed, optimizing $u^x$ and $u^y$ decouples into two independent but analogous problems similar to fuzzy c-means. Eq. (8) gives the current optimal solution for $u^x$; $u^y$ has an analogous formula.

$$u^x_{ir} = \frac{1/(\alpha^T \mathbf{k}_{x_i} - c_r)^2}{\sum_{r'\in\{+,-\}} 1/(\alpha^T \mathbf{k}_{x_i} - c_{r'})^2} \qquad (8)$$

When $u^x$, $u^y$ and $\beta$ are fixed, the objective is convex and quadratic with respect to $\alpha$, and the current optimal solution is obtained by setting its derivative to zero. Eqs. (9) and (10) give $\alpha$'s quadratic and linear coefficients; solving the linear system $H_\alpha \alpha = g_\alpha$ yields the current optimal $\alpha$.

$$H_\alpha = \sum_{i=1}^{N_x}\sum_{r\in\{+,-\}} (u^x_{ir})^2\, \mathbf{k}_{x_i}\mathbf{k}_{x_i}^T + \lambda_1 K_x + \frac{\lambda_3(N_x+N_y)}{|\mathcal{M}|}\sum_{(i,j)\in\mathcal{M}} \mathbf{k}_{x_i}\mathbf{k}_{x_i}^T + \frac{\lambda_3(N_x+N_y)}{|\mathcal{C}|}\sum_{(i,j)\in\mathcal{C}} \mathbf{k}_{x_i}\mathbf{k}_{x_i}^T \qquad (9)$$

$$g_\alpha = \sum_{i=1}^{N_x}\sum_{r\in\{+,-\}} (u^x_{ir})^2 c_r\, \mathbf{k}_{x_i} + \frac{\lambda_3(N_x+N_y)}{|\mathcal{M}|}\sum_{(i,j)\in\mathcal{M}} (\beta^T \mathbf{k}_{y_j})\, \mathbf{k}_{x_i} - \frac{\lambda_3(N_x+N_y)}{|\mathcal{C}|}\sum_{(i,j)\in\mathcal{C}} (\beta^T \mathbf{k}_{y_j})\, \mathbf{k}_{x_i} \qquad (10)$$

When $u^x$, $u^y$ and $\alpha$ are fixed, $\beta$ is optimized by analogy with the updating formula for $\alpha$; we omit the related formulas here. Note that we have closed-form solutions for updating each variable block. The whole procedure is summarized in Algorithm 1.

Algorithm 1. RLS under the CvMLs and the CvCLs
Input: data matrices X, Y; number of classes; maximum iteration number MaxIter.
Initialize alpha, beta.
while iter < MaxIter do
  step 1: update u^x, u^y with Eq. (8) and its analogue for u^y.
  step 2: update alpha = H_alpha^{-1} g_alpha by solving a linear system.
  step 3: update beta by analogy with the update of alpha.
end while

3.3. Determining the real labels of the sign labels

In the training data, besides the available CvMLs and CvCLs, a few labeled samples are also provided. Since the sign labels of those samples are known once the classifiers are learned, the sign labels and the real labels of the same samples can be connected. For example, if a sample has the +1 sign label as well as the first real class label, we may say the +1 sign label corresponds to the first real class. Note that we need as few as two labeled samples, one per class, to connect the sign labels to the real classes.
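As a concrete illustration of Algorithm 1, the following self-contained Python sketch implements the three closed-form block updates under the square loss. It is our own reading of Eqs. (8)-(10), not the authors' code; the function name `rlscvmc`, the tiny ridge term added before solving the linear systems, and the fixed iteration count are our assumptions.

```python
import numpy as np

def rlscvmc(Kx, Ky, M, C, lam1=1.0, lam2=1.0, lam3=1.0, iters=50, eps=1e-12):
    """Block coordinate descent for Eq. (7) with the square loss (a sketch).

    Kx (Nx x Nx), Ky (Ny x Ny): kernel matrices of the two views.
    M, C: lists of cross-view (i, j) must-link / cannot-link index pairs.
    Returns the expansion coefficients (alpha, beta) of Eq. (6).
    """
    Nx, Ny = Kx.shape[0], Ky.shape[0]
    s = lam3 * (Nx + Ny)
    rng = np.random.default_rng(0)
    alpha, beta = rng.standard_normal(Nx), rng.standard_normal(Ny)
    c = np.array([1.0, -1.0])                    # fixed output "centers" c_+, c_-

    def fuzzy_u(f):
        # Eq. (8): fuzzy indicators; each row is nonnegative and sums to one.
        d = 1.0 / np.maximum((f[:, None] - c) ** 2, eps)
        return d / d.sum(axis=1, keepdims=True)

    def solve_block(K, lam, u, f_other, own, other):
        # Closed-form update of one coefficient block, Eqs. (9)-(10).
        n = K.shape[0]
        diag = (u ** 2).sum(axis=1)              # u_{i+}^2 + u_{i-}^2
        g_link = np.zeros(n)
        for pairs, sign in ((M, +1.0), (C, -1.0)):
            coeff = s / max(len(pairs), 1)
            for p in pairs:
                i, j = p[own], p[other]
                diag[i] += coeff                 # k_i k_i^T terms of H
                g_link[i] += sign * coeff * f_other[j]
        H = K.T @ (diag[:, None] * K) + lam * K  # sum_i d_i k_i k_i^T + lam K
        g = K.T @ ((u ** 2) @ c + g_link)
        return np.linalg.solve(H + eps * np.eye(n), g)  # tiny ridge for stability

    for _ in range(iters):
        ux = fuzzy_u(Kx @ alpha)                             # step 1
        uy = fuzzy_u(Ky @ beta)
        alpha = solve_block(Kx, lam1, ux, Ky @ beta, 0, 1)   # step 2
        beta = solve_block(Ky, lam2, uy, Kx @ alpha, 1, 0)   # step 3
    return alpha, beta
```

The sign labels of the training points are then simply `np.sign(Kx @ alpha)` and `np.sign(Ky @ beta)`; the second step of Section 3.3 maps the two sign values to real classes using one labeled sample per class.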
3.4. Comparison with co-regularization

RLSCVMC is related to Sindhwani et al.'s co-regularization framework [24]. They proposed a family of algorithms in the co-regularization framework: the Co-Regularized Least Squares (Co-RLS) and the Co-Regularized Laplacian SVM and Least Squares (Co-LapSVM, Co-LapRLS). Co-LapSVM and Co-LapRLS are rooted in the manifold regularization framework [4], while our RLSCVMC is built on the classical regularization framework. Thus we do not compare with them and compare only with the Co-RLS algorithm, stated as follows:

$$\min_{f_x, f_y}\ \frac{1}{l}\sum_{i=1}^{l} (f_x(x_i) - c_i)^2 + \frac{1}{l}\sum_{i=1}^{l} (f_y(y_i) - c_i)^2 + \gamma_1 \|f_x\|^2_{\mathcal{H}_{k_x}} + \gamma_2 \|f_y\|^2_{\mathcal{H}_{k_y}} + \frac{\gamma_c}{l+u}\sum_{i=1}^{l+u} (f_x(x_i) - f_y(y_i))^2 \qquad (11)$$

where $l$ and $u$ are the numbers of labeled and unlabeled samples respectively. The formulation consists of a classical regularization framework for each data view plus a co-regularization term. Both Co-RLS and our framework are thus based on the classical regularization framework. However, our framework estimates the loss on the unlabeled data by introducing the probabilistic indicator vectors, while Co-RLS estimates the loss only on the labeled data. From the perspective that paired data can be treated as CvMLs, the co-regularization term is exactly the CvML penalty term of Eq. (3). Co-RLS does not employ the CvCL supervision explicitly, whereas our framework explicitly penalizes violations of the CvCLs. We summarize the different kinds of information used by Co-RLS and our framework in Table 1.

Table 1. Supervision used in Co-RLS and RLSCVMC. The cross-view data correspondence in Co-RLS is categorized as ML. Y_w: the unlabeled data are used only in the co-regularization term, not in the least-squares loss.

            Labeled   Unlabeled   CvML   CvCL
  Co-RLS    Y         Y_w         Y      N
  RLSCVMC   N         Y           Y      Y

4. Experiment

In this section, we present an empirical study of the RLSCVMC algorithm. We first introduce the datasets used in our experiments, then show the performance of RLSCVMC under different amounts of CvML and CvCL supervision. Next, we illustrate the classification performance under different parameter settings. Finally, we compare RLSCVMC with the Co-RLS algorithm. In all of our experiments, the linear kernel is used, as in Sindhwani et al.'s co-regularization paper [24].
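Continuing the sketch above, a hypothetical end-to-end run on toy unpaired two-view data with linear kernels (the setting used throughout this section) might look as follows; the data, link sets and variable names are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy unpaired views of a two-class problem: 40 five-dimensional samples in
# view 1 and 50 eight-dimensional samples in view 2, with no pairing at all.
X = np.vstack([rng.normal(-1, 1, (20, 5)), rng.normal(1, 1, (20, 5))])
Y = np.vstack([rng.normal(-1, 1, (25, 8)), rng.normal(1, 1, (25, 8))])
labels_x = np.array([0] * 20 + [1] * 20)
labels_y = np.array([0] * 25 + [1] * 25)

Kx, Ky = X @ X.T, Y @ Y.T                  # linear kernels
M = [(0, 0), (5, 10), (25, 30)]            # cross-view pairs sharing a label
C = [(0, 30), (25, 0), (30, 5)]            # cross-view pairs with different labels
alpha, beta = rlscvmc(Kx, Ky, M, C)

# Second step (Section 3.3): anchor the sign labels to real classes using
# one labeled sample per class in view 1 (indices 0 and 20 here).
sign_x = np.sign(Kx @ alpha)
mapping = {sign_x[0]: labels_x[0], sign_x[20]: labels_x[20]}
pred_x = np.array([mapping.get(t, labels_x[0]) for t in sign_x])
print("view-1 training accuracy:", (pred_x == labels_x).mean())
```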

4.1. Dataset description

In our experiments, the following four multi-view datasets are used.

The Multiple Feature (handwritten) Digit dataset (MFD). This dataset comes from the UCI machine learning repository [12]. It consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps; 200 patterns per class (2,000 patterns in total) have been digitized in binary images. The digits have six feature sets, of which we choose two as the views used in our experiments: pix (240 pixel averages in 2 x 3 windows) and mor (6 morphological features). We use digit 0 as the positive class and digit 1 as the negative class.

Course dataset. This dataset consists of 1,051 web pages collected from the Computer Science department websites of four universities: Cornell, University of Washington, University of Wisconsin and University of Texas. The web pages are categorized into two classes, course and non-course. The two views are the web page content and the text on the links pointing to the web page.

ORL. This is a face dataset with two feature sets: the cropped face image (32 x 32) and the LBP feature extracted from the image. The two feature sets are treated as the two views in the experiments. We use the first two persons as the positive and negative classes.

Multilingual Reuters Collection (MRC). This dataset is defined and provided by [1]. It contains collections in five languages (EN, FR, GE, SP, IT) from six large Reuters categories (CCAT, C15, ECAT, E21, GCAT and M11) extracted from RCV1 and RCV2, and it is totally unpaired. In our experiments, we use the English and French documents as the two views and the CCAT/C15 categories as the positive and negative classes.

Details are listed in Table 2. Note that the first three datasets have paired data while the last does not. In all our experiments, we reduce the dimensions of the Course and ORL datasets by PCA, and the dimension of MRC by LSA.

Table 2. Experimental datasets: the two views used for each dataset, with feature dimensions.

  Dataset   View 1       View 2
  MFD       Pix (240)    Mor (6)
  Course    Page         Link
  ORL       Pix (1024)   LBP (1024)
  MRC       EN           FR

4.2. Performance examination of RLSCVMC

In this experiment, we examine the performance of RLSCVMC under different numbers of CvMLs and CvCLs. Since CvMLs and CvCLs are weak supervision, we want to know how many of them are needed for satisfactory performance, so we train our framework with different amounts of supervision and inspect the prediction results. The numbers of CvMLs and CvCLs grow from 0% of (N_x + N_y)/2 to 100% of (N_x + N_y)/2 at an interval of 10% of (N_x + N_y)/2. For each combination of CvML and CvCL numbers, ten trials are run and the mean accuracies are reported. For each trial, we randomly and evenly split the dataset into a training set and a testing set. A CvML is constructed by randomly choosing two representations with the same label from the training sets of the two views, and a CvCL by randomly choosing two representations with different labels from the training sets of the two views. The α, β variables are randomly initialized; in our experiments we find that our framework is insensitive to the initial values.

The traditional parameter-selection method, cross-validation, is not applicable here due to the absence of labeled data, so we tune the parameters heuristically. For all experiments we set λ3 = 1, which approximately balances the loss term and the cross-view side-information regularization term; λ1 is set to one value for the MFD dataset and to another for the remaining datasets, and λ2 is set equal to λ1 for convenience.
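The random construction of the CvML and CvCL sets described above can be sketched as follows (a hypothetical helper of ours, not the authors' code):

```python
import numpy as np

def make_links(labels_x, labels_y, n_ml, n_cl, seed=0):
    """Randomly draw n_ml CvMLs (cross-view pairs with equal labels) and
    n_cl CvCLs (cross-view pairs with different labels) from the training
    representations of the two views, mirroring the protocol above."""
    rng = np.random.default_rng(seed)
    ml, cl = [], []
    while len(ml) < n_ml:
        i, j = rng.integers(len(labels_x)), rng.integers(len(labels_y))
        if labels_x[i] == labels_y[j]:
            ml.append((i, j))
    while len(cl) < n_cl:
        i, j = rng.integers(len(labels_x)), rng.integers(len(labels_y))
        if labels_x[i] != labels_y[j]:
            cl.append((i, j))
    return ml, cl

# e.g. link budgets of 30% of (N_x + N_y)/2 for each kind:
# n = int(0.3 * (len(labels_x) + len(labels_y)) / 2)
# M, C = make_links(labels_x, labels_y, n, n)
```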
We draw 3D bar graphs of the mean accuracies in Fig. 2. All subfigures in Fig. 2 show that the performance increases with the numbers of MLs and CLs. However, the increase is not monotonic in the number of CvMLs and CvCLs, especially on the ORL dataset, which may be caused by unstable learned predictors on these datasets. Table 3 lists the average standard deviations on the four datasets. Among them, the ORL dataset has a high average standard deviation and a fluctuating increase in performance, while the Course and MRC datasets have very small average standard deviations and their increases are almost monotonic. Fig. 4 plots the diagonal bars of Fig. 2 for each dataset. On the Course and MRC datasets, the accuracies rise quickly at small supervision ratios and then stay almost stable, while on the ORL dataset the accuracy increases approximately linearly. In addition, on all but the MFD dataset, the accuracies on the two views follow a similar trend, which may be partly due to the co-regularization between the views.

An interesting phenomenon is that the accuracy does not increase when the number of CvMLs (CvCLs) increases while the number of CvCLs (CvMLs) is zero, especially on the Course, MRC and ORL datasets. This is in fact a degenerate solution of RLSCVMC, which we demonstrate with a toy problem in Fig. 3: a two-view two-class dataset in which each class in each view is generated from a Gaussian distribution. We draw the classification hyperplane and label the positive and negative areas of the 2D plane in both views. Fig. 3(a)/(b) depicts the situation when the CvCLs/CvMLs do not exist. When only CvMLs exist (Fig. 3(a)), the classifiers in both views give all the data the same label. This solution incurs little penalty on the CvML regularization because the labels agree, and, thanks to the adaptation of the probabilistic indicator vectors $u^x$, $u^y$ in Eq. (2), it also incurs a small classification loss. When only CvCLs exist the situation is similar: the classifiers give the data in the two views opposite labels, incurring little penalty on the CvCL regularization, and the classification loss in Eq. (2) is again small. To avoid this kind of degenerate solution, both CvML and CvCL supervision is needed. As seen in Fig. 2(c), (f), (g) and (h), the accuracies increase dramatically as the CvMLs (CvCLs) grow from 0% of (N_x + N_y)/2 to 10% of (N_x + N_y)/2.

4.3. Parameter study

In this experiment, we study the accuracies under different parameter settings. The experimental setting follows the previous experiment. The parameters λ1, λ2 are set equal and chosen from [1e1, 1e2, 1e3, 1e4, 1e5]; λ3 is chosen from [1e0, 1e1, 1e2, 1e3, 1e4, 1e5]. The generation of the training and testing data is the same as in the previous experiment. For each parameter setting, ten trials are run and the mean accuracies are reported. Due to space limitations, we only illustrate the results when the numbers of CvMLs and CvCLs are set to 30% of (N_x + N_y)/2; we also checked the results for other numbers of CvMLs and CvCLs and observed similar results.

Fig. 2. Performance of RLSCVMC: mean testing accuracy as a function of the numbers of must-links and cannot-links. Panels: (a) ORL-Pix, (b) ORL-LBP, (c) MFD-Pix, (d) MFD-Mor, (e) Course-Page, (f) Course-Link, (g) MRC-EN, (h) MRC-FR.

Fig. 5 shows the heat maps of the accuracies under the different parameter settings. In general, the accuracies do not vary much across parameter settings, which implies that our framework is not very sensitive to its parameters. The ORL dataset shows relatively unstable accuracies, which could be caused by the unstable classifiers learned from too little training data rather than by the different parameter settings.

Table 3. Average standard deviations of the accuracies on the four datasets (per view for each of ORL, MFD, Course and MRC).

The accuracies on the MFD dataset stay above 90% for most parameter settings and only drop for two groups of parameters. On the Course dataset, the accuracies stay high when neither parameter is too large nor too small.

4.4. Comparison with Co-RLS

In this experiment we compare RLSCVMC with Co-RLS to test the effectiveness of RLSCVMC. We did not conduct comparison experiments with other algorithms, mainly because the introduced CvML and CvCL concepts are relatively new: as far as we know, there is at present no other work based on such side information. However, loosely speaking, Co-RLS can be viewed as related work. Since Co-RLS works only on fully paired data, only the ORL, MFD and Course datasets are used in this experiment.

The parameters of RLSCVMC are set as in the experiments above. For Co-RLS, we set γ1 = γ2 in Eq. (11) to make the parameter setting of Co-RLS similar to that of RLSCVMC.

Fig. 3. Degenerate solutions of RLSCVMC when the CvCLs or the CvMLs do not exist. The classification hyperplane and the positive/negative areas of the 2D plane are drawn for both views: (a) CvMLs = 10, CvCLs = 0; (b) CvMLs = 0, CvCLs = 10.

Fig. 4. Trend of the accuracy as the set of CvMLs and CvCLs grows from 0% to 100% (curves: ORL-Pix, ORL-LBP, MFD-Pix, MFD-Mor, Course-Page, Course-Link, MRC-En, MRC-Fr).

Fig. 5. Accuracies under different parameter settings on (a) ORL-Pix, (b) ORL-LBP, (c) MFD-Pix, (d) MFD-Mor, (e) Course-Page, (f) Course-Link. The x and y axes indicate the two parameters of our framework; the color indicates the accuracy, with red meaning high accuracy and blue low. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

We select the two parameters of Co-RLS by fivefold cross-validation, both from {0.01, 0.1, 1, 10, 100}. The means and standard deviations of the accuracies are listed in Table 4. We test the performance under different proportions of labeled training data. We first randomly select half of the data as the training set and the rest as the testing set. Then we choose a part of the training set as the labeled set, increasing from 10% of N to 100% of N at an interval of 10% of N, where N is the number of training samples; the rest is used as the unlabeled set. Next, we create every possible CvML and CvCL from the labeled training set for the RLSCVMC algorithm. Thus the supervised information used by Co-RLS and RLSCVMC is the same; the difference is that Co-RLS uses the labels directly while RLSCVMC first converts the labels into CvMLs and CvCLs and uses those instead.

Table 4. Accuracy comparison between RLSCVMC and Co-RLS (mean ± standard deviation over ten trials) on ORL-Pix, ORL-LBP, MFD-Pix, MFD-Mor, Course-Page and Course-Link, for labeled proportions from 10% to 100%. Boldface means the t-test is passed.

Note that, besides the labeled set, Co-RLS also makes full use of all the unlabeled samples and of the pairing information on the unlabeled sample set, while RLSCVMC does not. Thus Co-RLS in fact uses much more information than RLSCVMC, and the comparison is naturally more favorable to Co-RLS than to our model.

Comparing the results of Co-RLS and RLSCVMC in Table 4, we can see that our RLSCVMC exhibits comparable performance on two of the datasets and significantly better performance on the MFD dataset. Even so, RLSCVMC still leaves further room for performance improvement, since the present experimental setting favors Co-RLS over our model; how to utilize the CvMLs and CvCLs more effectively also still needs study.

On the ORL dataset, the performance of RLSCVMC and Co-RLS is comparable. Most of the results are statistically insignificant, with two exceptions (one labeled proportion on ORL-Pix and 70% on ORL-LBP). On the Pix view at that proportion, the accuracy of Co-RLS is higher than RLSCVMC's by over 10 percent, but its variance is also high (0.128), so the result is not so convincing.

RLSCVMC outperforms Co-RLS on the MFD dataset, especially on the Pix view, where all the results are significantly better than Co-RLS's. On the Mor view, three of the ten results are better while the rest are comparable. Note that on the Pix view RLSCVMC achieves high accuracies even with a small labeled set while Co-RLS does not: with a labeled proportion of at most 30%, RLSCVMC yields an accuracy higher by 4.2% on average.

On the Course dataset, RLSCVMC again beats Co-RLS. On the Page view, half of the accuracies of RLSCVMC are significantly higher than Co-RLS's, and the rest are comparable. On the Link view, each of the two algorithms obtains one significantly higher accuracy.

In summary, RLSCVMC, which uses only CvMLs and CvCLs, demonstrates its learning ability: it achieves better results on the MFD and Course datasets and comparable results on the ORL dataset while using less information than Co-RLS. As discussed in Section 3.4, RLSCVMC estimates the classification loss on the unlabeled data and employs the CvCLs explicitly, which lets it achieve relatively better performance than Co-RLS.

5. Conclusion and future work

In this paper, we develop a framework which utilizes cross-view side information, specifically the CvMLs and the CvCLs, to learn classifiers in the multi-view circumstance where the view data are totally unpaired. We show the effectiveness of our framework and demonstrate why the solutions degenerate when only the CvMLs (CvCLs) are available. In our comparative experiments, we observe that our framework achieves better performance than the Co-RLS algorithm under the same supervision. Some problems still deserve future study.
So far, our framework only works for two-view datasets because of the limitation of the modeling; how to extend it to datasets with more than two views is a practical and important question. Furthermore, our CvMLs and CvCLs are general multi-view side information, not limited to classification tasks; how to apply them to multi-view clustering and dimension-reduction tasks on totally-unpaired datasets also deserves examination. Finally, although the experimental setting favors Co-RLS, RLSCVMC obtains only one significant improvement over it in some comparisons; there is still further room for performance improvement, and this is our next work.

Acknowledgments

This work was supported in part by NSFC Grants of China, by the Natural Science Foundation of the Higher Education Institutions of Jiangsu under Grant No. 12KJB518, and sponsored by the Jiangsu QingLan Project.

Appendix A

Theorem A.1. The optimizers $f_x^*$, $f_y^*$ of Eq. (5) admit representations of the form

$$f_x^*(x) = \sum_{i=1}^{N_x} \alpha_i k_x(x, x_i) \qquad (12)$$

$$f_y^*(y) = \sum_{j=1}^{N_y} \beta_j k_y(y, y_j) \qquad (13)$$

Proof. We decompose $f_x \in \mathcal{H}_{k_x}$ (and likewise $f_y \in \mathcal{H}_{k_y}$) into two parts: the first lies in the subspace spanned by the kernel functions $k_x(x_1,\cdot),\ldots,k_x(x_{N_x},\cdot)$ (respectively $k_y(y_1,\cdot),\ldots,k_y(y_{N_y},\cdot)$), and the second lies in its orthogonal complement:

$$f_x = f_{x\parallel} + f_{x\perp} = \sum_{i=1}^{N_x} \alpha_i k_x(x_i, \cdot) + f_{x\perp} \qquad (14)$$

$$f_y = f_{y\parallel} + f_{y\perp} = \sum_{j=1}^{N_y} \beta_j k_y(y_j, \cdot) + f_{y\perp} \qquad (15)$$

Then we may write $f_x(x_k)$ and $f_y(y_k)$ as

$$f_x(x_k) = \sum_{i=1}^{N_x} \alpha_i k_x(x_i, x_k) + \langle f_{x\perp}, k_x(x_k, \cdot)\rangle = \sum_{i=1}^{N_x} \alpha_i k_x(x_i, x_k) \qquad (16)$$

$$f_y(y_k) = \sum_{j=1}^{N_y} \beta_j k_y(y_j, y_k) + \langle f_{y\perp}, k_y(y_k, \cdot)\rangle = \sum_{j=1}^{N_y} \beta_j k_y(y_j, y_k) \qquad (17)$$

And for all $f_{x\perp}$ and $f_{y\perp}$ we have

$$\|f_x\|^2_{\mathcal{H}_{k_x}} = \|f_{x\parallel} + f_{x\perp}\|^2_{\mathcal{H}_{k_x}} = \|f_{x\parallel}\|^2_{\mathcal{H}_{k_x}} + \|f_{x\perp}\|^2_{\mathcal{H}_{k_x}} \ge \|f_{x\parallel}\|^2_{\mathcal{H}_{k_x}} \qquad (18)$$

$$\|f_y\|^2_{\mathcal{H}_{k_y}} = \|f_{y\parallel} + f_{y\perp}\|^2_{\mathcal{H}_{k_y}} = \|f_{y\parallel}\|^2_{\mathcal{H}_{k_y}} + \|f_{y\perp}\|^2_{\mathcal{H}_{k_y}} \ge \|f_{y\parallel}\|^2_{\mathcal{H}_{k_y}} \qquad (19)$$

Thus for any fixed $\alpha_i$, $\beta_j$, the objective value is minimized for $f_{x\perp} = 0$ and $f_{y\perp} = 0$. Since these are also solutions, the theorem holds. □

References

[1] M.-R. Amini, N. Usunier, C. Goutte, Learning from multiple partially observed views - an application to multilingual text categorization, in: Advances in Neural Information Processing Systems (NIPS 2009), 2009.
[2] B. Schölkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[3] S. Basu, A. Banerjee, R.J. Mooney, Active semi-supervision for pairwise constrained clustering, in: Proc. of the SIAM International Conference on Data Mining (SDM 2004), 2004.
[4] M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, The Journal of Machine Learning Research 7 (2006) 2399-2434.
[5] D.P. Bertsekas, W.W. Hager, O.L. Mangasarian, Nonlinear Programming, Athena Scientific, Belmont, MA.
[6] M. Blaschko, C. Lampert, A. Gretton, Semi-supervised Laplacian regularization of kernel canonical correlation analysis, in: Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2008), 2008.
[7] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proc. of the 11th Annual Conference on Computational Learning Theory (COLT 1998), 1998.
[8] U. Brefeld, T. Scheffer, Co-EM support vector learning, in: Proc. of the 21st International Conference on Machine Learning (ICML 2004), 2004.
[9] S. Dasgupta, M.L. Littman, D. McAllester, PAC generalization bounds for co-training, in: Advances in Neural Information Processing Systems (NIPS 2001), 2001.
[10] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: Proc. of the 24th International Conference on Machine Learning (ICML 2007), 2007.
[11] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley, New York, 2001.
[12] A. Frank, A. Asuncion, UCI Machine Learning Repository, 2010.
[13] A. Globerson, S. Roweis, Metric learning by collapsing classes, in: Advances in Neural Information Processing Systems (NIPS 2005), 2005.
[14] A.B. Goldberg, X. Zhu, S. Wright, Dissimilarity in graph-based semi-supervised classification, in: Proc. of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS 2007), 2007.
[15] J. Gu, S. Chen, T. Sun, Localization with incompletely paired data in complex wireless sensor networks, IEEE Transactions on Wireless Communications (2011).
[16] T. Hertz, N. Shental, A. Bar-Hillel, D. Weinshall, Enhancing image and video retrieval: learning via equivalence constraints, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003.
[17] E. Hu, S. Chen, D. Zhang, X. Yin, Semisupervised kernel matrix learning by kernel propagation, IEEE Transactions on Neural Networks 21 (2010).
[18] A. Kimura, H. Kameoka, M. Sugiyama, T. Nakano, E. Maeda, H. Sakano, K. Ishiguro, SemiCCA: efficient semi-supervised learning of canonical correlations, in: Proc. of the 20th International Conference on Pattern Recognition (ICPR 2010), 2010.
[19] C. Lampert, O. Krömer, Weakly-paired maximum covariance analysis for multimodal dimensionality reduction and transfer learning, in: Proc. of the 11th European Conference on Computer Vision (ECCV 2010), 2010.
[20] G. Li, S.C.H. Hoi, K. Chang, Two-view transductive support vector machines, in: Proc. of the SIAM International Conference on Data Mining (SDM 2010), 2010.
[21] Z. Li, J. Liu, X. Tang, Pairwise constraint propagation by semidefinite programming for semi-supervised classification, in: Proc. of the 25th International Conference on Machine Learning (ICML 2008), 2008.
[22] N. Nguyen, R. Caruana, Improving classification with pairwise constraints: a margin-based approach, in: Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2008), 2008.
[23] S. Shalev-Shwartz, Y. Singer, A.Y. Ng, Online and batch learning of pseudo-metrics, in: Proc. of the 21st International Conference on Machine Learning (ICML 2004), 2004.
[24] V. Sindhwani, P. Niyogi, M. Belkin, A co-regularization approach to semi-supervised learning with multiple views, in: Workshop on Learning with Multiple Views at ICML, 2005.
[25] T. Sun, S. Chen, J. Yang, X. Hu, P. Shi, Discriminative canonical correlation analysis with missing samples, in: Computer Science and Information Engineering (CSIE 2009), 2009.
[26] W. Tong, R. Jin, Semi-supervised learning by mixed label propagation, in: Proc. of the 22nd National Conference on Artificial Intelligence (AAAI 2007), 2007.
[27] V.N. Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998.
[28] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, Constrained k-means clustering with background knowledge, in: Proc. of the 18th International Conference on Machine Learning (ICML 2001), 2001.
[29] E.P. Xing, A.Y. Ng, M.I. Jordan, S. Russell, Distance metric learning, with application to clustering with side-information, in: Advances in Neural Information Processing Systems (NIPS 2002), 2002.
[30] R. Yan, J. Zhang, J. Yang, A.G. Hauptmann, A discriminative learning framework with pairwise constraints for video object classification, IEEE Transactions on Pattern Analysis and Machine Intelligence (2006).
[31] D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, in: Proc. of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL 1995), 1995.
[32] T. Yoshida, K. Okatani, A graph-based projection approach for semi-supervised clustering, in: Knowledge Management and Acquisition for Smart Systems and Services, 2011.
[33] S. Yu, J. Shi, Grouping with directed relationships, in: Energy Minimization Methods in Computer Vision and Pattern Recognition, 2001.
[34] J. Zhang, R. Yan, On the value of pairwise constraints in classification and consistency, in: Proc. of the 24th International Conference on Machine Learning (ICML 2007), 2007.


More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Transductive Regression Piloted by Inter-Manifold Relations

Transductive Regression Piloted by Inter-Manifold Relations Huan Wang IE, The Chnese Unversty of Hong Kong, Hong Kong Shucheng Yan Thomas Huang ECE, Unversty of Illnos at Urbana Champagn, USA Janzhuang Lu Xaoou Tang IE, The Chnese Unversty of Hong Kong, Hong Kong

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

A Robust LS-SVM Regression

A Robust LS-SVM Regression PROCEEDIGS OF WORLD ACADEMY OF SCIECE, EGIEERIG AD ECHOLOGY VOLUME 7 AUGUS 5 ISS 37- A Robust LS-SVM Regresson József Valyon, and Gábor Horváth Abstract In comparson to the orgnal SVM, whch nvolves a quadratc

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

Semi-Supervised Kernel Mean Shift Clustering

Semi-Supervised Kernel Mean Shift Clustering IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. XX, JANUARY XXXX 1 Sem-Supervsed Kernel Mean Shft Clusterng Saket Anand, Student Member, IEEE, Sushl Mttal, Member, IEEE, Oncel

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Adaptive Transfer Learning

Adaptive Transfer Learning Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

General Vector Machine. Hong Zhao Department of Physics, Xiamen University General Vector Machne Hong Zhao (zhaoh@xmu.edu.cn) Department of Physcs, Xamen Unversty The support vector machne (SVM) s an mportant class of learnng machnes for functon approach, pattern recognton, and

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

CLUSTERING that discovers the relationship among data

CLUSTERING that discovers the relationship among data Ramp-based Twn Support Vector Clusterng Zhen Wang, Xu Chen, Chun-Na L, and Yuan-Ha Shao arxv:82.0370v [cs.lg] 0 Dec 208 Abstract Tradtonal plane-based clusterng methods measure the cost of wthn-cluster

More information

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems Taxonomy of Large Margn Prncple Algorthms for Ordnal Regresson Problems Amnon Shashua Computer Scence Department Stanford Unversty Stanford, CA 94305 emal: shashua@cs.stanford.edu Anat Levn School of Computer

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

Laplacian Eigenmap for Image Retrieval

Laplacian Eigenmap for Image Retrieval Laplacan Egenmap for Image Retreval Xaofe He Partha Nyog Department of Computer Scence The Unversty of Chcago, 1100 E 58 th Street, Chcago, IL 60637 ABSTRACT Dmensonalty reducton has been receved much

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information