A Semi-Supervised Approach Based on k-nearest Neighbor
JOURNAL OF SOFTWARE, VOL. 8, NO. 4, APRIL 2013

A Semi-Supervised Approach Based on k-Nearest Neighbor

Zhiliang Liu
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, P. R. China
Email: zhiliang_liu@uestc.edu.cn

Xiaomin Zhao
Department of Mechanical Engineering, University of Alberta, Edmonton, Canada
Email: xiaomin@ualberta.ca

Jianxiao Zou and Hongbing Xu
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, P. R. China
Email: {jxzou, hbxu}@uestc.edu.cn

Abstract: The k-nearest neighbor (k-NN) classifier needs sufficient labeled instances for training in order to achieve robust performance; otherwise, its performance deteriorates when the training set is small. In this paper, a novel algorithm, namely the ordinal semi-supervised k-NN, is proposed to handle cases with few labeled training instances. The algorithm consists of two parts: instance ranking and semi-supervised learning (SSL). Using SSL, the performance of k-NN with small training sets can be improved, because SSL enlarges the training set by including unlabeled instances together with their predicted labels. Instance ranking is used to pick the unlabeled instances that are added to the training set. It gives priority to the unlabeled instances that are closer to class boundaries, because they are more likely to be correctly predicted (these are called high confidence prediction instances). SSL therefore benefits when high confidence prediction instances are added to the training set early. Experimental results demonstrate that the proposed algorithm outperforms the semi-supervised k-NN and the conventional k-NN algorithms when the training set is small.
Index Terms: instance ranking; semi-supervised learning; k-nearest neighbor; weighted mean

NOTATION

U         the given dataset, consisting of the labeled dataset U_l and the unlabeled dataset U_u; in this paper, U = U_l ∪ U_u;
n         the number of features that describe an instance in U;
x (or z)  an instance (a point) in U, an n-dimensional column vector in which each dimension represents a feature;
N_j       the number of instances of the j-th class in U_l;
N_l       the number of instances in U_l;
N_u       the number of instances in U_u;
L         the number of classes (labels) in U_l;
x_i^(j)   the i-th instance within the j-th class;
k         the number of nearest neighbors used for prediction in k-NN;
d(x, z)   the Euclidean distance between two instances x and z;
NN_i(x)   the i-th nearest neighbor of an unlabeled instance x, i = 1, 2, ..., k;
CF_min    the predefined confidence-factor threshold for high confidence prediction instances;
class(x)  returns the true class of instance x; a hat on it indicates a predicted class;
κ         the kernel function; in this paper, the Gaussian radial basis function is adopted;
σ         the width parameter of the Gaussian radial basis function;
R*        the optimal Bayes probability of error;
K         the number of folds, a parameter in K-fold cross-validation.

Manuscript received May 5, 2012; revised August 3, 2012; accepted August 9, 2012.

I. INTRODUCTION

The nearest neighbor rule (NNR) was originally proposed by Fix and Hodges [1] for solving discrimination problems in 1951. The idea behind the NNR is that an unlabeled instance in the input space is likely to have the same class as the instances close to it. Under this rule, the nearest neighbor (1-NN) classifier, a supervised learning algorithm, identifies the class of an unlabeled instance by finding its closest training instance, taking note of that instance's class, and predicting this class for the unlabeled instance.
Instead of looking at only the single nearest instance, the k-nearest neighbor (k-NN) classifier finds the k closest instances in the training set and applies the k nearest neighbor rule (kNNR), typically majority voting in the conventional k-NN, to make a prediction. In fact, the nearest neighbor classifier is the special case of k-NN with k = 1. In references [1] and [2], it is concluded that k-NN reaches the optimal
Bayes probability of error R* in the limit k → ∞ with k/N_l → 0, under assumptions on the underlying probability distributions of the training set. Later, Cover and Hart [3] proved that the asymptotic error of 1-NN lies in the interval

    R* ≤ R_NN ≤ R* (2 − L R* / (L − 1)),    (1)

where R_NN is the 1-NN probability of error. They also pointed out that the nearest neighbor carries half of the classification information in an infinite training set. From Eq. (1), 1-NN attains the same asymptotic optimality as the optimal Bayes probability of error if R* = 0; in that condition, class boundaries are clearly separated from each other and no instances from different classes overlap. Considering joint distribution cases, k-NN was proposed to address the sub-optimality of the NNR [4]. Mathematically, k-NN can be expressed as

    class^(x) = arg max_{j=1,...,L} Σ_{i=1}^{k} f(x, NN_i(x)) δ(class(NN_i(x)), j),    (2)

where δ(i, j) is the Kronecker delta function

    δ(i, j) = 1 if i = j, and 0 otherwise,    (3)

which returns one if the two inputs are identical and zero otherwise; f(x, z) is a voting weight function that defines how much impact a nearest neighbor z has on x. For example, if f(x, z) is set to the constant 1, k-NN assigns an equal weight of one to each of the k nearest neighbors, which is majority voting. As a very popular classification technique, k-NN has been well studied and widely applied in fields such as text processing [5] and fault diagnosis [6]. Reported studies on k-NN can be roughly grouped into three topics: similarity measurement, the k nearest neighbor rule (kNNR), and computation cost. Conventional metrics for similarity measurement in k-NN [7] include the Minkowski, Euclidean, Manhattan, and Chebyshev distances. A few new similarity measures have been proposed recently. Wang et al. [8] developed an adaptive distance metric to improve the performance of k-NN. Yu et al. [9] extended the kernel method to k-NN and termed it the kernel distance metric.
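The majority-voting rule of Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration with f(x, z) = 1 for every neighbor; the function and variable names are ours, not the paper's:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    """Majority-vote k-NN: Eq. (2) with equal voting weights f(x, z) = 1.

    Each of the k nearest neighbors casts one vote (the Kronecker delta
    of Eq. (3) collapses to a per-class count); ties are broken toward
    the smallest class index by argmax.
    """
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances d(x, z)
    nn_idx = np.argsort(d)[:k]                # indices of NN_1(x), ..., NN_k(x)
    votes = np.bincount(y_train[nn_idx])      # vote count per class label
    return int(np.argmax(votes))              # arg max over classes j
```

For instance, with three class-0 and class-1 points around two clusters, a query near the first cluster is assigned class 0 by the vote of its k = 3 nearest neighbors.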
Lei et al. [6] proposed a feature-weighted method (TFSWT) that weights each feature differently to overcome sensitivity to feature scaling. The kNNR, the second topic, is the strategy for making a prediction from the k nearest neighbors. It is so critical to the performance of k-NN that it has attracted the attention of many researchers. Majority voting, one of the most popular and intuitive kNNRs, assigns to an unlabeled instance the class that receives the majority of votes among its k nearest neighbors. Dudani [10] proposed a distance-weighted function that weights the nearest neighbors closer to the unlabeled instance more heavily than the others:

    f(x, z) = (d(x, NN_k(x)) − d(x, z)) / (d(x, NN_k(x)) − d(x, NN_1(x)))  if d(x, NN_k(x)) ≠ d(x, NN_1(x)),
    f(x, z) = 1  otherwise.    (4)

Shepard [11] further used an exponential function of the Euclidean distance to weight the neighbors' votes. Zeng et al. [12] proposed the pseudo NNR, which combines the distance-weighted kNNR with local mean learning [13]. Because k-NN is an instance-based and therefore time-consuming learner, the third topic is also valuable. Instance selection is one option for addressing this problem. Brighton and Mellish [14] reviewed instance selection methods of the past thirty years. The condensed NNR proposed by Hart [15] tends to retain training instances along the class boundaries and abandon instances in the interior of a class. By removing certain training instances, computation cost is reduced to a certain extent. Nowadays k-NN is particularly popular because its poor run-time performance is alleviated by today's available computational power [7].

The varieties of k-NN reviewed above are robust when sufficient training data are available, but challenges such as classification deterioration arise when only a few training data are available, because labeled training data are sometimes difficult, expensive, or time-consuming to obtain [4]. As a result, the performance of classifiers deteriorates [2], [16].
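Dudani's rule of Eq. (4) can be sketched likewise (names are ours; the degenerate case where all k distances coincide takes the "otherwise" branch):

```python
import numpy as np

def dudani_weights(dists):
    """Dudani's distance-weighted voting, Eq. (4): the nearest neighbor
    gets weight 1 and the k-th nearest gets weight 0, linearly in
    distance; if all k distances coincide, every neighbor gets weight 1."""
    d1, dk = dists.min(), dists.max()
    if dk == d1:
        return np.ones_like(dists)
    return (dk - dists) / (dk - d1)

def weighted_knn_predict(x, X_train, y_train, k=3):
    """k-NN of Eq. (2) with f(x, z) given by Dudani's weights."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    w = dudani_weights(d[nn])
    # accumulate the vote weight per class, then take the arg max over classes
    classes = np.unique(y_train)
    scores = np.array([w[y_train[nn] == c].sum() for c in classes])
    return int(classes[np.argmax(scores)])
```

With k = 3, one very close class-0 neighbor can outvote two distant class-1 neighbors, whereas plain majority voting would pick class 1; that is exactly the effect Dudani's weighting is designed to produce.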
The reason for the deterioration is that a small training set contains only partial information and is not sufficient to represent the intrinsic structure of the classes. That is to say, classifiers tend to under-fit with few training data. Semi-supervised learning (SSL) provides a possible solution to this problem because it enlarges the training set with qualified unlabeled instances, namely high confidence prediction instances [17], and their predicted labels. Xie et al. [24] proposed a semi-supervised k-nearest neighbor classifier named De-YATSI. Lv [25] proposed a semi-supervised learning approach using a local regularizer and a unit-circle class label representation. Wajeed and Adilakshmi [26] proposed a semi-supervised method to increase the accuracy of k-NN for text classification. Giannakopoulos and Petridis [27] applied a semi-supervised version of Fisher discriminant analysis together with the k-nearest neighbor in a diarization system. In this paper, we use instance ranking to order the unlabeled instances according to a certain criterion. We believe that performance can be improved further by using an ordinal/sequential unlabeled set rather than an arbitrary one. The idea behind instance ranking is that instances closer to the class boundaries are more likely to be classified correctly; that is, we have higher confidence in their predictions, so it is better to predict them as early as possible. These predicted instances (more likely to be correct) can then be used by SSL for further training. Simply speaking, instance ranking gives these important instances the priority of being predicted first; by doing SSL sequentially rather than randomly, the training set is enlarged iteratively outward from the initial class boundaries and becomes more and more representative of
the intrinsic structure of the given problem. The performance of classifiers can therefore be improved.

The rest of the paper is structured as follows. Section II introduces instance ranking in detail and shows how it works with SSL in the proposed ordinal semi-supervised k-NN. Section III illustrates the idea of the proposed algorithm on simulated datasets and validates it on benchmark datasets. Conclusions are drawn in Section IV.

II. THE ORDINAL SEMI-SUPERVISED k-NN

As stated in Section I, k-NN suffers from poor classification performance if the training data are limited. To alleviate this problem, the ordinal semi-supervised k-NN is proposed, as shown in Fig. 1. The proposed method contains two parts: instance ranking and semi-supervised learning. In the first part, instance ranking, unlabeled instances are sorted by their distances to the class boundaries estimated on the training set. The second part, semi-supervised learning, repeatedly predicts the top-ranked unlabeled instance until all unlabeled instances are classified. In other words, the top-ranked instance from the first part is classified by SSL, and it updates the training set with its predicted label if it is considered a high confidence instance. The training set may thus grow iteratively, and the procedure terminates when all unlabeled instances have been processed.

Figure 1. Scheme of the ordinal semi-supervised k-NN

A. Instance Ranking

Instance ranking aims to rank unlabeled instances by their relative distances to class boundaries. The weighted mean is the tool used to represent a class boundary; that is, the distance between an unlabeled instance and its corresponding weighted mean of class j approximates its distance to the boundary of class j. Compared with the nearest neighbor, the weighted mean detects class boundaries more robustly, particularly when the nearest neighbor is an outlier, because it considers not only the nearest neighbor but also the other instances of a class. The weighted mean was first used in [18] and [19] for nonparametric weighted feature extraction.

Consider a given dataset U having L classes, where class j (1 ≤ j ≤ L) has N_j instances in U_l. The weighted mean of an unlabeled instance x in U_u with respect to class j is defined as

    M_j(x) = Σ_{i=1}^{N_j} ω(x, x_i^(j)) x_i^(j),    (5)

where ω(x, x_i^(j)) is the weight of x_i^(j) with respect to x, defined as

    ω(x, x_i^(j)) = κ(x, x_i^(j)) / Σ_{t=1}^{N_j} κ(x, x_t^(j)),    (6)

where κ(x, z), the Gaussian radial basis function (Gaussian RBF), evaluates the distance between x and z:

    κ(x, z) = exp(−d(x, z)² / (2σ²)),    (7)

where 0 < σ < ∞ and 0 < κ(x, z) ≤ 1.

An example in Fig. 2 illustrates how to calculate the weighted mean. Class 1 and class 2 are marked in red and green, respectively. Each class has four labeled instances for training, denoted x_i^(1) and x_i^(2), respectively, where 1 ≤ i ≤ 4, and x is one unlabeled instance. First, we calculate the distance d(x, x_i^(1)) between x and every labeled instance x_i^(1), where i = 1, 2, 3, 4.
Second, the weight ω(x, x_i^(1)) is obtained by evaluating the distances between x and every instance in class 1 with κ(x, z) under a predefined σ. Finally, the weighted mean M_1(x) is computed according to Eq. (5). From Fig. 2, notice that the distance between x and M_1(x) is approximately the orthogonal distance to the corresponding class boundary. The weighted mean of x with respect to class 2, M_2(x), is obtained by the same procedure.

Figure 2. An example of how to calculate the weighted mean in a two-class case

Sigma (σ) in the Gaussian RBF is important to the weighted mean because it changes the weight distribution among the instances. In the limit, the weighted mean becomes the arithmetic mean,

    M_j(x) = (1/N_j) Σ_{i=1}^{N_j} x_i^(j),    (8)

as σ → ∞: in that case κ(x, z) → 1, so every weight ω(x, x_i^(j)) tends to the identical value 1/N_j. Let us take an example to illustrate the impact of sigma on the weighted mean.
Suppose the eight training instances of Fig. 2 and one unlabeled instance x are given. The weight distributions and the Euclidean distances between x and the eight training instances are shown in Fig. 3. Three different values, σ = 0.5, 1.0, and 5.0, are considered when computing the weight distribution of x. No matter what the value of σ is, the nearest neighbor is always weighted most heavily, because κ(x, z) is monotonic with respect to d(x, z); see, for instance, the nearest instance in class 1
and in class 2. κ(x, z) with a smaller σ weights the nearest neighbor more heavily than with a bigger σ. In a word, a small σ produces discriminative weights within a class, while a big σ tends to weight the instances of a class identically.

Figure 3. The weight distributions of instances with respect to three different σ: 0.5, 1.0, and 5.0

A distance factor (DF) is then proposed to represent the relative closeness of an unlabeled instance x to its closest class boundary among all the boundaries, as measured by the weighted means. It is defined as

    DF(x) = min_{j=1,...,L} [ d(x, M_j(x)) / Σ_{t=1}^{L} d(x, M_t(x)) ],    (9)

where 0 ≤ DF ≤ 1/L. There are L distances between x and the L class boundaries; the distance factor is the minimum of these after normalization by their sum. DF thus measures how close an instance is to its nearest class boundary: a smaller DF implies the instance is closer. Once the DFs of all unlabeled instances are available, they are sorted in ascending order, ready for the next step.

B. Semi-supervised Learning

Semi-supervised learning is a framework of algorithms proposed to improve the performance of supervised algorithms by using both labeled and unlabeled data. It has been developed mainly because, in cases with limited labeled data, labeling a large amount of training data takes great effort and cost [20]. Generally, semi-supervised learning may be achieved by either co-training or self-training [21]. Co-training assumes that the features can be divided into two independent sets, each sufficient to train a classifier. Two classifiers are trained on the two views separately, so that the classifiers can help each other by adding their most confident prediction instances into each other's training sets.
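The weighted mean of Eqs. (5) and (6) and the distance factor of Eq. (9) can be sketched as follows. One assumption is labeled explicitly: the exact exponent of the Gaussian RBF in Eq. (7) is garbled in this scan, so the standard form exp(−d²/(2σ²)) is used here; the function names are ours:

```python
import numpy as np

def weighted_mean(x, X_class, sigma=1.0):
    """Weighted mean M_j(x) of Eq. (5): a Gaussian-RBF-weighted average
    of the class-j training instances, with weights normalized to sum
    to one as in Eq. (6)."""
    d = np.linalg.norm(X_class - x, axis=1)
    # assumed standard Gaussian RBF form for Eq. (7)
    kappa = np.exp(-d**2 / (2.0 * sigma**2))
    w = kappa / kappa.sum()
    return (w[:, None] * X_class).sum(axis=0)

def distance_factor(x, class_instances, sigma=1.0):
    """Distance factor of Eq. (9): the distance from x to its nearest
    class boundary (approximated by the weighted means), normalized by
    the sum of the distances to all L class boundaries."""
    dists = np.array([np.linalg.norm(x - weighted_mean(x, Xc, sigma))
                      for Xc in class_instances])
    return float(dists.min() / dists.sum())
```

As σ grows, the weighted mean approaches the arithmetic mean of Eq. (8); as σ shrinks, it collapses toward the nearest neighbor, matching the discussion of Fig. 3.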
The assumptions of co-training are too strict to be satisfied in practice, particularly in cases with few training data. Moreover, co-training is complicated because more than one classifier is involved in classification. Self-training, on the other hand, is simple to implement because it uses a single classifier to train on and to select high confidence instances from the unlabeled set. In this work, we implement SSL by the self-training approach; the only classifier used is k-NN. Following the principles of self-training, we define a confidence measure on predictions, the confidence factor (CF), expressed mathematically as

    CF(x) = Σ_{i=1}^{k} δ(class(NN_i(x)), class^(x)) d(x, NN_i(x)) / Σ_{i=1}^{k} d(x, NN_i(x)),    (10)

where class^(x) is the label predicted by Eq. (2) using majority voting, and 0 ≤ CF ≤ 1. In other words, CF is the sum of the distances between the unlabeled instance x and those of its k nearest neighbors that share the predicted class, divided by the sum of the distances between x and all k nearest neighbors regardless of class. If CF is large, there is strong evidence to trust the predicted class of the unlabeled instance. The rule of self-learning is that an unlabeled instance is considered a high confidence instance, and is added to the training set, if its CF is greater than or equal to a predefined threshold CF_min. Once a high confidence instance has been added to the training set, it is treated the same as the initial training instances. By involving more unlabeled instances and their predicted labels, the training set is updated iteratively, and the class boundaries are updated as well. In the next iteration, instance ranking is repeated on the current unlabeled and training sets, and k-NN predicts the next top-ranked unlabeled instance on the latest training set.

C. Algorithm Summary
TABLE I. PSEUDO CODE OF THE ORDINAL SEMI-SUPERVISED k-NN

INPUT:  the values of the parameters σ, CF_min, and k;
OUTPUT: the predicted labels of the unlabeled instances;

MAIN ALGORITHM
For index of unlabeled instances i = 1, 2, ..., N_u   % N_u refers to the size of the initial U_u
    Module 1;
    Module 2;
End

Module 1: Instance Ranking
1.1 For class index j = 1, 2, ..., L
    1.1.1 Calculate d(x, x_i^(j)), the Euclidean distance between each unlabeled instance x in U_u and each instance of class j in U_l;
    1.1.2 Calculate the corresponding weights ω(x, x_i^(j));
    1.1.3 Calculate the weighted mean for each instance x in U_u with respect to class j;
    End
1.2 Compute the distance factor DF(x) by Eq. (9);
1.3 Sort all DF values in ascending order and return the top-ranked instance, i.e., the one with the smallest DF;

Module 2: Semi-Supervised Learning
2.1 Predict the class of the current top-ranked instance x from Module 1 by Eq. (2);
2.2 Compute the confidence factor CF(x) of x using Eq. (10);
2.3 If CF(x) ≥ CF_min
    2.3.1 U_l = U_l ∪ {x};
    End
2.4 U_u = U_u \ {x};   % remove x from U_u
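The loop of Table I can be sketched end to end as follows. This is a simplified single-file reading of Modules 1 and 2 under the same RBF-form assumption noted earlier; the names and the tie-breaking are ours:

```python
import numpy as np

def confidence_factor(x, X_l, y_l, y_hat, k):
    """Confidence factor of Eq. (10): the fraction of the k nearest
    neighbors' distance mass whose labels agree with the prediction."""
    d = np.linalg.norm(X_l - x, axis=1)
    nn = np.argsort(d)[:k]
    total = d[nn].sum()
    same = d[nn][y_l[nn] == y_hat].sum()
    return same / total if total > 0 else 1.0

def ordinal_ssl_knn(X_l, y_l, X_u, k=1, sigma=1.0, cf_min=0.5):
    """One full pass of the ordinal semi-supervised k-NN (Table I).

    Returns the predicted labels in the order the instances were
    processed (i.e., in instance-ranking order, not input order).
    """
    X_l, y_l = X_l.copy(), y_l.copy()
    X_u = [np.asarray(u, dtype=float) for u in X_u]
    preds = []
    while X_u:
        # Module 1: rank the unlabeled instances by distance factor, Eq. (9)
        dfs = []
        for x in X_u:
            dists = []
            for c in np.unique(y_l):
                Xc = X_l[y_l == c]
                d = np.linalg.norm(Xc - x, axis=1)
                w = np.exp(-d**2 / (2 * sigma**2))
                w /= w.sum()                          # Eq. (6)
                m = (w[:, None] * Xc).sum(axis=0)     # Eq. (5)
                dists.append(np.linalg.norm(x - m))
            dists = np.array(dists)
            dfs.append(dists.min() / dists.sum())
        x = X_u.pop(int(np.argmin(dfs)))              # top-ranked instance
        # Module 2: majority-vote k-NN prediction, then self-training
        d = np.linalg.norm(X_l - x, axis=1)
        nn = np.argsort(d)[:k]
        y_hat = int(np.bincount(y_l[nn]).argmax())
        preds.append(y_hat)
        if confidence_factor(x, X_l, y_l, y_hat, k) >= cf_min:
            X_l = np.vstack([X_l, x])                 # U_l = U_l ∪ {x}
            y_l = np.append(y_l, y_hat)
    return preds
```

Mirroring the simulated experiments, starting from one labeled instance per cluster and letting the training set grow iteratively keeps later predictions anchored to the evolving boundary rather than to the two initial points.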
The pseudo code of the proposed method is summarized in TABLE I. Basically, instance ranking and semi-supervised learning are implemented by Module 1 and Module 2, respectively. Instance ranking finds the top-ranked unlabeled instance according to the distance factor. The top-ranked instance is then classified by SSL, which determines whether it is added to the training set with its predicted label. The number of iterations equals the number of unlabeled instances, N_u. In the next section, we illustrate the algorithm visually on simulated datasets and compare it with other methods.

III. EXPERIMENTAL RESULTS

In this section the proposed method is applied to two groups of datasets. The first group contains two two-dimensional simulated datasets, used to visually illustrate the mechanism of the ordinal semi-supervised k-NN. The second group contains four benchmark datasets, used to validate our algorithm in practical applications. Experimental results and discussions are given below.

A. Experiment 1: The Simulated Datasets

Two simulated datasets are artificially generated to visually illustrate the mechanism of the proposed method. One is a two-class, Gaussian-distributed dataset, where the means of the two classes are [2, 5]^T and [6, 5]^T, respectively, and the standard deviations of both classes are [1, 1]^T. The other is the well-known two-moon dataset [22], which is non-Gaussian. We use two half circles to simulate it: one with center [6, 5.5]^T and radius 5, the other with center [11, 6.5]^T and radius 5. The two datasets are summarized in TABLE II.

TABLE II. SUMMARY OF SIMULATED DATASETS

No.  Dataset               Classes  Instances  Features
1    Simulated dataset #1  2        400        2
2    Simulated dataset #2  2        202        2

Fig. 4 shows six intermediate iterations on simulated dataset #1: the 0th, 80th, 160th, 240th, 320th, and 398th (last) iterations. White curves are the decision boundaries of the current iteration. Colors indicate the class of an instance: red is class 1 and green is class 2. Instances drawn as big dots are training instances, while those drawn as small dots are unlabeled instances. A circle around a dot means the instance has already been predicted, and the color of the circle indicates the predicted class. At the 0th iteration, two instances are in the training set: one red instance for class 1 and one green instance for class 2. The remaining 398 (400 − 2) instances are considered unlabeled. 1-NN is the classifier used here. According to the class boundary in Fig. 4(a), many instances are predicted wrongly, since the decision boundary of 1-NN is exactly in the middle of the two training instances. After 80 iterations, in Fig. 4(b), the decision boundary has changed slightly, and prediction proceeds along the class boundaries. In Figs. 4(c) through 4(e), sequential prediction continues until the 398th iteration. The decision boundary evolves with the aid of the ordinal semi-supervised k-NN and the unlabeled instances, and at the last iteration, shown in Fig. 4(f), it is eventually representative enough of the structure of simulated dataset #1.

Figure 4. Intermediate boundary change on simulated dataset #1 ((a) 0th, (b) 80th, (c) 160th, (d) 240th, (e) 320th, (f) 398th iteration); classifier: 1-NN

Fig. 5 shows intermediate results on simulated dataset #2, the two-moon dataset. Unlike simulated dataset #1, the distributions in this dataset are non-Gaussian. As with simulated dataset #1, only two training instances are selected at the 0th iteration, so the remaining 200 (202 − 2) instances are used as unlabeled instances. The decision boundary, as on simulated dataset #1, continually updates itself by extracting and absorbing information from the unlabeled set. At the last (200th) iteration, shown in Fig. 5(f), the decision boundary is good enough for subsequent prediction; that is, a good model has been generated. Panels (a) through (d) of Fig. 5 show the 0th, 20th, 40th, and 60th iterations;
panels (e) and (f) show the 80th and 200th iterations.

Figure 5. Intermediate boundary change on simulated dataset #2; classifier: 1-NN

The two cases studied above demonstrate that the ordinal semi-supervised k-NN is able to improve the performance of k-NN by using unlabeled sets when few training instances are available; in the simulated datasets, only one training instance per class was available. Its effectiveness holds not only for Gaussian-distributed cases, such as simulated dataset #1, but also for non-Gaussian cases, such as simulated dataset #2. Before applying our algorithm, assumptions [22] are made, as for other SSL techniques:

(1) The Smoothness Assumption (SA): instances in a high density region should have similar labels; that is, labels should change only in low density regions.
(2) The Cluster Assumption (CA): two instances from the same cluster should have similar labels.
(3) The Manifold Assumption (MA): the data lie roughly on a low-dimensional manifold.

We have demonstrated the effectiveness of the proposed method in this subsection; next, the proposed method is applied to practical benchmark datasets.

B. Experiment 2: The Benchmark Datasets

The ordinal semi-supervised k-NN (method 1) is compared with two other varieties of k-NN: the semi-supervised k-NN (method 2) and the conventional k-NN (method 3). The difference between method 1 and method 2 is that method 1 uses instance ranking, while method 2 uses none (i.e., random ordering). The k-NN classifier is implemented with knnclassify from the Bioinformatics Toolbox of MATLAB. The parameters of the three methods are predefined as σ = 1, k = 1, and CF_min = 1. The three methods are compared on four benchmark datasets downloaded from the University of California Irvine (UCI) repository [23], summarized in TABLE III.

TABLE III. SUMMARY OF BENCHMARK DATASETS
No.  Dataset     Classes  Instances  Features
1    Vehicle     4        846        18
2    Ionosphere  2        351        34
3    Parkinson   2        195        22
4    Wine        3        178        13

We split each dataset in TABLE III into two subsets because of the limited number of instances available: one subset is used as a labeled set for training; the other, although labeled, is treated as an unlabeled set for SSL. In method 3, the unlabeled sets are used directly as test sets. In methods 1 and 2, after SSL has been applied on the unlabeled set, the unlabeled sets, now treated as test sets, are re-used to test these two methods. The K-fold cross-validation technique is used for data partitioning, where K is a specified integer from 2 to 10. For each K, one fold is used as the unlabeled set and the other K − 1 folds as the initial training set; in total there are K combinations of selecting one fold out of K. We run the programs for each combination, obtain the classification accuracy of each run, and then average them. Classification accuracy is the only measure used to evaluate the methods. It is calculated on the test set with the 0-1 loss function; that is, classification accuracy is the ratio of the number of correctly predicted instances in the test set to the total number of instances in the test (unlabeled) set. A labeled ratio is defined as the size of the training set divided by the total number of instances, i.e., (K − 1)/K in this case. Since K ranges from 2 to 10, the ratio ranges from 1/2 to 9/10, namely 1/2, 2/3, 3/4, ..., 9/10. Next, we repeat the same procedure except that one fold is used as the training set and the remaining K − 1 folds as the unlabeled set; the labeled ratio is then 1/K, ranging from 1/10 to 1/2, namely 1/10, 1/9, 1/8, ..., 1/2. When all calculations are complete, we have 17 averaged values of classification accuracy with respect to the 17 ratios from 1/10 to 9/10.

Figure 6. Classification accuracy of methods 1-3 on the Vehicle dataset versus the labeled ratio

Figure 7.
Classification accuracy on the Ionosphere dataset
Figure 8. Classification accuracy on the Parkinson dataset

Figure 9. Classification accuracy on the Wine dataset

Results are provided in Figs. 6 to 9 in terms of the classification accuracy of the three methods at different ratios. We compare the results for the case of labeled ratio > 1/2 (large training sets) and the case of ratio ≤ 1/2 (small training sets). From Figs. 6 to 9, method 1 outperforms the other two methods when the ratio ranges from 1/10 to 1/2 (i.e., small training sets). The conventional k-NN is sensitive to the ratio and, as mentioned in Section I, deteriorates considerably when the ratio is smaller than 1/2. With a small ratio (≤ 1/2), the training sets are insufficient and cannot provide enough knowledge for robust classification of the test set. Compared with the conventional k-NN, the semi-supervised k-NN improves the classification accuracy. In general, however, the classification accuracy of the semi-supervised k-NN is inferior to that of the proposed method, as the latter benefits from the use of a sequential unlabeled set. When the ratio > 1/2, the classification accuracy of the three methods is quite comparable. This clearly shows that the proposed method is advantageous for small training sets but not for large ones. TABLE IV provides two statistics of classification accuracy for the insufficient cases (ratio ≤ 1/2), namely the mean (MEAN) and the standard deviation (STD). Both are calculated over the nine averaged values of classification accuracy corresponding to the nine ratios less than or equal to 1/2.
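The ratio-sweep protocol described above can be sketched as follows. This is our reconstruction of the K-fold partitioning for K = 2, ..., 10; fit_predict stands in for any of the three methods:

```python
import numpy as np

def accuracy_over_ratios(X, y, fit_predict, K_values=range(2, 11), seed=0):
    """Average 0-1 accuracy per labeled ratio, as in Experiment 2.

    For each K, split the data into K folds. Using K-1 folds as the
    labeled set and 1 fold as the unlabeled/test set gives the ratio
    (K-1)/K; swapping the roles gives the ratio 1/K. Each ratio's
    accuracy is averaged over the K fold rotations. With K = 2..10
    this yields the 17 distinct ratios from 1/10 to 9/10.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for K in K_values:
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, K)
        for small_is_train in (False, True):
            accs = []
            for i in range(K):
                test = folds[i]
                train = np.concatenate(
                    [f for j, f in enumerate(folds) if j != i])
                if small_is_train:
                    train, test = test, train   # one fold trains, K-1 folds test
                y_hat = fit_predict(X[train], y[train], X[test])
                accs.append(float(np.mean(y_hat == y[test])))
            ratio = 1 / K if small_is_train else (K - 1) / K
            results[round(ratio, 3)] = float(np.mean(accs))
    return results
```

Sweeping the ratio like this is what makes the small-training-set regime (ratio ≤ 1/2) directly comparable with the large one in Figs. 6 to 9.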
It can be seen that the proposed method attains not only the highest mean accuracy but also the smallest standard deviation on each of the four datasets, which again demonstrates its effectiveness in the insufficient cases.

TABLE IV. SUMMARY OF EXPERIMENT 2 RESULTS

Dataset      Method 1 (MEAN / STD)   Method 2 (MEAN / STD)   Method 3 (MEAN / STD)
Vehicle
Ionosphere
Parkinson
Wine
Note: italic and shaded text denotes either the largest value of the mean or the smallest value of the standard deviation.

IV. CONCLUSIONS

In this work, we propose a classification method named the ordinal semi-supervised k-NN to deal with insufficient training instances. The proposed method is a nonparametric classification approach and includes two parts, namely instance ranking and semi-supervised learning. Instance ranking produces a sequential unlabeled set. Semi-supervised learning classifies the top-ranked instance and includes it in the training set with its predicted label if it is considered a high confidence instance. The proposed method works for both Gaussian and non-Gaussian distributed cases. Experimental results on four benchmark datasets demonstrate that the proposed method outperforms the conventional k-NN and the semi-supervised k-NN, especially when the training instances are insufficient (ratio ≤ 1/2), and that it is comparable with the other two methods when the training sets are sufficient (ratio > 1/2). However, the proposed method requires more computational time than the conventional k-NN, because its training set continually grows, while that of the conventional k-NN is fixed.

REFERENCES

[1] E. Fix and J. L. Hodges, Jr., "Discriminatory analysis. Nonparametric discrimination: consistency properties," U.S. Air Force School of Aviation Medicine, Randolph Field, Tex., Project 21-49-004, Contract AF 41(128)-31, Rep. 4, 1951.
[2] E. Fix and J. L. Hodges, Jr., "Discriminatory analysis: small sample performance," U.S. Air Force School of Aviation Medicine, Randolph Field, Tex., Project 21-49-004, Rep. 11, 1952.
[3] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[4] Y. Wang, X.
Xu, H. Zhao, and Z. Hua, "Semi-supervised learning based on nearest neighbor rule and cut edges," Knowledge-Based Systems, vol. 23, no. 6, August 2010.
[5] R. Gil-García and A. Pons-Porrata, "A new nearest neighbor rule for text categorization," Lecture Notes in Computer Science, 2006.
[6] Y. Lei and M. J. Zuo, "Gear crack level identification based on weighted k nearest neighbour classification algorithm," Mechanical Systems and Signal Processing, vol. 23, no. 5, July 2009.
[7] P. Cunningham and S. J. Delany, "k-nearest neighbour classifiers," Technical Report UCD-CSI-2007-4, March 2007.
[8] J. Wang, P. Neskovic, and L. N. Cooper, "Improving nearest neighbor rule with a simple adaptive distance measure," Pattern Recognition Letters, vol. 28, no. 2, pp. 207-213, January 2007.
[9] K. Yu, L. Ji, and X. Zhang, "Kernel nearest-neighbor algorithm," Neural Processing Letters, vol. 15, no. 2, 2002.
[10] S. A. Dudani, "The distance-weighted k-nearest-neighbor rule," IEEE Transactions on Systems, Man and Cybernetics, vol. 6, no. 4, pp. 325-327, April 1976.
[11] R. N. Shepard, "Toward a universal law of generalization for psychological science," Science, vol. 237, no. 4820, pp. 1317-1323, September 1987.
[12] Y. Zeng, Y. Yang, and L. Zhao, "Pseudo nearest neighbor rule for pattern classification," Expert Systems with Applications, vol. 36, no. 2, March 2009.
[13] Y. Mitani and Y. Hamamoto, "A local mean-based nonparametric classifier," Pattern Recognition Letters, vol. 27, no. 10, pp. 1151-1159, July 2006.
[14] H. Brighton and C. Mellish, "Advances in instance selection for instance-based learning algorithms," Data Mining and Knowledge Discovery, vol. 6, no. 2, pp. 153-172, April 2002.
[15] P. E. Hart, "The condensed nearest neighbor rule," IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 515-516, May 1968.
[16] A. M. Martínez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, February 2001.
[17] X. Zhu, "Semi-supervised learning literature survey," Computer Sciences TR 1530, University of Wisconsin-Madison, July 2008.
[18] B. C. Kuo and D. A. Landgrebe, "Nonparametric weighted feature extraction for classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 5, May 2004.
[19] B. C. Kuo, C. H. Li, and J. M. Yang, "Kernel nonparametric weighted feature extraction for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 4, April 2009.
[20] A. Albalate and W. Minker, Semi-Supervised and Unsupervised Machine Learning: Novel Strategies, John Wiley & Sons, Hoboken, USA, 2011.
[21] D. Guan, W. Yuan, Y. K. Lee, and S.
Lee, Nearest neghbor edtng aded by unlabeled data, Informaton Scences, vol. 79, no. 3, pp. 73-8, June 009. [] Z. Bodó, and Z. Mner, On Supervsed and Semsupervsed k-nearest Neghbor Algorthms, Proceedngs of the 7th Jont Conference on Mathematcs and Computer Scence, vol. LIII, pp Studa Unverstats Babe s- Bolya, Seres Informatca, 008. [3] A. Frank, and A. Asuncon. UCI Machne Learnng Repostory [ Irvne, CA: Unversty of Calforna, School of Informaton and Computer Scence, 00. [4] Y. Xe, Y. Jang, and M. Tang, A sem-supervsed k- nearest neghbor algorthm based on data edtng, Proceedngs of Chnese control and decson conference, 0. [5] J. Lv, Sem-supervsed learnng usng local regularzer and unt crcle class label representaton, Journal of Software, vol. 7, no. 5, pp , 0. [6] M. A. Wajeed, and T. Adlakshm, Sem-supervsed text classfcaton usng henhanced KNN algorthm, Proceedngs of World Congress on Informaton and Communcaton Technologes, 0. [7] T. Gannakopoulos, and S. Petrds, Fsher lnear sem- Dscrmnant analyss for speaker darzaton, IEEE Transacton on Audo, Speech, and Laguage Processng, vol. 0, no. 7, pp. 93-9, 0. Zhlang Lu receved the B.S. degree n school of electrcal and nformaton engneerng, Southwest Unversty for Natonaltes, Chengdu, n 006. He vsted Unversty of Alberta as a vstng scholar from 009 to 0. He s currently a Ph.D canddate n the school of automaton engeerng, Unversty of Electronc Scence and Technology of Chna, Chengdu. Hs research nterests manly nclude pattern recognton, data mnng, fault dagnoss and prognoss.