A novel feature selection algorithm based on hypothesis-margin


JOURNAL OF COMPUTERS, VOL. 3, NO. 1, DECEMBER 2008

A novel feature selection algorithm based on hypothesis-margin

Ming Yang*, Fei Wang and Ping Yang
Department of Computer Science, Nanjing Normal University, Nanjing, P.R. China
Email: {m.yang, yangping}@njnu.edu.cn, f0701@163.com

Abstract — The iterative search margin based algorithm (Simba) has been proven effective for feature selection. However, it still has the following disadvantages: (1) the previously proposed model is not sufficiently robust to noise; and (2) the model does not use any global information, so some useful discrimination information may be lost and the convergence speed is also affected in some cases. In this paper, by incorporating global information, a novel margin based feature selection framework is introduced. Based on the newly designed model, an improved margin based feature selection algorithm (Ismba) is proposed. By effectively adjusting the contribution of the global information, Ismba can efficiently reduce the computational cost and at the same time obtain more effective feature subsets than Simba. Experiments on 6 artificial and 8 real-life benchmark datasets show that Ismba is effective and efficient.

Index Terms — Feature selection, Dimensionality reduction, Hypothesis-margin, Margin

I. INTRODUCTION

Dimensionality reduction (DR) is one commonly applied approach [1]. There are a number of DR techniques; according to the adopted reduction strategy, they are usually divided into feature extraction [2-4] and feature selection [5] approaches. The key difference between feature extraction and feature selection is that the former generates a completely new feature space through a functional transformation, while the latter selects a relevant subset of the original features. Classical feature extraction methods are generally classified into linear and nonlinear methods.
Linear approaches, such as Principal Component Analysis (PCA) [2], Linear Discriminant Analysis (LDA) [1] and Locality Preserving Projections (LPP) [3], aim to project high-dimensional data to a lower-dimensional space by linear transformations according to some criteria. On the other hand, nonlinear methods, such as Locally Linear Embedding (LLE) [4], aim to project the original data by nonlinear transformations while preserving certain local information according to some criteria. As the authors of Ref. [5] pointed out, feature extraction is generally effective.

*Corresponding author. Email: m.yang@njnu.edu.cn (Ming Yang).

However, the effectiveness of feature extraction algorithms may degrade markedly when processing large-scale datasets. In addition, the new variables usually involve all original features, so the newly formed variables may contain much information originating from redundant features in the original space, as in PCA [2]. Unlike feature extraction, feature selection can be viewed as one of the most fundamental problems in machine learning. It is defined as the process of selecting relevant features out of a larger set of candidate features; the relevant features are those that describe the target task. As Liu pointed out in [6], the motivation of feature selection (also called attribute reduction or feature reduction) in data mining and machine learning is to: reduce the dimensionality of the feature space, speed up and reduce the cost of a learning algorithm, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts. In particular, the authors of [6] emphasized that not every feature selection method can serve all purposes. Generally, supervised feature reduction methods can be categorized into two classes: the filter model [7-9][12][13] and the wrapper model [10][11][14]. In the wrapper model, feature selection methods try to directly optimize the performance of a specific predictor. Along this line, the predictor's generalization performance (e.g.
by cross validation) needs to be estimated for the selected feature subset at each step, so high computational cost is its main disadvantage. Currently, many filter methods are available, including Relief [7], FCBF [8], the C-tree based feature selection algorithm [9] and Simba [12], among others. Among them, Simba is a recently proposed margin based feature selection approach, which uses the so-called large margin principle [15-16] as its theoretical foundation to guarantee good performance for any feature selection scheme that selects a small set of features while keeping the margin large. Meanwhile, owing to the smoothness of the hypothesis-margin based evaluation function, Simba uses stochastic gradient ascent over the evaluation function to accelerate the feature selection process. Roughly speaking, the main idea of Simba is to obtain an effective subset of features such that the relatively significant features receive relatively large weights under the hypothesis-margin criterion.

© 2008 ACADEMY PUBLISHER

In essence, similar to Ref. [14], Simba is also a weighting method, that is, the features with relatively larger weights form the selected subset; the key difference is that Simba is a filter method, while most other weighting methods are tied to a concrete classifier. Moreover, theoretical analysis and experiments show that Simba can effectively reduce the computational complexity and is more effective than classical filter approaches such as Relief [7]. However, one disadvantage of Simba is its non-robustness to noise: under the 1NN criterion [17][18], the weights of some features may, because of noise, become relatively large or fail to converge to a relatively small value or zero, since noise may increase the contribution of some features to the hypothesis-margin of samples. On the other hand, Simba uses only local information when choosing a small set of features to make the hypothesis-margin of samples large, so it may lose useful discrimination information or global structure hidden in the global information. Thus, the performance of classifiers induced by post-analysis algorithms in the new feature space will be degraded. In this paper, we introduce a novel margin based feature selection model called Ismba_FS, which incorporates global information into the recently proposed margin based feature selection model [12] to eliminate the disadvantages of the Simba algorithm while maintaining its merits. In Ismba_FS, the main motivation of incorporating global information is to make the distance between a sample and the center point of its own class as small as possible, and the distance between a sample and the center points of the other classes as large as possible; meanwhile, a balance factor λ is introduced for dynamically adjusting the contribution of the global information. Based on the newly designed model Ismba_FS, we introduce an improved margin based feature selection algorithm (Ismba).
By adjusting the contribution of the global information, Ismba can efficiently reduce the computational cost and meanwhile obtain a more effective feature subset than Simba. In summary, Ismba possesses several attractive characteristics: (1) the computational complexity can be efficiently reduced, since the centers of each class and of the remaining classes can be computed in advance, and their contributions to the weight vector are reflected at each iteration; (2) the classification performance of classifiers induced by the selected small set of features can be effectively improved in some cases, because the embedded global information helps both to preserve discrimination information and to resist noise; and (3) the contribution of the global information can be dynamically and effectively adjusted by the tradeoff parameter λ.

The rest of this paper is organized as follows. In Section 2, some basic concepts on margins (hypothesis-margin and sample-margin) and Simba are briefly introduced. In Section 3, by incorporating global information into the existing hypothesis-margin based feature selection model, a novel feature selection model and the corresponding feature selection algorithm are presented. Experimental comparisons are given in Section 4. Finally, Section 5 presents our conclusions and several issues for future work.

II. PRELIMINARIES

A. SAMPLE-MARGIN AND HYPOTHESIS-MARGIN

As the authors of Ref. [12] pointed out, margins play a crucial role in modern machine learning research. They measure the confidence of a classifier when making its decision, and are used both for theoretic generalization bounds and as guidelines for algorithm design. As described in Ref. [15], there are two natural ways of defining the margin of a sample with respect to a classification rule. The more common type, the sample-margin, measures the distance between the sample and the decision boundary induced by the classifier; e.g., the Support Vector Machine (SVM) [16] finds the separating hyperplane with the largest sample-margin.
Obviously, feature selection methods based on the sample-margin incur high computational cost on large-scale and/or high-dimensional datasets. As an alternative, the hypothesis-margin was introduced in [12][15]. The margin of a hypothesis with respect to a sample is the distance between the hypothesis and the closest hypothesis that assigns an alternative label to the given sample. The hypothesis-margin of a sample x for 1NN with respect to a set of samples P is defined as follows:

θ_P(x) = (1/2)( ‖x − nearmiss(x)‖ − ‖x − nearhit(x)‖ )    (1)

where nearhit(x) and nearmiss(x) denote the nearest sample to x in P with the same and a different label, respectively. By (1), we hope to choose a subset of the original features such that the hypothesis-margin becomes as large as possible. Based on the hypothesis-margin, an effective feature subset can be efficiently obtained by corresponding feature selection algorithms, since in the Nearest Neighbor case a large hypothesis-margin ensures a large sample-margin, and the hypothesis-margin is easy to compute compared with the sample-margin.

B. EVALUATION FUNCTION

In order to obtain a more effective subset of the original features, an evaluation function that assigns a score to sets of features according to the hypothesis-margin they induce was introduced in Ref. [12]; the hypothesis-margin as a function of the chosen set of features is formulated in the following Definition 1.

Definition 1 [12]. Let P be a set of samples and x be a sample. Let w be a weight vector over the feature set; then the hypothesis-margin of x is

θ_P^w(x) = (1/2)( ‖x − nearmiss(x)‖_w − ‖x − nearhit(x)‖_w )    (2)

where ‖z‖_w = sqrt( Σ_i w_i² z_i² ).
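To make Eq. (1) and Definition 1 concrete, the following minimal Python/NumPy sketch computes the (optionally weighted) 1NN hypothesis-margin of a single sample. The function name and signature are illustrative, not from the paper; the caller is assumed to pass P = S \ {x}, i.e., the query sample itself is not in X.

```python
import numpy as np

def hypothesis_margin(x, y, X, Y, w=None):
    """Weighted 1NN hypothesis-margin of sample x (label y) w.r.t. the set (X, Y).

    With w=None this is Eq. (1); otherwise the weighted margin of Definition 1,
    using the w-norm ||z||_w = sqrt(sum_i w_i^2 z_i^2). X must not contain x.
    """
    if w is None:
        w = np.ones(X.shape[1])
    # w-distances from x to every sample in X
    d = np.sqrt((((X - x) ** 2) * w ** 2).sum(axis=1))
    near_hit = d[Y == y].min()    # nearest sample with the same label
    near_miss = d[Y != y].min()   # nearest sample with a different label
    return 0.5 * (near_miss - near_hit)
```

For example, with a hit at distance 1 and a miss at distance 5, the margin is (1/2)(5 − 1) = 2; a positive margin means the sample would be correctly classified by 1NN.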

Further, by (2), the authors of Ref. [12] provide a strategy for computing the hypothesis-margin over all given samples, via the following Definition 2.

Definition 2 [12]. Given a training set S and a weight vector w, the evaluation function is

e(w) = Σ_{x∈S} θ_{S\{x}}^w(x)    (3)

By (3), the feature set can be found by maximizing the hypothesis-margin directly: first, the weight vector w that maximizes e(w) as defined in (3) is found; then, setting max_i w_i = 1, the corresponding normalized weight vector is obtained, and a subset of features follows naturally by applying a threshold.

C. ITERATIVE SEARCH MARGIN BASED ALGORITHM

In order to quickly and effectively obtain a subset of the original features, gradient ascent was employed in [12] for maximizing e(w) as defined in (3), since e(w) is smooth almost everywhere. The gradient of e(w) evaluated on the set of samples S is

(∇e(w))_i = ∂e(w)/∂w_i = (1/2) Σ_{x∈S} ( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) w_i    (4)

As described above, the iterative search margin based algorithm for feature selection is as follows.

Algorithm 1. Simba [12]
1. initialize w = (1, 1, …, 1);
2. for t = 1, 2, …, T:
   (a) pick randomly a sample x from S;
   (b) calculate nearmiss(x) and nearhit(x) with respect to S \ {x} and the weight vector w;
   (c) for i = 1, 2, …, N calculate
       Δ_i = (1/2)( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) w_i
   (d) w = w + Δ;
3. w ← w² / ‖w²‖_∞, where (w²)_i := (w_i)².

Since ‖w‖ increases, the relative effect of the correction term decreases and the algorithm typically converges. The computational complexity of Simba is O(TNm), where T is the number of iterations, N is the number of features and m is the number of samples in S. Obviously, Simba is highly efficient. Further, numerical experiments show that Simba outperforms Relief.
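The steps of Algorithm 1 can be sketched in Python/NumPy as follows. This is an illustrative reimplementation, not the authors' code; a small epsilon guards against zero distances on duplicate samples, which the paper does not discuss.

```python
import numpy as np

def simba(X, Y, T=1000, rng=None):
    """Sketch of Algorithm 1 (Simba): stochastic gradient ascent on e(w).

    X: (m, N) samples, Y: (m,) integer labels. Returns weights normalized
    as w <- w^2 / max_i(w_i^2), as in step 3 of the algorithm.
    """
    rng = np.random.default_rng(rng)
    m, N = X.shape
    w = np.ones(N)
    for _ in range(T):
        i = rng.integers(m)                       # (a) pick a random sample x
        x, y = X[i], Y[i]
        diff2 = (np.delete(X, i, 0) - x) ** 2     # squared diffs to S \ {x}
        yy = np.delete(Y, i)
        d = np.sqrt((diff2 * w ** 2).sum(axis=1)) + 1e-12   # w-distances
        same = yy == y
        j_hit = np.flatnonzero(same)[d[same].argmin()]      # (b) nearhit(x)
        j_miss = np.flatnonzero(~same)[d[~same].argmin()]   # (b) nearmiss(x)
        # (c) per-feature correction term, then (d) update
        delta = 0.5 * (diff2[j_miss] / d[j_miss] - diff2[j_hit] / d[j_hit]) * w
        w = w + delta
    w2 = w ** 2
    return w2 / w2.max()                          # step 3: w <- w^2 / ||w^2||_inf
```

On data where only one feature separates the classes, the returned weight of that feature should dominate, so thresholding the normalized weights recovers the relevant subset.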
However, Simba uses only a small amount of local information to calculate the hypothesis-margin of the given samples, according to the margin measure of a hypothesis with respect to a sample; in this way some irrelevant features may receive relatively large weights due to the influence of noise, so Simba is still non-robust. At the same time, Simba cannot use the useful discrimination information hidden in the global information, which may lead to the loss of some useful features.

III. ISIMBA_FS AND ISIMBA

In order to overcome the disadvantages of Simba while retaining its characteristics, we propose a novel margin based feature selection model called Ismba_FS, which incorporates global information. Further, based on Ismba_FS, we introduce an improved margin based feature selection algorithm (Ismba).

A. HYPOTHESIS-MARGIN INCORPORATING GLOBAL INFORMATION

To remedy the shortcomings of the Simba algorithm, we introduce a novel margin based feature selection model (Ismba_FS) incorporating global information as follows:

θ̂_P(x) = (1/2)[ ( ‖x − nearmiss(x)‖ − ‖x − nearhit(x)‖ ) + λ( ‖x − centermiss(x)‖ − ‖x − centerhit(x)‖ ) ]    (5)

where nearhit(x) and nearmiss(x) denote the nearest sample to x in P with the same and a different label, respectively; centerhit(x) and centermiss(x) denote the centers of the samples in P with the same and a different label as x, respectively; and λ (lambda) is an adjustable parameter used to control the contribution of the global information to the hypothesis-margin described in Section 2. To balance the contributions of local and global information to the hypothesis-margin, in this paper we employ a non-optimized but effective strategy: λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5}. The novel hypothesis-margin incorporating global information naturally has the following merits:

(1) when λ = 0, Eq. (5) degrades to Eq. (1); that is, the new hypothesis-margin model is an extension of the original hypothesis-margin.
(2) when λ > 0, all margin based feature selection algorithms induced by Eq. (5) can reduce the number of iterations to some extent, since for a given set of samples the same-label and different-label centers of any sample can be computed in advance. Moreover, as λ gradually increases, the convergence of feature selection algorithms based on Eq. (5) is naturally accelerated.

(3) the adjustable parameter λ can effectively control the tradeoff between the contributions of local and global information to the new hypothesis-margin. By tuning λ to a relatively large value, features that preserve the discrimination information or global structure hidden in the data, while maintaining a large hypothesis-margin, can be effectively obtained.

(4) the robustness of the feature selection algorithms induced by Eq. (5) can be effectively enhanced, because the embedded global information constrains the influence of noise to some extent.

B. A NOVEL EVALUATION FUNCTION BASED ON THE NEW HYPOTHESIS-MARGIN

Based on Eq. (5), we propose a new evaluation function using the same strategy as Definition 1, which assigns a score to sets of features according to the new hypothesis-margin. Its definition is as follows.

Definition 3. Let P be a set of samples, x be a sample, and w be a weight vector over the feature set; then the new hypothesis-margin of x is

θ̂_P^w(x) = (1/2)[ ( ‖x − nearmiss(x)‖_w − ‖x − nearhit(x)‖_w ) + λ( ‖x − centermiss(x)‖_w − ‖x − centerhit(x)‖_w ) ]    (6)

Further, based on this evaluation function, we introduce a strategy for computing the new hypothesis-margin over all given samples as follows.

Definition 4. Given a training set S and a weight vector w, the new evaluation function is

ê(w) = Σ_{x∈S} θ̂_{S\{x}}^w(x)    (7)

According to Eq. (7), it is natural to consider the evaluation function only for weight vectors w such that max_i w_i = 1. However, similar to (3), we can ignore the constraint ‖w‖_∞ = 1 when computing the weight vector, since ê(βw) = β ê(w). After finding w, we set max_i w_i = 1 by normalizing the weight vector, and easily obtain a subset of features by applying a threshold.

C. IMPROVED ITERATIVE SEARCH MARGIN BASED ALGORITHM

As analyzed in Section 3.2, we can use the weights directly via the induced distance measure for maximizing ê(w) as defined in (7). We can also use gradient ascent directly to maximize it, since ê(w) is smooth almost everywhere. Similar to (4), the gradient of ê(w) evaluated on the set of samples S is

(∇ê(w))_i = ∂ê(w)/∂w_i = (1/2) Σ_{x∈S} [ ( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) + λ( (x_i − centermiss(x)_i)² / ‖x − centermiss(x)‖_w − (x_i − centerhit(x)_i)² / ‖x − centerhit(x)‖_w ) ] w_i    (8)

By Eq. (8), the improved iterative search margin based algorithm (Ismba) for feature selection is as follows.

Algorithm 2. Ismba
1. let λ be a proper non-negative number;
2. initialize w = (1, 1, …, 1);
3. for each x ∈ S calculate centermiss(x) and centerhit(x), the centers of the samples in P with a different and with the same label as x, respectively;
for t=1,,,t; (a) pck randomly a sample x from S; (b) calculate nearmss(x) and nearht(x) th respect to (S {x}) and the eght vector ; (c) for =1,,,N calculate ˆ 1 ( x nearmss( x)) ( x nearht( x)) = (( ) x nearmss( x) x nearht( x) ( x centermss( x) ) ( x centerht( x) ) + λ( )) x centermss( x) x centerht( x) (d) = + ˆ 5. / hen ( ) : = ( ). In Ismba algorthm, e also use a stochastc gradent ascent over eˆ( ) hle gnorng the constrant = 1, the normalzaton on the constrant s done only at the step 5, snce eˆ( β) = βeˆ( ). Moreover, n each teraton e only evaluate one term n the sum n (8) and add t to the eght vector. In addton, the term ˆ n step 4(d) s nvarant to scalar scalng of, snce ˆ( ) = ˆ( β ). So, hen ncreases gradually, the relatve effect of the correcton term ˆ decreases and Ismba typcally convergence. Also, the second term of Eq.(8) can be drectly nduced by the obtaned centermss(x) and centerht(x) because both centermss(x) and centerht(x) are computed n advance for any gven sample x, hence t s easy to see that the contrbuton of global nformaton to eght vector almost need not spend any computatonal cost. Also, n ntutve, hen λ gradually ncreases, the effect of global nformaton to the eght vector also ncreases naturally, ths drectly leads to reducton of the number of teratons. So, Ismba algorthm can effcently reduce the computatonal cost. Further, n most cases Ismba algorthm can obtan the more effectve subset of features as compared to Smba, snce by dynamcally adjustng the controlled parameter λ, Ismba makes the global structure or dscrmnaton nformaton hdden n data to be preserved n the ne feature space. Especally, hen λ =0, Ismba degrades to Smba, hence they can obtan a consstent subset of features n ths case, but ths s not the optmzed result of Ismba. Moreover, by ncorporatng relevant global nformaton, the robustness of Ismba can be effectvely enhanced to some extent. 
In addition, to select a relatively small feature set, we can still use the same threshold strategy as Ref. [13]: all features with a relevance score less than the specified threshold are removed; e.g., if the relevance threshold δ (delta) is set to 0.01, all features with weight less than δ are removed. In general, by incorporating the global information, Ismba should intuitively outperform Simba. To test the performance of Ismba, experiments on 6 artificial and 8 real-life datasets are presented in Section 4.

IV. EXPERIMENTAL RESULTS

In order to evaluate the performance of Simba and Ismba, we carry out experiments on 6 artificial and 8 real-life datasets, and then compare their performance, including robustness to noise, computational cost,

and the classification accuracies of kNN and SVM classifiers built in the new feature subspace.

A. ARTIFICIAL DATASETS

In this subsection, the robustness of Simba and Ismba is tested. We call a feature selection algorithm robust if the feature subset obtained from a dataset with noise is almost consistent with that generated from the originally clean dataset. For this purpose, we generate one noise-free artificial dataset, dataset1, a two-dimensional dataset composed of two classes and 240 samples, as shown in Fig. 1(a). The 120 samples in class 1 of dataset1 are randomly generated from a Gaussian distribution with mean [3,5] and variance diag[0.5,0.5], and the 120 samples in class 2 are randomly generated from a Gaussian distribution with mean [3,1] and variance diag[0.5,0.5]. Further, in order to examine the robustness of Simba and Ismba, we generate 5 artificial datasets, dataset2 through dataset6, with 5%, 10%, 20%, 25% and 30% noise, respectively, as shown in Fig. 1(b-f); the noise samples in class 1 are randomly generated from a Gaussian distribution with mean [6,-1] and variance diag[10,10], and the noise samples in class 2 from a Gaussian distribution with mean [8,6] and variance diag[10,10].

As can be seen in Fig. 2, for the noise-free dataset1, Simba effectively filters out the first feature, since its weight falls below the given threshold δ (here δ is set to 0.01). However, from Fig. 2 we can also observe that Simba is non-robust when the noise samples in the original dataset exceed 5%. Meanwhile, as seen from Figs. 3 and 4, as the control parameter λ increases gradually, the robustness of Ismba becomes stronger. That is, the robustness of Ismba can be enhanced by properly adjusting the balance parameter λ.
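The artificial data described above can be generated with the following NumPy sketch. The means and variances are taken from the text; whether noise samples replace clean ones (as assumed here) or are added on top is not stated in the paper, and the 0/1 labels stand in for the paper's class 1/class 2.

```python
import numpy as np

def make_dataset(noise_frac=0.0, seed=0):
    """Two-class 2-D data as in Section 4.A: 120 samples per class from
    N([3,5], 0.5*I) and N([3,1], 0.5*I); a noise_frac fraction of each class
    is replaced by noise from N([6,-1], 10*I) (class 1) and N([8,6], 10*I)
    (class 2). noise_frac=0 gives the clean dataset1."""
    rng = np.random.default_rng(seed)
    n = 120                                  # samples per class
    k = int(round(noise_frac * n))           # noise samples per class (assumed split)
    c1 = rng.normal([3, 5], np.sqrt(0.5), (n - k, 2))
    c2 = rng.normal([3, 1], np.sqrt(0.5), (n - k, 2))
    n1 = rng.normal([6, -1], np.sqrt(10), (k, 2))
    n2 = rng.normal([8, 6], np.sqrt(10), (k, 2))
    X = np.vstack([c1, n1, c2, n2])
    Y = np.array([0] * n + [1] * n)
    return X, Y
```

Calling `make_dataset(0.05)` through `make_dataset(0.3)` then reproduces the five noisy variants (dataset2-dataset6) used in the robustness study.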
Further, to analyze the connection between computational cost and global information, we say that the algorithm cannot effectively eliminate the influence of noise when it runs for the given maximum number of iterations but irrelevant features still hold relatively large weights. In Fig. 5, a fixed maximum number of iterations is used. From Fig. 5, we observe that the computational cost is efficiently reduced with large λ. At the same time, it can also be seen that more global information needs to be incorporated to cope with the influence of noise. So, using the margin based feature selection model as a baseline, the newly developed model can effectively and efficiently improve the performance of the corresponding feature selection algorithms by incorporating a reasonable amount of global information.

Figure 1. Artificial datasets (dataset1-6)
Figure 2. The weights Simba assigns to the features
Figure 3. The weights Ismba assigns to the features under 10% noise
Figure 4. The weights Ismba assigns to the features under 20% noise
Figure 5. The number of iterations of Ismba
Figure 6. The classification accuracy of kNN (k=5) on Ionosphere
Figure 7. The classification accuracy of KSVM on Ionosphere

TABLE 1. EXPERIMENTAL RESULTS (δ = 0.01)
(columns: dataset; FS algorithm (Simba/Ismba); parameter set λ_s; number of selected features N_f; accuracies of 3NN, 5NN, 7NN and KSVM; rows for Wine, BLD, Thyroid, CMC, Sonar, Diabete, Ion and Wave)

Notes: "-" means that no feature selection is done; the cells labeled in blue represent the accuracy of the SVM-based and kNN-based classifiers on the original features; the boldface cells denote that the kNN-based and/or SVM-based classifiers induced by the chosen feature subsets have better or comparable performance.

B. REAL-LIFE DATASETS

In this paper, the employed datasets are publicly available from the UCI repository. A brief description of the UCI datasets is given first:
(1) Wine recognition data (Wine): 178 objects, 3 classes, 13 features; for short, (178, 3C, 13F);
(2) BUPA Liver Disorders (BLD): (345, 2C, 6F);
(3) Thyroid: (215, 3C, 5F);
(4) Contraceptive Method Choice (CMC): (1473, 2C, 9F);
(5) Sonar: (208, 2C, 60F);
(6) Pima Indians Diabetes (Diabete): (768, 2C, 8F);
(7) Ionosphere (Ion): (351, 2C, 34F);
(8) Waveform domain data (Wave): (5000, 3C, 21F).

In our experiments, every dataset is randomly partitioned into two halves: one half is used for training and the other for testing. It is worth noting that all features are normalized to the range between 0 and 1. The balance parameter λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5} for simplicity. To select a relatively small feature set, in this paper the relevance threshold δ is set in {0.001, 0.01, 0.05}; all features with weight less than δ are removed. In addition, let λ_s denote the set of values of λ that yield the same feature subset when δ is fixed, and N_f the number of selected features. Also, both Simba and Ismba are independent of post-analysis algorithms (predictors); here we choose the well-known k-Nearest-Neighbor (kNN) [16][17] and SVM (Support Vector Machine) with kernels [15] as evaluation criteria for testing the classification accuracy of the chosen feature subset.
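The evaluation protocol just described ([0,1]-normalized features, a 50/50 train/test split, kNN scoring on the selected columns) can be sketched with a plain NumPy k-NN; in practice the KSVM scores would come from an SVM library. The function name is illustrative, and ties in the majority vote are broken by the smallest label, a detail the paper does not specify.

```python
import numpy as np

def knn_accuracy(Xtr, ytr, Xte, yte, k=5):
    """Accuracy of plain k-NN (k=5 as in the experiments) on a chosen
    feature subset. Xtr/Xte: (m, d) arrays already restricted to the
    selected features and normalized to [0, 1]; ytr/yte: int labels."""
    # pairwise Euclidean distances, test rows vs. training rows
    d = np.sqrt(((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1))
    idx = np.argsort(d, axis=1)[:, :k]          # k nearest training indices
    votes = ytr[idx]                            # their labels
    pred = np.array([np.bincount(v).argmax() for v in votes])  # majority vote
    return float((pred == yte).mean())
```

Running this once per candidate feature subset (for each δ, λ pair) reproduces the kind of accuracy comparison reported in Table 1 and Figs. 6-7.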
In the KSVM algorithm, we employ the C-SVM model with the RBF kernel as the kernel function. The experimental results when δ is set to 0.01 are shown in Table 1. From Table 1, we find that effective feature subsets can be obtained, and the corresponding classifiers have better or comparable performance than those based on the original features on all datasets. Further, for the datasets Diabete, Ionosphere and Wave, by adjusting the balance parameter λ, the Ismba algorithm obtains more effective feature subsets than Simba. So, incorporating an appropriate amount of global information is of great benefit to the performance of the newly developed model. At the same time, from Table 1 we also find that the effective feature subsets are obtained when λ is relatively small, namely within a range around 0.1. This is consistent with our intuitive observation, because any sample x is near the center point with the same label but far from the center point with a different label. In other words, compared with the contribution of the local information, the global information may have a relatively large effect on the weight vector in most cases. So, a relatively small λ can effectively adjust the trade-off between the local and global information for obtaining a more effective feature subset. Further, from the experimental results, we also observe that the most important features are always kept in the results whether λ is 0 or not. This further indicates that Simba is indeed a very important feature selection algorithm. Based on Ismba, by tuning the parameter λ, the weights of some effective features that are filtered out by Simba are increased. Hence, Ismba is naturally an extension and improvement of Simba. It is worth noting that the parameters in Table 1 are not optimized, since δ is set to the fixed value 0.01 and λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5}.
However, we still point out that even so, on all 8 datasets, the classifiers induced by Ismba achieve better or comparable classification performance compared with the classifiers built on all features.

C. INFLUENCE OF THE PARAMETERS δ AND λ ON CLASSIFICATION PERFORMANCE

In the above experiment, for a fixed threshold parameter δ, although Ismba obtains a more effective feature subset in most cases, the result is not optimized. In order to find a relatively appropriate value of δ, we need to record the performance of the classifiers as δ and λ change. For simplicity of illustration, we present only the impact of λ on the performance of Ismba on the Ionosphere dataset under different values of δ, where δ takes three values in {0.001, 0.01, 0.05}. Figs. 6 and 7 illustrate the curves of kNN (k=5) and KSVM on Ionosphere as both δ and λ change incrementally. From the two figures, we can see that a more effective feature subset can be obtained by adjusting both δ and λ to proper values; e.g., from Figs. 6 and 7, we obtain a more effective feature subset when λ = 0.5, since the accuracy of kNN (k=5) induced by this feature subset increases from 78.41% to 91.48% when δ = 0.05, while the accuracy of KSVM induced by this feature subset increases from 81.25% to 94.89% (93.75%) when δ = 0.001 (0.01). This further indicates that incorporating suitable global information is of great benefit for obtaining a more effective feature subset.
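The threshold rule used throughout Sections 4.B and 4.C (normalize so that max_i w_i = 1, then drop every feature whose weight is below δ) can be written as a one-step helper; the function name is illustrative.

```python
import numpy as np

def select_features(w, delta=0.01):
    """Indices of features kept by the relevance-threshold rule: normalize
    the weight vector so its maximum is 1 (max_i w_i = 1), then remove all
    features whose normalized weight is less than delta."""
    w = np.asarray(w, dtype=float)
    w = w / w.max()                 # normalization used by Simba/Ismba
    return np.flatnonzero(w >= delta)
```

For instance, weights (0.5, 0.001, 1.0) with δ = 0.01 keep features 0 and 2 and discard feature 1, mirroring how Fig. 2's first feature is filtered out once its weight falls below δ.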

It is known that more features are removed as δ increases, while more features are retained as δ decreases. From Figs. 6 and 7, we also observe that determining a proper value of δ is a challenging issue, since the classification accuracy varies irregularly as δ changes monotonically with λ fixed; e.g., as seen from Fig. 6, the classification accuracy increases from 78.98% to 91.48% as δ varies from 0.001 to 0.05 with λ = 0.001, but decreases from 92.05% to 87.5% as δ varies from 0.001 to 0.05 with λ = 0.5. So, selecting a feature only according to its weight is relatively effective but not optimal, because a feature subset with relatively large weights may still contain some irrelevant features. Using the feature weights as a baseline, seeking a new strategy to select more effective features is our ongoing work.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we introduce a novel margin based feature selection model, which effectively incorporates global information into the original hypothesis-margin based feature selection model. In the new model, the contribution of global information can be dynamically adjusted, hence it is an extension of the original model. Further, based on the new hypothesis-margin, a novel feature selection algorithm (Ismba) is introduced. By properly incorporating global information, the newly developed algorithm not only effectively enhances its robustness to noise, but also preserves the global structure information hidden in the given data while efficiently reducing the computational cost. Consequently, the classification performance of the classifiers built after applying the newly proposed algorithm is improved on almost all datasets used here.
On the other hand, the experimental results on the 8 real-life datasets are summarized as follows: (1) the classifiers induced by the feature subsets obtained with Ismba consistently outperform, in classification performance, the corresponding classifiers built on all features, on all datasets; (2) the balance and threshold parameters λ and δ influence the quality of the obtained feature subset, and suitably adjusting their values guarantees a more effective feature subset; and (3) a relatively large tradeoff parameter λ can efficiently reduce the time complexity of Ismba. Our further and ongoing work includes the adaptive determination of both parameters λ and δ, and how to use global information more reasonably.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of P.R. China and by Jiangsu Province under Grant No. BK2008430.

REFERENCES

[1] K. Fukunaga. Introduction to Statistical Pattern Recognition. Second ed. Academic Press, 1991.
[2] I.T. Jolliffe. Principal Component Analysis. Second ed. Wiley, 2002.
[3] X. He, S. Yan, Y. Hu, et al. Face Recognition Using Laplacianfaces. IEEE TPAMI, 2005, 27(3).
[4] S. Roweis and L. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 2000, 290(5500): 2323-2326.
[5] J. Yan, B.Y. Zhang, N. Liu, et al. Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing. IEEE TKDE, 2006, 18(3).
[6] H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer, Boston, 1998.
[7] K. Kira and L. Rendell. A practical approach to feature selection. Proc. 9th Int. Workshop on Machine Learning, 1992.
[8] L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. In Proceedings of ICML 2003.
[9] Ming Yang, Ping Yang. A Novel Condensing Tree Structure for Rough Set Feature Selection. Neurocomputing, 2008, 71(4-6).
[10] G.H. John, R. Kohavi and K. Pfleger. Irrelevant features and the subset selection problem. Proc. of the 11th ICML, Morgan Kaufmann, San Francisco, CA, 1994.
[11] R. Kohavi and G. John.
Wrappers for feature subset selection. Artificial Intelligence, 1997, 97(1-2): 273-324.
[12] Ran Gilad-Bachrach, Amir Navot and Naftali Tishby. Margin Based Feature Selection - Theory and Algorithms. In Proc. of the 21st ICML, Banff, Canada, 2004.
[13] I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In Proceedings of the Seventh ECML, Springer-Verlag, 1994.
[14] Isabelle Guyon and Andre Elisseeff. An Introduction to Variable and Feature Selection. JMLR, 2003(3): 1157-1182.
[15] K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby. Margin analysis of the LVQ algorithm. Proc. of NIPS, 2002.
[16] V. Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[17] C. Domeniconi, J. Peng, D. Gunopulos. Locally adaptive metric nearest-neighbor classification. IEEE TPAMI, 2002, 24(9).
[18] T. Hastie and R. Tibshirani. Discriminant Adaptive Nearest Neighbor Classification. IEEE TPAMI, 1996, 18(6).

Yang Ming received his Ph.D. degree from the Department of Computer Science and Engineering, Southeast University, Nanjing, in 2004. He received his M.S. degree in mathematics from the University of Science & Technology of China in 1990, and his B.S. degree in mathematics from Anhui Normal University. He is currently a Professor in the Department of Computer Science at Nanjing Normal University. His research interests include data mining and knowledge discovery, machine learning, and rough set theory and its applications. He is a member of the Machine Learning Society and of the Rough Sets & Soft Computing Society of the Chinese Association of Artificial Intelligence (CAAI).

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Learning General Gaussian Kernels by Optimizing Kernel Polarization

Learning General Gaussian Kernels by Optimizing Kernel Polarization Chnese Journal of Electroncs Vol.18, No.2, Apr. 2009 Learnng General Gaussan Kernels by Optmzng Kernel Polarzaton WANG Tnghua 1, HUANG Houkuan 1, TIAN Shengfeng 1 and DENG Dayong 2 (1.School of Computer

More information