A novel feature selection algorithm based on hypothesis-margin


JOURNAL OF COMPUTERS, VOL. 3, NO. 1, DECEMBER 2008

A novel feature selection algorithm based on hypothesis-margin

Ming Yang*, Fei Wang and Ping Yang
Department of Computer Science, Nanjing Normal University, Nanjing, P.R. China
Email: {m.yang, yangping}@njnu.edu.cn, f0701@163.com

Abstract — The iterative search margin based algorithm (Simba) has been proven effective for feature selection. However, it still has the following disadvantages: (1) the previously proposed model is not sufficiently robust to noise; and (2) the model does not use any global information, so some useful discrimination information may be lost and the convergence speed is also affected in some cases. In this paper, by incorporating global information, a novel margin based feature selection framework is introduced. Based on the newly designed model, an improved margin based feature selection algorithm (Ismba) is proposed. By effectively adjusting the contribution of the global information, Ismba can efficiently reduce the computational cost and at the same time obtain more effective feature subsets than Simba. Experiments on 6 artificial and 8 real-life benchmark datasets show that Ismba is effective and efficient.

Index Terms — Feature selection, Dimensionality reduction, Hypothesis-margin, Margin

I. INTRODUCTION

Dimensionality reduction (DR) is one commonly applied approach [1]. There are a number of DR techniques; according to the adopted reduction strategy, they are usually divided into feature extraction [2-4] and feature selection [5] approaches. The key difference between feature extraction and feature selection is that the former generates a completely new feature space through a functional transformation, while the latter selects a relevant subset of the original features. Classical feature extraction methods are generally classified into linear and nonlinear methods.
Linear approaches, such as Principal Component Analysis (PCA) [2], Linear Discriminant Analysis (LDA) [1] and Locality Preserving Projections (LPP) [3], aim to project high-dimensional data to a lower-dimensional space by linear transformations according to some criteria. On the other hand, nonlinear methods, such as Locally Linear Embedding (LLE) [4], aim to project the original data by nonlinear transformations while preserving certain local information according to some criteria. As the authors of Ref. [5] pointed out, feature extraction is generally effective.

*Corresponding author. Email: m.yang@njnu.edu.cn (Ming Yang).

However, the effectiveness of feature extraction algorithms may degrade markedly when processing large-scale datasets. In addition, the new variables usually involve all original features, so the newly formed variables may contain much information originating from redundant features in the original space, as in PCA [2]. Unlike feature extraction, feature selection can be viewed as one of the most fundamental problems in machine learning. It is defined as the process of selecting relevant features out of a larger set of candidate features; the relevant features are those that describe the target task. As Liu pointed out in [6], the motivation of feature selection (also called attribute reduction or feature reduction) in data mining and machine learning is to: reduce the dimensionality of the feature space, speed up and reduce the cost of a learning algorithm, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts. In particular, the authors of [6] emphasized that not every feature selection method can serve all purposes. Generally, supervised feature reduction methods can be categorized into two classes: the filter model [7-9][12][13] and the wrapper model [10][11][14]. In the wrapper model, feature selection methods try to directly optimize the performance of a specific predictor. Along this line, the predictor's generalization performance (e.g.
by cross validation) needs to be estimated for the selected feature subset at each step, so high computational cost is its main disadvantage. Currently, many filter methods are available, including Relief [7], FCBF [8], the C-tree based feature selection algorithm [9] and Simba [12], among others. Among them, Simba is a recently proposed margin based feature selection approach, which uses the so-called large margin principle [15-16] as its theoretical foundation to guarantee good performance for any feature selection scheme that selects a small set of features while keeping the margin large. Meanwhile, owing to the smoothness of the hypothesis-margin based evaluation function, Simba uses stochastic gradient ascent over the evaluation function to accelerate the feature selection process. Roughly speaking, the main idea of Simba is to obtain an effective subset of features such that the relatively significant features receive relatively large weights under the hypothesis-margin criterion.

© 2008 ACADEMY PUBLISHER

In essence, similar to Ref. [14], Simba is also a weighting method, that is, the features with relatively larger weights form the selected subset; the key difference is that Simba is a filter method, while most other weighting methods are tied to a concrete classifier. Moreover, theoretical analysis and experiments show that Simba can effectively reduce the computational complexity and is more effective than classical filter approaches such as Relief [7]. However, one disadvantage of Simba is its non-robustness to noise: under the 1NN criterion [17][18], the weights of some features may, because of noise, become relatively large or fail to converge to a relatively small value or zero, since noise may increase the contribution of some features to the hypothesis-margin of samples. On the other hand, Simba uses only local information when choosing a small set of features to make the hypothesis-margin of samples large, so it may lose useful discrimination information or global structure hidden in the global information. Thus, the performance of classifiers induced by post-analysis algorithms in the new feature space will be degraded. In this paper, we introduce a novel margin based feature selection model called Ismba_FS, which incorporates global information into the recently proposed margin based feature selection model [12] to eliminate the disadvantages of the Simba algorithm while maintaining its merits. In Ismba_FS, the main motivation of incorporating global information is to make the distance between a sample and the center point of its own class as small as possible, and the distance between a sample and the center points of the other classes as large as possible; meanwhile, a balance factor λ is introduced for dynamically adjusting the contribution of the global information. Based on the newly designed model Ismba_FS, we introduce an improved margin based feature selection algorithm (Ismba).
By adjusting the contribution of the global information, Ismba can efficiently reduce the computational cost and meanwhile obtain a more effective feature subset than Simba. In summary, Ismba possesses several attractive characteristics: (1) the computational complexity can be efficiently reduced, since the centers of each class and of the remaining classes can be computed in advance, and their contributions to the weight vector are reflected at each iteration; (2) the classification performance of classifiers induced by the selected small set of features can be effectively improved in some cases, because the embedded global information helps both to preserve discrimination information and to resist noise; and (3) the contribution of the global information can be dynamically and effectively adjusted by the tradeoff parameter λ.

The rest of this paper is organized as follows. In Section 2, some basic concepts on margins (hypothesis-margin and sample-margin) and Simba are briefly introduced. In Section 3, by incorporating global information into the existing hypothesis-margin based feature selection model, a novel feature selection model and the corresponding feature selection algorithm are presented. Experimental comparisons are given in Section 4. Finally, Section 5 presents our conclusions and several issues for future work.

II. PRELIMINARIES

A. SAMPLE-MARGIN AND HYPOTHESIS-MARGIN

As the authors of Ref. [12] pointed out, margins play a crucial role in modern machine learning research. They measure the confidence of a classifier when making its decision, and are used both for theoretic generalization bounds and as guidelines for algorithm design. As described in Ref. [15], there are two natural ways of defining the margin of a sample with respect to a classification rule. The more common type, the sample-margin, measures the distance between the sample and the decision boundary induced by the classifier; e.g., the Support Vector Machine (SVM) [16] finds the separating hyperplane with the largest sample-margin.
Obviously, feature selection methods based on the sample-margin incur high computational cost on large-scale and/or high-dimensional datasets. As an alternative, the hypothesis-margin was introduced in [12][15]. The margin of a hypothesis with respect to a sample is the distance between the hypothesis and the closest hypothesis that assigns an alternative label to the given sample. The hypothesis-margin of a sample x for 1NN with respect to a set of samples P is defined as follows:

θ_P(x) = (1/2)( ‖x − nearmiss(x)‖ − ‖x − nearhit(x)‖ )    (1)

where nearhit(x) and nearmiss(x) denote the nearest sample to x in P with the same and a different label, respectively. By (1), we hope to choose a subset of the original features such that the hypothesis-margin becomes as large as possible. Based on the hypothesis-margin, an effective feature subset can be efficiently obtained by corresponding feature selection algorithms, since in the Nearest Neighbor case a large hypothesis-margin ensures a large sample-margin, and the hypothesis-margin is easy to compute compared with the sample-margin.

B. EVALUATION FUNCTION

In order to obtain a more effective subset of the original features, an evaluation function that assigns a score to sets of features according to the hypothesis-margin they induce was introduced in Ref. [12]; the hypothesis-margin as a function of the chosen set of features is formulated in the following Definition 1.

Definition 1 [12]. Let P be a set of samples and x be a sample. Let w be a weight vector over the feature set; then the hypothesis-margin of x is

θ_P^w(x) = (1/2)( ‖x − nearmiss(x)‖_w − ‖x − nearhit(x)‖_w )    (2)

where ‖z‖_w = sqrt( Σ_i w_i² z_i² ).
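To make Eq. (1) and Definition 1 concrete, the following minimal Python/NumPy sketch computes the (optionally weighted) 1NN hypothesis-margin of a single sample. The function name and signature are illustrative, not from the paper; the caller is assumed to pass P = S \ {x}, i.e., the query sample itself is not in X.

```python
import numpy as np

def hypothesis_margin(x, y, X, Y, w=None):
    """Weighted 1NN hypothesis-margin of sample x (label y) w.r.t. the set (X, Y).

    With w=None this is Eq. (1); otherwise the weighted margin of Definition 1,
    using the w-norm ||z||_w = sqrt(sum_i w_i^2 z_i^2). X must not contain x.
    """
    if w is None:
        w = np.ones(X.shape[1])
    # w-distances from x to every sample in X
    d = np.sqrt((((X - x) ** 2) * w ** 2).sum(axis=1))
    near_hit = d[Y == y].min()    # nearest sample with the same label
    near_miss = d[Y != y].min()   # nearest sample with a different label
    return 0.5 * (near_miss - near_hit)
```

For example, with a hit at distance 1 and a miss at distance 5, the margin is (1/2)(5 − 1) = 2; a positive margin means the sample would be correctly classified by 1NN.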

Further, by (2), the authors of Ref. [12] provide a strategy for computing the hypothesis-margin over all given samples, via the following Definition 2.

Definition 2 [12]. Given a training set S and a weight vector w, the evaluation function is

e(w) = Σ_{x∈S} θ_{S\{x}}^w(x)    (3)

By (3), the feature set can be found by maximizing the hypothesis-margin directly: first, the weight vector w that maximizes e(w) as defined in (3) is found; then, setting max_i w_i = 1, the corresponding normalized weight vector is obtained, and a subset of features follows naturally by applying a threshold.

C. ITERATIVE SEARCH MARGIN BASED ALGORITHM

In order to quickly and effectively obtain a subset of the original features, gradient ascent was employed in [12] for maximizing e(w) as defined in (3), since e(w) is smooth almost everywhere. The gradient of e(w) evaluated on the set of samples S is

(∇e(w))_i = ∂e(w)/∂w_i = (1/2) Σ_{x∈S} ( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) w_i    (4)

As described above, the iterative search margin based algorithm for feature selection is as follows.

Algorithm 1. Simba [12]
1. initialize w = (1, 1, …, 1);
2. for t = 1, 2, …, T:
   (a) pick randomly a sample x from S;
   (b) calculate nearmiss(x) and nearhit(x) with respect to S \ {x} and the weight vector w;
   (c) for i = 1, 2, …, N calculate
       Δ_i = (1/2)( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) w_i
   (d) w = w + Δ;
3. w ← w² / ‖w²‖_∞, where (w²)_i := (w_i)².

Since ‖w‖ increases, the relative effect of the correction term decreases and the algorithm typically converges. The computational complexity of Simba is O(TNm), where T is the number of iterations, N is the number of features and m is the number of samples in S. Obviously, Simba is highly efficient. Further, numerical experiments show that Simba outperforms Relief.
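The steps of Algorithm 1 can be sketched in Python/NumPy as follows. This is an illustrative reimplementation, not the authors' code; a small epsilon guards against zero distances on duplicate samples, which the paper does not discuss.

```python
import numpy as np

def simba(X, Y, T=1000, rng=None):
    """Sketch of Algorithm 1 (Simba): stochastic gradient ascent on e(w).

    X: (m, N) samples, Y: (m,) integer labels. Returns weights normalized
    as w <- w^2 / max_i(w_i^2), as in step 3 of the algorithm.
    """
    rng = np.random.default_rng(rng)
    m, N = X.shape
    w = np.ones(N)
    for _ in range(T):
        i = rng.integers(m)                       # (a) pick a random sample x
        x, y = X[i], Y[i]
        diff2 = (np.delete(X, i, 0) - x) ** 2     # squared diffs to S \ {x}
        yy = np.delete(Y, i)
        d = np.sqrt((diff2 * w ** 2).sum(axis=1)) + 1e-12   # w-distances
        same = yy == y
        j_hit = np.flatnonzero(same)[d[same].argmin()]      # (b) nearhit(x)
        j_miss = np.flatnonzero(~same)[d[~same].argmin()]   # (b) nearmiss(x)
        # (c) per-feature correction term, then (d) update
        delta = 0.5 * (diff2[j_miss] / d[j_miss] - diff2[j_hit] / d[j_hit]) * w
        w = w + delta
    w2 = w ** 2
    return w2 / w2.max()                          # step 3: w <- w^2 / ||w^2||_inf
```

On data where only one feature separates the classes, the returned weight of that feature should dominate, so thresholding the normalized weights recovers the relevant subset.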
However, Simba uses only a small amount of local information to calculate the hypothesis-margin of the given samples, according to the margin measure of a hypothesis with respect to a sample; in this way some irrelevant features may receive relatively large weights due to the influence of noise, so Simba is still non-robust. At the same time, Simba cannot use the useful discrimination information hidden in the global information, which may lead to the loss of some useful features.

III. ISIMBA_FS AND ISIMBA

In order to overcome the disadvantages of Simba while retaining its characteristics, we propose a novel margin based feature selection model called Ismba_FS, which incorporates global information. Further, based on Ismba_FS, we introduce an improved margin based feature selection algorithm (Ismba).

A. HYPOTHESIS-MARGIN INCORPORATING GLOBAL INFORMATION

To remedy the shortcomings of the Simba algorithm, we introduce a novel margin based feature selection model (Ismba_FS) incorporating global information as follows:

θ̂_P(x) = (1/2)[ ( ‖x − nearmiss(x)‖ − ‖x − nearhit(x)‖ ) + λ( ‖x − centermiss(x)‖ − ‖x − centerhit(x)‖ ) ]    (5)

where nearhit(x) and nearmiss(x) denote the nearest sample to x in P with the same and a different label, respectively; centerhit(x) and centermiss(x) denote the centers of the samples in P with the same and a different label as x, respectively; and λ (lambda) is an adjustable parameter used to control the contribution of the global information to the hypothesis-margin described in Section 2. To balance the contributions of local and global information to the hypothesis-margin, in this paper we employ a non-optimized but effective strategy: λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5}. The novel hypothesis-margin incorporating global information naturally has the following merits:

(1) when λ = 0, Eq. (5) degrades to Eq. (1); that is, the new hypothesis-margin model is an extension of the original hypothesis-margin.
(2) when λ > 0, all margin based feature selection algorithms induced by Eq. (5) can reduce the number of iterations to some extent, since for a given set of samples the same-label and different-label centers of any sample can be computed in advance. Moreover, as λ gradually increases, the convergence of feature selection algorithms based on Eq. (5) is naturally accelerated.

(3) the adjustable parameter λ can effectively control the tradeoff between the contributions of local and global information to the new hypothesis-margin. By tuning λ to a relatively large value, features that preserve the discrimination information or global structure hidden in the data, while maintaining a large hypothesis-margin, can be effectively obtained.

(4) the robustness of the feature selection algorithms induced by Eq. (5) can be effectively enhanced, because the embedded global information constrains the influence of noise to some extent.

B. A NOVEL EVALUATION FUNCTION BASED ON THE NEW HYPOTHESIS-MARGIN

Based on Eq. (5), we propose a new evaluation function using the same strategy as Definition 1, which assigns a score to sets of features according to the new hypothesis-margin. Its definition is as follows.

Definition 3. Let P be a set of samples, x be a sample, and w be a weight vector over the feature set; then the new hypothesis-margin of x is

θ̂_P^w(x) = (1/2)[ ( ‖x − nearmiss(x)‖_w − ‖x − nearhit(x)‖_w ) + λ( ‖x − centermiss(x)‖_w − ‖x − centerhit(x)‖_w ) ]    (6)

Further, based on this evaluation function, we introduce a strategy for computing the new hypothesis-margin over all given samples as follows.

Definition 4. Given a training set S and a weight vector w, the new evaluation function is

ê(w) = Σ_{x∈S} θ̂_{S\{x}}^w(x)    (7)

According to Eq. (7), it is natural to consider the evaluation function only for weight vectors w such that max_i w_i = 1. However, similar to (3), we can ignore the constraint ‖w‖_∞ = 1 when computing the weight vector, since ê(βw) = β ê(w). After finding w, we set max_i w_i = 1 by normalizing the weight vector, and easily obtain a subset of features by applying a threshold.

C. IMPROVED ITERATIVE SEARCH MARGIN BASED ALGORITHM

As analyzed in Section 3.2, we can use the weights directly via the induced distance measure for maximizing ê(w) as defined in (7). We can also use gradient ascent directly to maximize it, since ê(w) is smooth almost everywhere. Similar to (4), the gradient of ê(w) evaluated on the set of samples S is

(∇ê(w))_i = ∂ê(w)/∂w_i = (1/2) Σ_{x∈S} [ ( (x_i − nearmiss(x)_i)² / ‖x − nearmiss(x)‖_w − (x_i − nearhit(x)_i)² / ‖x − nearhit(x)‖_w ) + λ( (x_i − centermiss(x)_i)² / ‖x − centermiss(x)‖_w − (x_i − centerhit(x)_i)² / ‖x − centerhit(x)‖_w ) ] w_i    (8)

By Eq. (8), the improved iterative search margin based algorithm (Ismba) for feature selection is as follows.

Algorithm 2. Ismba
1. let λ be a proper non-negative number;
2. initialize w = (1, 1, …, 1);
3. for each x ∈ S calculate centermiss(x) and centerhit(x), the centers of the samples in P with a different and with the same label as x, respectively;
for t=1,,,t; (a) pck randomly a sample x from S; (b) calculate nearmss(x) and nearht(x) th respect to (S {x}) and the eght vector ; (c) for =1,,,N calculate ˆ 1 ( x nearmss( x)) ( x nearht( x)) = (( ) x nearmss( x) x nearht( x) ( x centermss( x) ) ( x centerht( x) ) + λ( )) x centermss( x) x centerht( x) (d) = + ˆ 5. / hen ( ) : = ( ). In Ismba algorthm, e also use a stochastc gradent ascent over eˆ( ) hle gnorng the constrant = 1, the normalzaton on the constrant s done only at the step 5, snce eˆ( β) = βeˆ( ). Moreover, n each teraton e only evaluate one term n the sum n (8) and add t to the eght vector. In addton, the term ˆ n step 4(d) s nvarant to scalar scalng of, snce ˆ( ) = ˆ( β ). So, hen ncreases gradually, the relatve effect of the correcton term ˆ decreases and Ismba typcally convergence. Also, the second term of Eq.(8) can be drectly nduced by the obtaned centermss(x) and centerht(x) because both centermss(x) and centerht(x) are computed n advance for any gven sample x, hence t s easy to see that the contrbuton of global nformaton to eght vector almost need not spend any computatonal cost. Also, n ntutve, hen λ gradually ncreases, the effect of global nformaton to the eght vector also ncreases naturally, ths drectly leads to reducton of the number of teratons. So, Ismba algorthm can effcently reduce the computatonal cost. Further, n most cases Ismba algorthm can obtan the more effectve subset of features as compared to Smba, snce by dynamcally adjustng the controlled parameter λ, Ismba makes the global structure or dscrmnaton nformaton hdden n data to be preserved n the ne feature space. Especally, hen λ =0, Ismba degrades to Smba, hence they can obtan a consstent subset of features n ths case, but ths s not the optmzed result of Ismba. Moreover, by ncorporatng relevant global nformaton, the robustness of Ismba can be effectvely enhanced to some extent. 
In addition, to select a relatively small feature set, we can still use the same threshold strategy as Ref. [13]: all features with a relevance score less than the specified threshold are removed; e.g., if the relevance threshold δ (delta) is set to 0.01, all features with weight less than δ are removed. In general, by incorporating the global information, Ismba should intuitively outperform Simba. To test the performance of Ismba, experiments on 6 artificial and 8 real-life datasets are presented in Section 4.

IV. EXPERIMENTAL RESULTS

In order to evaluate the performance of Simba and Ismba, we carry out experiments on 6 artificial and 8 real-life datasets, and then compare their performance, including robustness to noise, computational cost,

and the classification accuracies of kNN and SVM classifiers built in the new feature subspace.

A. ARTIFICIAL DATASETS

In this subsection, the robustness of Simba and Ismba is tested. We call a feature selection algorithm robust if the feature subset obtained from a dataset with noise is almost consistent with that generated from the originally clean dataset. For this purpose, we generate one noise-free artificial dataset, dataset1, a two-dimensional dataset composed of two classes and 240 samples, as shown in Fig. 1(a). The 120 samples in class 1 of dataset1 are randomly generated from a Gaussian distribution with mean [3,5] and variance diag[0.5,0.5], and the 120 samples in class 2 are randomly generated from a Gaussian distribution with mean [3,1] and variance diag[0.5,0.5]. Further, in order to examine the robustness of Simba and Ismba, we generate 5 artificial datasets, dataset2 through dataset6, with 5%, 10%, 20%, 25% and 30% noise, respectively, as shown in Fig. 1(b-f); the noise samples in class 1 are randomly generated from a Gaussian distribution with mean [6,-1] and variance diag[10,10], and the noise samples in class 2 from a Gaussian distribution with mean [8,6] and variance diag[10,10].

As can be seen in Fig. 2, for the noise-free dataset1, Simba effectively filters out the first feature, since its weight falls below the given threshold δ (here δ is set to 0.01). However, from Fig. 2 we can also observe that Simba is non-robust when the noise samples in the original dataset exceed 5%. Meanwhile, as seen from Figs. 3 and 4, as the control parameter λ increases gradually, the robustness of Ismba becomes stronger. That is, the robustness of Ismba can be enhanced by properly adjusting the balance parameter λ.
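The artificial data described above can be generated with the following NumPy sketch. The means and variances are taken from the text; whether noise samples replace clean ones (as assumed here) or are added on top is not stated in the paper, and the 0/1 labels stand in for the paper's class 1/class 2.

```python
import numpy as np

def make_dataset(noise_frac=0.0, seed=0):
    """Two-class 2-D data as in Section 4.A: 120 samples per class from
    N([3,5], 0.5*I) and N([3,1], 0.5*I); a noise_frac fraction of each class
    is replaced by noise from N([6,-1], 10*I) (class 1) and N([8,6], 10*I)
    (class 2). noise_frac=0 gives the clean dataset1."""
    rng = np.random.default_rng(seed)
    n = 120                                  # samples per class
    k = int(round(noise_frac * n))           # noise samples per class (assumed split)
    c1 = rng.normal([3, 5], np.sqrt(0.5), (n - k, 2))
    c2 = rng.normal([3, 1], np.sqrt(0.5), (n - k, 2))
    n1 = rng.normal([6, -1], np.sqrt(10), (k, 2))
    n2 = rng.normal([8, 6], np.sqrt(10), (k, 2))
    X = np.vstack([c1, n1, c2, n2])
    Y = np.array([0] * n + [1] * n)
    return X, Y
```

Calling `make_dataset(0.05)` through `make_dataset(0.3)` then reproduces the five noisy variants (dataset2-dataset6) used in the robustness study.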
Further, to analyze the connection between computational cost and global information, we say that the algorithm cannot effectively eliminate the influence of noise when it runs for the given maximum number of iterations but irrelevant features still hold relatively large weights. In Fig. 5, a fixed maximum number of iterations is used. From Fig. 5, we observe that the computational cost is efficiently reduced with large λ. At the same time, it can also be seen that more global information needs to be incorporated to cope with the influence of noise. So, using the margin based feature selection model as a baseline, the newly developed model can effectively and efficiently improve the performance of the corresponding feature selection algorithms by incorporating a reasonable amount of global information.

Figure 1. Artificial datasets (dataset1-6)
Figure 2. The weights Simba assigns to the features
Figure 3. The weights Ismba assigns to the features under 10% noise
Figure 4. The weights Ismba assigns to the features under 20% noise
Figure 5. The number of iterations of Ismba
Figure 6. The classification accuracy of kNN (k=5) on Ionosphere
Figure 7. The classification accuracy of KSVM on Ionosphere

TABLE 1. EXPERIMENTAL RESULTS (δ = 0.01)
(columns: dataset; FS algorithm (Simba/Ismba); parameter set λ_s; number of selected features N_f; accuracies of 3NN, 5NN, 7NN and KSVM; rows for Wine, BLD, Thyroid, CMC, Sonar, Diabete, Ion and Wave)

Notes: "-" means that no feature selection is done; the cells labeled in blue represent the accuracy of the SVM-based and kNN-based classifiers on the original features; the boldface cells denote that the kNN-based and/or SVM-based classifiers induced by the chosen feature subsets have better or comparable performance.

B. REAL-LIFE DATASETS

In this paper, the employed datasets are publicly available from the UCI repository. A brief description of the UCI datasets is given first:
(1) Wine recognition data (Wine): 178 objects, 3 classes, 13 features; for short, (178, 3C, 13F);
(2) BUPA Liver Disorders (BLD): (345, 2C, 6F);
(3) Thyroid: (215, 3C, 5F);
(4) Contraceptive Method Choice (CMC): (1473, 2C, 9F);
(5) Sonar: (208, 2C, 60F);
(6) Pima Indians Diabetes (Diabete): (768, 2C, 8F);
(7) Ionosphere (Ion): (351, 2C, 34F);
(8) Waveform domain data (Wave): (5000, 3C, 21F).

In our experiments, every dataset is randomly partitioned into two halves: one half is used for training and the other for testing. It is worth noting that all features are normalized to the range between 0 and 1. The balance parameter λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5} for simplicity. To select a relatively small feature set, in this paper the relevance threshold δ is set in {0.001, 0.01, 0.05}; all features with weight less than δ are removed. In addition, let λ_s denote the set of values of λ that yield the same feature subset when δ is fixed, and N_f the number of selected features. Also, both Simba and Ismba are independent of post-analysis algorithms (predictors); here we choose the well-known k-Nearest-Neighbor (kNN) [16][17] and SVM (Support Vector Machine) with kernels [15] as evaluation criteria for testing the classification accuracy of the chosen feature subset.
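The evaluation protocol just described ([0,1]-normalized features, a 50/50 train/test split, kNN scoring on the selected columns) can be sketched with a plain NumPy k-NN; in practice the KSVM scores would come from an SVM library. The function name is illustrative, and ties in the majority vote are broken by the smallest label, a detail the paper does not specify.

```python
import numpy as np

def knn_accuracy(Xtr, ytr, Xte, yte, k=5):
    """Accuracy of plain k-NN (k=5 as in the experiments) on a chosen
    feature subset. Xtr/Xte: (m, d) arrays already restricted to the
    selected features and normalized to [0, 1]; ytr/yte: int labels."""
    # pairwise Euclidean distances, test rows vs. training rows
    d = np.sqrt(((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1))
    idx = np.argsort(d, axis=1)[:, :k]          # k nearest training indices
    votes = ytr[idx]                            # their labels
    pred = np.array([np.bincount(v).argmax() for v in votes])  # majority vote
    return float((pred == yte).mean())
```

Running this once per candidate feature subset (for each δ, λ pair) reproduces the kind of accuracy comparison reported in Table 1 and Figs. 6-7.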
In the KSVM algorithm, we employ the C-SVM model with the RBF kernel as the kernel function. The experimental results when δ is set to 0.01 are shown in Table 1. From Table 1, we find that effective feature subsets can be obtained, and the corresponding classifiers have better or comparable performance than those based on the original features on all datasets. Further, for the datasets Diabete, Ionosphere and Wave, by adjusting the balance parameter λ, the Ismba algorithm obtains more effective feature subsets than Simba. So, incorporating an appropriate amount of global information is of great benefit to the performance of the newly developed model. At the same time, from Table 1 we also find that the effective feature subsets are obtained when λ is relatively small, namely within a range around 0.1. This is consistent with our intuitive observation, because any sample x is near the center point with the same label but far from the center point with a different label. In other words, compared with the contribution of the local information, the global information may have a relatively large effect on the weight vector in most cases. So, a relatively small λ can effectively adjust the trade-off between the local and global information for obtaining a more effective feature subset. Further, from the experimental results, we also observe that the most important features are always kept in the results whether λ is 0 or not. This further indicates that Simba is indeed a very important feature selection algorithm. Based on Ismba, by tuning the parameter λ, the weights of some effective features that are filtered out by Simba are increased. Hence, Ismba is naturally an extension and improvement of Simba. It is worth noting that the parameters in Table 1 are not optimized, since δ is set to the fixed value 0.01 and λ is selected from {0, 0.001, 0.01, 0.1, 0.3, 0.5, 1, 5}.
However, we still point out that even so, on all 8 datasets, the classifiers induced by Ismba achieve better or comparable classification performance compared with the classifiers built on all features.

C. INFLUENCE OF THE PARAMETERS δ AND λ ON CLASSIFICATION PERFORMANCE

In the above experiment, for a fixed threshold parameter δ, although Ismba obtains a more effective feature subset in most cases, the result is not optimized. In order to find a relatively appropriate value of δ, we need to record the performance of the classifiers as δ and λ change. For simplicity of illustration, we present only the impact of λ on the performance of Ismba on the Ionosphere dataset under different values of δ, where δ takes three values in {0.001, 0.01, 0.05}. Figs. 6 and 7 illustrate the curves of kNN (k=5) and KSVM on Ionosphere as both δ and λ change incrementally. From the two figures, we can see that a more effective feature subset can be obtained by adjusting both δ and λ to proper values; e.g., from Figs. 6 and 7, we obtain a more effective feature subset when λ = 0.5, since the accuracy of kNN (k=5) induced by this feature subset increases from 78.41% to 91.48% when δ = 0.05, while the accuracy of KSVM induced by this feature subset increases from 81.25% to 94.89% (93.75%) when δ = 0.001 (0.01). This further indicates that incorporating suitable global information is of great benefit for obtaining a more effective feature subset.
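The threshold rule used throughout Sections 4.B and 4.C (normalize so that max_i w_i = 1, then drop every feature whose weight is below δ) can be written as a one-step helper; the function name is illustrative.

```python
import numpy as np

def select_features(w, delta=0.01):
    """Indices of features kept by the relevance-threshold rule: normalize
    the weight vector so its maximum is 1 (max_i w_i = 1), then remove all
    features whose normalized weight is less than delta."""
    w = np.asarray(w, dtype=float)
    w = w / w.max()                 # normalization used by Simba/Ismba
    return np.flatnonzero(w >= delta)
```

For instance, weights (0.5, 0.001, 1.0) with δ = 0.01 keep features 0 and 2 and discard feature 1, mirroring how Fig. 2's first feature is filtered out once its weight falls below δ.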

It is known that more features are removed as δ increases, while more features are retained as δ decreases. From Figs. 6 and 7, we also observe that determining a proper value of δ is a challenging issue, since the classification accuracy varies irregularly as δ changes monotonically with λ fixed; e.g., as seen from Fig. 6, the classification accuracy increases from 78.98% to 91.48% as δ varies from 0.001 to 0.05 with λ = 0.001, but decreases from 92.05% to 87.5% as δ varies from 0.001 to 0.05 with λ = 0.5. So, selecting a feature only according to its weight is relatively effective but not optimal, because a feature subset with relatively large weights may still contain some irrelevant features. Using the feature weights as a baseline, seeking a new strategy to select more effective features is our ongoing work.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we introduce a novel margin based feature selection model, which effectively incorporates global information into the original hypothesis-margin based feature selection model. In the new model, the contribution of global information can be dynamically adjusted, hence it is an extension of the original model. Further, based on the new hypothesis-margin, a novel feature selection algorithm (Ismba) is introduced. By properly incorporating global information, the newly developed algorithm not only effectively enhances its robustness to noise, but also preserves the global structure information hidden in the given data while efficiently reducing the computational cost. Consequently, the classification performance of the classifiers built after applying the newly proposed algorithm is improved on almost all datasets used here.
On the other hand, the experimental results on the 8 real-life datasets are summarized as follows: (1) the classifiers induced by the feature subsets obtained with Ismba consistently outperform, in classification performance, the corresponding classifiers built on all features, on all datasets; (2) the balance and threshold parameters λ and δ influence the quality of the obtained feature subset, and suitably adjusting their values guarantees a more effective feature subset; and (3) a relatively large tradeoff parameter λ can efficiently reduce the time complexity of Ismba. Our further and ongoing work includes the adaptive determination of both parameters λ and δ, and how to use global information more reasonably.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of P.R. China and by Jiangsu Province under Grant No. BK2008430.

REFERENCES

[1] K. Fukunaga. Introduction to Statistical Pattern Recognition. Second ed. Academic Press, 1991.
[2] I.T. Jolliffe. Principal Component Analysis. Second ed. Wiley, 2002.
[3] X. He, S. Yan, Y. Hu, et al. Face Recognition Using Laplacianfaces. IEEE TPAMI, 2005, 27(3).
[4] S. Roweis and L. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 2000, 290(5500): 2323-2326.
[5] J. Yan, B.Y. Zhang, N. Liu, et al. Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing. IEEE TKDE, 2006, 18(3).
[6] H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer, Boston, 1998.
[7] K. Kira and L. Rendell. A practical approach to feature selection. Proc. 9th Int. Workshop on Machine Learning, 1992.
[8] L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. In Proceedings of ICML 2003.
[9] Ming Yang, Ping Yang. A Novel Condensing Tree Structure for Rough Set Feature Selection. Neurocomputing, 2008, 71(4-6).
[10] G.H. John, R. Kohavi and K. Pfleger. Irrelevant features and the subset selection problem. Proc. of the 11th ICML, Morgan Kaufmann, San Francisco, CA, 1994.
[11] R. Kohavi and G. John.
Wrappers for feature subset selection. Artificial Intelligence, 1997, 97(1-2): 273-324.
[12] Ran Gilad-Bachrach, Amir Navot and Naftali Tishby. Margin Based Feature Selection - Theory and Algorithms. In Proc. of the 21st ICML, Banff, Canada, 2004.
[13] I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In Proceedings of the Seventh ECML, Springer-Verlag, 1994.
[14] Isabelle Guyon and Andre Elisseeff. An Introduction to Variable and Feature Selection. JMLR, 2003(3): 1157-1182.
[15] K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby. Margin analysis of the LVQ algorithm. Proc. of NIPS, 2002.
[16] V. Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[17] C. Domeniconi, J. Peng, D. Gunopulos. Locally adaptive metric nearest-neighbor classification. IEEE TPAMI, 2002, 24(9).
[18] T. Hastie and R. Tibshirani. Discriminant Adaptive Nearest Neighbor Classification. IEEE TPAMI, 1996, 18(6).

Yang Ming received his Ph.D. degree from the Department of Computer Science and Engineering, Southeast University, Nanjing, in 2004. He received his M.S. degree in mathematics from the University of Science & Technology of China in 1990, and his B.S. degree in mathematics from Anhui Normal University. He is currently a Professor in the Department of Computer Science at Nanjing Normal University. His research interests include data mining and knowledge discovery, machine learning, and rough set theory and its applications. He is a member of the Machine Learning Society and of the Rough Sets & Soft Computing Society of the Chinese Association of Artificial Intelligence (CAAI).

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Learning General Gaussian Kernels by Optimizing Kernel Polarization

Learning General Gaussian Kernels by Optimizing Kernel Polarization Chnese Journal of Electroncs Vol.18, No.2, Apr. 2009 Learnng General Gaussan Kernels by Optmzng Kernel Polarzaton WANG Tnghua 1, HUANG Houkuan 1, TIAN Shengfeng 1 and DENG Dayong 2 (1.School of Computer

More information