An Improved Particle Swarm Optimization for Feature Selection


Journal of Bionic Engineering 8 (2011)

Yuanning Liu 1,2, Gang Wang 1,2, Huiling Chen 1,2, Hao Dong 1,2, Xiaodong Zhu 1,2, Sujing Wang 1,2

1. College of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, P. R. China

Abstract

Particle Swarm Optimization (PSO) is a popular bionic algorithm for optimization problems, based on the social behavior associated with bird flocking. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, the competition among swarms, that is, the reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing the mechanism of survival of the fittest, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further settle the feature selection problem, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.

Keywords: particle swarm optimization, feature selection, data mining, support vector machines

Copyright 2011, Jilin University. Published by Elsevier Limited and Science Press. All rights reserved.

1 Introduction

Feature selection is one of the most important factors influencing the classification accuracy rate. If a dataset contains a large number of features, the dimension of the feature space will be large and non-clean, degrading the classification accuracy rate. An efficient and robust feature selection method can eliminate noisy, irrelevant and redundant data [1]. Feature subset selection algorithms can be categorized into two types: filter algorithms and wrapper algorithms. Filter algorithms select the feature subset before the application of any classification algorithm, and remove the less important features from the subset. Wrapper methods define the learning algorithm, the performance criteria and the search strategy; the learning algorithm searches for the subset using the training data and the performance of the current subset.

Particle Swarm Optimization (PSO), first developed by Kennedy and Eberhart [2,3], was motivated by the simulation of the simplified social behavior of bird flocking. It is easy to implement with few parameters, and it is widely used to solve optimization problems as well as the feature selection problem [4,5]. Various attempts have been made to improve the performance of standard PSO in recent years. However, few studies have put emphasis on the multi-swarm strategy. Usually, PSO-based algorithms have only one swarm that contains a number of particles. PSO-based algorithms using a multi-swarm strategy have greater exploration and exploitation abilities, because different swarms can explore different parts of the solution space [6].
On the other hand, standard PSO converges over time, thereby losing diversity and the ability to react quickly to a peak's move. Multi-swarm PSO can sustain the diversity of swarms and ensure adaptability, thereby improving the performance of PSO.

Corresponding author: Xiaodong Zhu
E-mail: zhuxiaodong.jlu@gmail.com

Blackwell and Branke [7] split the population of particles into a set of interacting swarms. They used a simple competition mechanism among swarms that are close to each other: the winner is the swarm with the best function value at its swarm attractor; the loser is expelled and reinitialized in the search space, while the winner remains. Parrott and Li [8] divided the swarm population into species subpopulations based on their similarity. Duplicated particles are removed when they are identified as having the same fitness as the species seed within the same species. After the duplicates are destroyed, new particles are added randomly until the swarm is restored to its initial size. Niu et al. [9] proposed the Multi-swarm Cooperative Particle Swarm Optimizer (MCPSO) based on a master-slave model, in which the population consists of one master swarm and several slave swarms. MCPSO is based on an antagonistic scenario: the master swarm enhances its particles through direct competition with the slave swarms, and the fittest particles in all the swarms get the opportunity to guide the flight direction of the particles in the master swarm.

However, the studies mentioned above address only traditional optimization problems, namely continuous parameter optimization. Our proposed Multi-Swarm Particle Swarm Optimization (MSPSO) can solve not only continuous parameter problems but also discrete problems. Moreover, to maintain the diversity of swarms, those studies change neither the number of particles nor the number of swarms, thereby ignoring the competition among the swarms. In this paper, we propose the MSPSO algorithm, a modified multi-swarm PSO that introduces the mechanism of survival of the fittest to describe the competition among the swarms. Four rules are designed according to the mechanism, under which the number of sub-swarms is allowed to decrease during the iterations; that is, some sub-swarms are destroyed during the iterations and cannot be reconstructed any more. To the best of our knowledge, this is the first paper to apply multi-swarm PSO to the feature selection problem. The main innovations of this paper are as follows:

(1) A MSPSO algorithm was proposed, which consists of a number of sub-swarms and a scheduling module. Survival of the fittest is introduced to decide whether a sub-swarm should be destroyed or reserved; to achieve that goal, four rules are designed. The scheduling module monitors and controls each sub-swarm according to the rules during the iterations.

(2) The F-score [10], which can calculate the score of each feature, was introduced to evaluate the results of the feature selection. The objective function is designed according to the classification accuracy rate and the feature scores.

(3) An Improved Feature Selection (IFS) method was proposed, which consists of two stages. In the first stage, both the Support Vector Machines (SVM) parameter optimization and the feature selection are dynamically executed by MSPSO. In the second stage, the SVM model performs the classification tasks using these optimal values and the selected features via 10-fold cross validation.

The remainder of this paper is organized as follows. Section 2 reviews the basic principles of PSO and SVM. Section 3 describes the objective function, the multi-swarm scheduling module and the IFS approach in detail. Section 4 presents the experimental results on 10 benchmark datasets. Finally, Section 5 summarizes the conclusions.

2 Basic principles

2.1 Particle swarm optimization

PSO originated from the simulation of the social behavior of birds in a flock [2,3]. In PSO, each particle flies in the search space with a velocity adjusted by its own flying memory and its companions' flying experience. Each particle has an objective function value decided by a fitness function. The velocity is updated as

$$v_{id}^{t} = w\,v_{id}^{t-1} + c_1 r_1^{t-1}\left(p_{id}^{t-1} - x_{id}^{t-1}\right) + c_2 r_2^{t-1}\left(p_{gd}^{t-1} - x_{id}^{t-1}\right), \qquad (1)$$

where $i$ represents the $i$th particle and $d$ is the dimension of the solution space; $c_1$ denotes the cognition learning factor and $c_2$ the social learning factor; $r_1^{t-1}$ and $r_2^{t-1}$ are random numbers uniformly distributed in (0, 1); $p_{id}^{t-1}$ and $p_{gd}^{t-1}$ stand for the position with the best fitness found so far by the $i$th particle and the best position in the neighborhood; $v_{id}^{t}$ and $v_{id}^{t-1}$ are the velocities at time $t$ and $t-1$, respectively; and $x_{id}^{t-1}$ is the position of the $i$th particle at time $t-1$. Each particle then moves to a new potential solution based on the following equation:

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \qquad d = 1, 2, \ldots, D. \qquad (2)$$

Kennedy and Eberhart [11] proposed a binary PSO in which a particle moves in a state space restricted to 0 and 1 on each dimension, in terms of the changes in the probability that a bit will be in one state or the other:

$$x_{id} = \begin{cases} 1, & \operatorname{rand}() < S(v_{id}), \\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$

$$S(v) = \frac{1}{1 + e^{-v}}. \qquad (4)$$

The function $S(v)$ is a sigmoid limiting transformation and rand() is a random number selected from a uniform distribution in [0.0, 1.0].
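For concreteness, the binary update of Eqs. (1), (3) and (4) fits in a few lines. The sketch below is a minimal Python illustration under the notation above, not the authors' implementation; the default parameter values are placeholders.

```python
import numpy as np

def binary_pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, v_max=6.0):
    """One binary PSO update following Eqs. (1), (3) and (4).

    x, v, pbest : arrays of shape (n_particles, n_dims); gbest : (n_dims,).
    """
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    # Eq. (1): inertia + cognitive + social components.
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, -v_max, v_max)        # velocity restriction (cf. Step 6)
    s = 1.0 / (1.0 + np.exp(-v))         # Eq. (4): sigmoid transformation
    x = (np.random.rand(*x.shape) < s).astype(int)   # Eq. (3): bit update
    return x, v
```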

2.2 Support vector machines

SVM is specifically designed for two-class problems [12,13]. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \ldots, m$, where $x_i \in R^n$ and $y_i \in \{+1, -1\}$, the generalized linear SVM finds an optimal separating function $f(x) = (w \cdot x) + b$. The classifier is

$$f(x) = \operatorname{sgn}\left\{\sum_{i=1}^{n} a_i y_i (x_i \cdot x) + b\right\}. \qquad (5)$$

For the non-linear case, SVM maps the data from the lower-dimensional input space into a higher-dimensional space through the kernel trick. The classifier is

$$f(x) = \operatorname{sgn}\left\{\sum_{i=1}^{n} a_i y_i K(x_i, x) + b\right\}, \qquad (6)$$

where sgn{} is the sign function, $a_i$ is a Lagrange multiplier, $x_i$ is a training sample, $x$ is a sample to be classified, and $K(x_i, x)$ is the kernel function. Example kernel functions include the polynomial function, the linear function, and the Radial Basis Function (RBF). In this work, we investigated the RBF kernel function.

3 IFS approach

We propose the IFS approach, which combines parameter optimization and feature selection in order to obtain a higher classification accuracy rate. A modified PSO algorithm named MSPSO is proposed, which holds a number of sub-swarms scheduled by the multi-swarm scheduling module. The scheduling module monitors all the sub-swarms and gathers their results. The storage of MSPSO is shown in Fig. 1. The SVM parameters, feature values and system parameters are described in detail below. We modify PSO to solve the discrete problem according to Ref. [11]. The proposed method consists of two stages. In the first stage, both the SVM parameter optimization and the feature selection are dynamically executed by MSPSO. In the second stage, the SVM model performs the classification tasks using these optimal values and the selected feature subsets via 10-fold cross validation. An efficient objective function is designed according to the classification accuracy rate and the F-score. The objective function consists of two parts, the classification accuracy rate and the feature score, which are summed into one single objective function by linear weighting with weights $\theta_a$ and $\theta_b$.

3.1 Classification accuracy

The classification accuracy for the dataset was measured according to the following equation:

$$\operatorname{accuracy}(N) = \frac{\sum_{n \in N} \operatorname{assess}(n)}{|N|}, \qquad \operatorname{assess}(n) = \begin{cases} 1, & \text{if } \operatorname{classify}(n) = n_c, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

where $N$ is the set of data items to be classified (the test set), $n \in N$, $n_c$ is the class of item $n$, and classify($n$) returns the class assigned to $n$ by IFS.

3.2 F-score

F-score is a simple technique which measures the discrimination of two sets of real numbers. Given training vectors $x_k$, $k = 1, 2, \ldots, m$, if the numbers of positive and negative instances are $n_+$ and $n_-$, respectively, then the F-score of the $i$th feature is defined as follows [10]:

$$F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\dfrac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \dfrac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}, \qquad (8)$$

where $\bar{x}_i$, $\bar{x}_i^{(+)}$, $\bar{x}_i^{(-)}$ are the averages of the $i$th feature over the whole, positive, and negative datasets, respectively; $x_{k,i}^{(+)}$ is the $i$th feature of the $k$th positive instance, and $x_{k,i}^{(-)}$ is the $i$th feature of the $k$th negative instance. The numerator shows the discrimination between the positive and negative sets, and the denominator measures the discrimination within each of the two sets. The larger the F-score, the more discriminative the feature. A feature has a low F-score when, in Eq. (8), the denominator (the sum of the variances of the positive and negative sets) is much larger than the numerator. Xie and Wang [20] proposed an improved F-score to measure the discrimination between more than two sets. Given training vectors $x_k$, $k = 1, 2, \ldots, m$, and $l$ datasets ($l \geq 2$), if the size of the $j$th dataset is $n_j$, $j = 1, 2, \ldots, l$, then the F-score of the $i$th feature is defined as

$$F(i) = \frac{\sum_{j=1}^{l}\left(\bar{x}_i^{(j)} - \bar{x}_i\right)^2}{\sum_{j=1}^{l}\dfrac{1}{n_j - 1}\sum_{k=1}^{n_j}\left(x_{k,i}^{(j)} - \bar{x}_i^{(j)}\right)^2},$$

where $\bar{x}_i$ and $\bar{x}_i^{(j)}$ are the averages of the $i$th feature over the whole dataset and the $j$th dataset, respectively, and $x_{k,i}^{(j)}$ is the $i$th feature of the $k$th instance in the $j$th dataset. The numerator indicates the discrimination between the datasets, and the denominator the discrimination within each dataset. Again, the larger the F-score, the more discriminative the feature. In this study, we utilize the F-score to calculate the score of each attribute in order to weight the features according to $F(FS(i))$. Eq. (9) defines the feature mask used in this calculation: if the $i$th feature is selected (1 represents that the feature is selected and 0 that it is not), $FS(i)$ equals the instance of feature $i$; otherwise $FS(i)$ equals 0:

$$FS(i) = \begin{cases} \text{instance}_i, & \text{if feature } i \text{ is selected}, \\ 0, & \text{if feature } i \text{ is not selected}. \end{cases} \qquad (9)$$
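As a concrete reading of Eq. (8), the sketch below scores all features of a two-class data matrix at once. It is an illustrative implementation of the formula, not code from the paper.

```python
import numpy as np

def f_scores(X, y):
    """Two-class F-score of every feature, following Eq. (8).

    X : (n_samples, n_features) data matrix; y : labels in {+1, -1}.
    """
    pos, neg = X[y == 1], X[y == -1]
    mean_all, mean_pos, mean_neg = X.mean(0), pos.mean(0), neg.mean(0)
    numer = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denom = (((pos - mean_pos) ** 2).sum(0) / (len(pos) - 1) +
             ((neg - mean_neg) ** 2).sum(0) / (len(neg) - 1))
    return numer / denom   # larger score = more discriminative feature
```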
3.3 Objective function definition

We design an objective function which combines the classification accuracy rate and the F-score. The objective function is the evaluation criterion for the selected features; to obtain the accuracy rate, we train and test the dataset with the selected features:

$$\text{fitness} = \theta_a \cdot \text{accuracy} + \theta_b \cdot \frac{\sum_{j=1}^{N_b} F(FS(j))}{\sum_{k=1}^{N_b} F(k)}, \qquad (10)$$

where $\theta_a$ is the weight for the SVM classification accuracy rate, accuracy is the classification accuracy rate for the selected features, $\theta_b$ is the weight for the score of the selected features, $F(FS(j))$ calculates the score of the current features, $N_b$ is the number of features, and $\sum_{j=1}^{N_b} F(FS(j))$ and $\sum_{k=1}^{N_b} F(k)$ are the total scores of the selected features and of all features, respectively.
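Eq. (10) then reduces to a one-line combination once the SVM accuracy and the per-feature F-scores are available. A minimal sketch, assuming `scores` comes from a routine such as `f_scores` above and `accuracy` from a cross-validated SVM run; the default weights are the values used in Section 4.1:

```python
import numpy as np

def fitness(accuracy, feature_mask, scores, theta_a=0.8, theta_b=0.2):
    """Weighted objective of Eq. (10).

    feature_mask : 0/1 array, 1 marks a selected feature (cf. Eq. (9));
    scores       : F-score of each feature.
    """
    selected = (scores * feature_mask).sum()  # total score of selected features
    return theta_a * accuracy + theta_b * selected / scores.sum()
```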

3.4 Multi-swarm scheduling module

MSPSO holds a number of swarms scheduled by the multi-swarm scheduling module. Each swarm controls its own iteration procedure, position updates, velocity updates and other parameters. Each swarm selects suitable occasions from the current computing environment and then sends its current results to the multi-swarm scheduling module, which decides whether they affect other swarms. The scheduling module monitors all the sub-swarms and gathers their results. Fig. 1 shows the structure of the multi-swarm scheduling model, which consists of a multi-swarm scheduler and some sub-swarms. Each sub-swarm contains a number of particles. The multi-swarm scheduler can send commands or data to sub-swarms, and vice versa.

Fig. 1 The structure of multi-swarm scheduling.

(1) The swarm request rule. If the current sub-swarm meets the condition of Eq. (11), it sends the results corresponding to its pbest and gbest values to the multi-swarm scheduler. If S = 1, the current swarm sends records containing the pbest and gbest values; otherwise it does not send the results.

$$S = \begin{cases} 1, & \text{if } d < \operatorname{rand}() \cdot \text{Fitness} \cdot \dfrac{tt - t}{tt}, \\ 0, & \text{if } d \geq \operatorname{rand}() \cdot \text{Fitness} \cdot \dfrac{tt - t}{tt}, \end{cases} \qquad (11)$$

where $d$ represents a threshold, $tt$ the maximal iteration number, and $t$ the current iteration number; rand() is a random number uniformly distributed in (0, 1).

(2) The multi-swarm scheduler request rule. The multi-swarm scheduler monitors each sub-swarm and sends a request in order to obtain the result from the current sub-swarm when that sub-swarm is valuable. If a sub-swarm has sent the swarm request rule more than $k \cdot n$ times, where $k = 3$ and $n = 1, 2, 3, \ldots, 100$, the multi-swarm scheduler will send the rule. The multi-swarm scheduler request rule is triggered according to the activity level of the current sub-swarm: the more active the sub-swarm, the more valuable it is, since the best result may lie in it.

(3) The multi-swarm collection rule. The multi-swarm scheduler collects results from the alive sub-swarms and updates pbest and gbest in the storage table.

(4) The multi-swarm destroying rule.
a. If the swarm sends the swarm request rule $k$ times and $k < f$ according to Eq. (12), the multi-swarm scheduler destroys the current sub-swarm.
b. If the swarm does not change gbest in $pn$ iterations, the multi-swarm scheduler destroys the current sub-swarm. We set $pn$ in the initialization of PSO.

$$f = \frac{\sum_{l=1}^{n} te(l)}{m \cdot pl}, \qquad (12)$$

where $te(\cdot)$ is the function calculating how many times a sub-swarm has sent the swarm request rule, $m$ is a threshold, and $pl$ is the number of alive sub-swarms.
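Read as predicates, the quantitative parts of rules (1) and (4) are small functions. The sketch below is one possible interpretation of the recovered formulas of Eqs. (11) and (12); the exact form of the thresholds is reconstructed from the surrounding text, so treat it as an approximation rather than the authors' code.

```python
import random

def should_send_request(d, fitness_value, t, tt):
    """Swarm request rule, Eq. (11): S = 1 when the threshold d falls below
    rand() * Fitness * (tt - t) / tt, so fit swarms report early and often."""
    return d < random.random() * fitness_value * (tt - t) / tt

def destroy_threshold(request_counts, m):
    """Eq. (12): f = (sum of per-swarm request counts) / (m * pl), with pl the
    number of alive sub-swarms. A swarm whose own request count k stays below
    f is destroyed by rule (4a)."""
    pl = len(request_counts)
    return sum(request_counts) / (m * pl)
```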

3.5 MSPSO algorithm

Step 1: Load the dataset from the text file and convert it from stream format to object format. Store the formatted memory data in a temporary table for the initialization of PSO. Initialize the size of the swarms randomly, and assign separate memory to each swarm. Initialize all particle positions $x_{ij}$ and velocities $v_{ij}$ of each swarm with random values, then calculate the objective function. Update pbest (local best) and gbest (global best) of each swarm from the table. Go to Step 2.

Step 2: Specify the parameters of each swarm, including the lower and upper bounds of the velocity, the size of particles, the number of iterations, $c_1$ (the cognition learning factor), $c_2$ (the social learning factor), $d$ (in Eq. (11)), $m$ (in the multi-swarm destroying rule) and $pn$ (in Eq. (12)). Set the current iteration number $t$ to 0 and the current particle number to 1; $tt$ denotes the maximal iteration number used in Eq. (11). Go to Step 3.

Step 3: In each swarm, if the current iteration number is less than the maximal iteration number, or gbest has remained unchanged for fewer than 45 iterations, go to Step 4; otherwise destroy the swarm and go to Step 10. The main scheduling module updates pbest, compares the gbest of the current swarm with the previous one in the module, and then judges whether to update gbest using the multi-swarm scheduler request rule or not. If gbest or pbest is changed, execute the multi-swarm collection rule.

Step 4: In each swarm, if the current particle number is less than the particle size, go to Step 5; otherwise go to Step 9.

Step 5: In each swarm, get gbest and pbest from the table; each particle updates its position and velocity. Go to Step 6.

Step 6: Restrict the position and velocity of each individual. Go to Step 7.

Step 7: Each particle calculates its fitness and updates pbest and gbest. Execute the swarm request rule, and go to Step 8. If the current swarm needs to be destroyed according to the multi-swarm destroying rule, dispose of the current swarm and exit.

Step 8: current particle number = current particle number + 1. Go to Step 4.

Step 9: current iteration number = current iteration number + 1. Go to Step 3.

Step 10: Execute the multi-swarm collection rule, and exit.

3.6 Convergence and complexity analysis

Convergence analysis and stability studies have been reported by Clerc and Kennedy [14], Trelea [15], Kadirkamanathan et al. [16], and Jiang et al. [17]. These studies proved conditions under which PSO converges in a limited number of iterations. In order to guarantee the convergence of the proposed method, we set the parameters of PSO as $\omega = 0.9$, $c_1 = 2$, $c_2 = 2$ (according to Refs. [18] and [19]). The time complexity of the proposed method is $O(M \cdot N \cdot K)$, where $M$, $N$, $K$ are the number of iterations, the number of sub-swarms, and the number of particles, respectively. In the worst case, the number of sub-swarms remains unchanged and the number of iterations reaches the maximum, giving time complexity $O(M \cdot N \cdot K)$. In general, the number of sub-swarms decreases after some iterations, and thus the time complexity is $O\left(\sum_{i=1}^{M} L_i \cdot K\right)$, where $L_i \leq N$.
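Putting Steps 1 to 10 together, the control flow (and the O(M N K) cost just derived) can be summarized in a compact skeleton. This is a structural sketch only: it reuses the illustrative helpers `binary_pso_step`, `should_send_request` and `destroy_threshold` from the earlier sketches, treats `evaluate` as the Eq. (10) fitness of a 0/1 position vector, and simplifies the scheduler bookkeeping.

```python
import numpy as np

def mspso(evaluate, n_swarms=5, n_particles=10, n_dims=20,
          max_iter=100, d=0.5, m=2.0, pn=45):
    # Step 1: random initialization of every sub-swarm.
    swarms = []
    for _ in range(n_swarms):
        x = (np.random.rand(n_particles, n_dims) < 0.5).astype(int)
        v = np.random.uniform(-1.0, 1.0, (n_particles, n_dims))
        fit = np.array([evaluate(p) for p in x])
        swarms.append({"x": x, "v": v, "pbest": x.copy(), "pfit": fit,
                       "requests": 0, "stagnant": 0})
    gbest, gfit = None, -np.inf
    for t in range(max_iter):                      # M iterations (Steps 3/9)
        for s in swarms:                           # N alive sub-swarms
            # Steps 5-6: Eqs. (1), (3), (4) applied to all K particles.
            g = s["pbest"][s["pfit"].argmax()]
            s["x"], s["v"] = binary_pso_step(s["x"], s["v"], s["pbest"], g)
            # Step 7: evaluate fitness, update pbest/gbest.
            fit = np.array([evaluate(p) for p in s["x"]])
            better = fit > s["pfit"]
            s["pbest"][better] = s["x"][better]
            s["pfit"][better] = fit[better]
            if fit.max() > gfit:
                gbest, gfit = s["x"][fit.argmax()].copy(), fit.max()
                s["stagnant"] = 0
            else:
                s["stagnant"] += 1
            if should_send_request(d, fit.max(), t, max_iter):
                s["requests"] += 1                 # swarm request rule (1)
        # Destroying rule (4): drop weak or stagnant sub-swarms for good.
        f = destroy_threshold([s["requests"] for s in swarms], m)
        alive = [s for s in swarms
                 if s["requests"] >= f and s["stagnant"] <= pn]
        swarms = alive or swarms[:1]               # keep at least one alive
    return gbest, gfit                             # collection rule (3)
```

With `evaluate` wired to the SVM-plus-F-score objective, the returned `gbest` encodes both the kernel parameters and the selected feature mask.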
4 Experiments and results

4.1 Experimental setting

The numbers of iterations and particles are set to 400 and 50, respectively. The searching ranges for $C$ and $\gamma$ are $C \in [2^{-5}, 2^{15}]$ and $\gamma \in [2^{-15}, 2^{5}]$. $[-v_{\max}, v_{\max}]$ is predefined as $[-1000, 1000]$ for parameter $C$, as $[-1000, 1000]$ for parameter $\gamma$, and as $[-6, 6]$ for the feature mask. For the objective function, we set $\theta_a$ and $\theta_b$ to 0.8 and 0.2 according to our experience. The following datasets, taken from the UCI machine learning and StatLog databases, are used to evaluate the performance of the proposed IFS approach: Australian, German, Cleveland heart, breast cancer, heart disease, vehicle silhouettes, hill-valley, landsat satellite, sonar, and Wisconsin Diagnostic Breast Cancer (WDBC). 10-fold cross validation was used to evaluate the classification accuracy, and the average error across all 10 trials was computed. The hill-valley and landsat satellite datasets have pre-defined training/test splits; except for these two datasets, all experimental results are averaged over 10 runs of 10-fold Cross-Validation (CV).

Table 1 Dataset description

No.  Dataset                           Classes  Instances  Features  Missing values
1    Australian (StatLog project)      2        690        14        Yes
2    German (StatLog project)          2        1000       24        No
3    Cleveland heart                   2        303        13        Yes
4    Breast cancer (Wisconsin)         2        699        9         Yes
5    Heart disease (StatLog project)   2        270        13        No
6    Vehicle silhouettes (Vehicle)     4        846        17        No
7    Hill-valley                       2        1212       100       No
8    Landsat satellite (Landsat)       6        6435       36        No
9    Sonar                             2        208        60        No
10   WDBC                              2        569        30        No

4.2 Results

Table 2 shows the classification accuracy rates of IFS with and without feature selection. As shown in Table 2, IFS with feature selection performs significantly better than IFS without feature selection in almost all cases examined at the significance level of 0.05, the exception being the Australian dataset. The average classification accuracy rate for each dataset improved significantly after feature selection. The results also show that the classification accuracy rates of the IFS approach with and without feature selection were better than those of grid search in all cases, as shown in Table 3.

Grid search is a local search method which is vulnerable to local optima. Grid search can supply locally optimal parameters to SVM, but its search region is small, and it cannot lead SVM to a higher classification accuracy rate. The empirical analysis indicates that the developed IFS approach can obtain the optimal parameter values and find a subset of discriminative features without decreasing the SVM classification accuracy.

Table 2 Results of the proposed IFS with and without feature selection

Dataset          Original features  Selected features  Accuracy with FS (%)  Accuracy without FS (%)  Paired t-test P-value
Australian       14                 8.4 ± 2.38         90.9                  86.4                     0.06
German           23                 12.7 ± 1.025       80.2                  75.9                     < 0.001
Cleveland heart  13                 6.1 ± 1.03         91.1                  85.7                     < 0.001
Breast cancer    9                  4.9 ± 0.734        99.1                  96.9                     < 0.001
Heart disease    13                 7.8 ± 0.949        91.5                  84.4                     < 0.001
Vehicle          17                 7.1 ± 0.432        89.6                  85.8                     < 0.001
Hill-valley      100                40.1 ± 1.264       74.1                  71.2                     < 0.001
Landsat          36                 13 ± 0.668         95.4                  91.9                     < 0.001
Sonar            60                 25.1 ± 0.977       93.7                  90.1                     < 0.001
WDBC             30                 13 ± 1.33          99.4                  97.8                     0.01

Table 3 Summary of experimental results for IFS with feature selection, IFS without feature selection and the grid search algorithm

Dataset          (1) IFS with FS  (2) IFS without FS  (3) Grid search  Paired t-test (1) vs (3)  Paired t-test (2) vs (3)
Australian       90.9             86.4                84.7             < 0.001                   < 0.001
German           80.2             75.9                75.7             < 0.001                   < 0.001
Cleveland heart  91.1             85.7                82.3             < 0.001                   < 0.001
Breast cancer    99.1             96.9                95.2             < 0.001                   < 0.001
Heart disease    91.5             84.4                83.6             < 0.001                   < 0.001
Vehicle          89.7             85.8                84.2             < 0.001                   0.2
Hill-valley      74.1             71.2                69.8             0.01                      < 0.001
Landsat          95.4             91.9                91.1             < 0.001                   0.02
Sonar            93.7             90.1                88.9             0.028                     < 0.001
WDBC             99.4             97.8                97.4             < 0.001                   0.53

The comparison between IFS and GA+SVM using feature selection is shown in Table 4. The detailed parameter settings for GA+SVM were as follows: population size = 500, crossover rate = 0.7, mutation rate = 0.02. The classification accuracy rates of IFS with feature selection were higher than those of GA+SVM for all datasets, whereas the classification accuracy rates of GA+SVM were higher than those of IFS without feature selection, as shown in Table 4. Therefore, eliminating noisy and irrelevant features is important for increasing the classification accuracy rate.

Table 4 Comparison between the IFS and GA+SVM approaches

Dataset          Original features  Selected (IFS)  IFS accuracy (%)  Selected (GA+SVM)  GA+SVM accuracy (%)
Australian       14                 8.4 ± 2.38      90.9              7.9 ± 0.432        88.1
German           23                 12.7 ± 1.025    80.2              10.1 ± 0.986       77.4
Cleveland heart  13                 6.1 ± 1.03      91.1              6.9 ± 2.0          86.8
Breast cancer    9                  4.9 ± 0.734     99.1              5.5 ± 0.988        98.2
Heart disease    13                 7.8 ± 0.949     91.5              8.1 ± 0.445        86.7
Vehicle          17                 7.1 ± 0.432     89.6              11.5 ± 0.664       88.1
Hill-valley      100                40.1 ± 1.264    74.1              55.9 ± 1.98        73.5
Landsat          36                 13 ± 0.668      95.4              18.3 ± 1.498       93.4
Sonar            60                 25.1 ± 0.977    93.7              31.0 ± 1.22        91.6
WDBC             30                 13 ± 1.33       99.4              17.3 ± 0.99        98.9

Figs. 2a and 2b show the global best classification accuracies over the iterations on the Australian and German datasets using IFS, PSO+SVM and GA+SVM, respectively. Figs. 2e and 2f show the corresponding local best classification accuracies. The convergence speeds of PSO+SVM and GA+SVM were faster than that of IFS, whereas their resultant classification accuracies were lower. Moreover, PSO+SVM and GA+SVM prematurely converged to local optima, which indicates that IFS has more exploration capability. The numbers of selected features over the evolution on the Australian and German datasets using the three methods are shown in Fig. 3 and Fig. 4, respectively.

Figs. 2c and 2d show the number of sub-swarms over the iterations on the Australian and German datasets using IFS. For different numbers of initial sub-swarms, a great number of sub-swarms were removed, and only a small number remained at the final iteration. Most of the weak sub-swarms are eliminated during the evolution; the excellent sub-swarms preserved by the competition enhance the exploration ability of the whole swarm to obtain the more important features.

The comparison between IFS and PSO+SVM using feature selection, in terms of the number of selected features and the average classification accuracy rates, is shown in Table 5. For comparison purposes, we implemented the PSO+SVM approach using the standard PSO algorithm with the following parameter settings: the iteration size was set to 500 and the number of particles to 100; the classification accuracy rate was adopted as the objective function. The analytical results reveal that IFS with feature selection performs significantly better than the standard PSO with feature selection on all datasets in terms of classification accuracy rates.

Fig. 2 Prediction accuracies and number of sub-swarms over the iterations. (a) Global best accuracies on the Australian dataset using IFS, PSO+SVM and GA+SVM. (b) Global best accuracies on the German dataset using IFS, PSO+SVM and GA+SVM. (c) Each curve corresponds to a number of initial sub-swarms on the Australian dataset using IFS. (d) Each curve corresponds to a number of initial sub-swarms on the German dataset using IFS. (e) Local best accuracies on the Australian dataset using IFS, PSO+SVM and GA+SVM. (f) Local best accuracies on the German dataset using IFS, PSO+SVM and GA+SVM.

Fig. 3 Number of selected features over the iterations on the Australian dataset using IFS, PSO+SVM and GA+SVM.

Fig. 4 Number of selected features over the iterations on the German dataset using IFS, PSO+SVM and GA+SVM.

Table 5 Comparison between the IFS and standard PSO (PSO+SVM) approaches

Dataset          Original features  Selected (IFS)  IFS accuracy (%)  Selected (PSO+SVM)  PSO+SVM accuracy (%)
Australian       14                 8.4 ± 2.38      90.9              7.1 ± 0.798         89.9
German           23                 12.7 ± 1.025    80.2              9.4 ± 1.233         76.8
Cleveland heart  13                 6.1 ± 1.03      91.1              6.4 ± 0.558         87.4
Breast cancer    9                  4.9 ± 0.734     99.1              5.8 ± 0.447         97.6
Heart disease    13                 7.8 ± 0.949     91.5              6.2 ± 0.976         85.3
Vehicle          17                 7.1 ± 0.432     89.66             10.2 ± 1.298        86.2
Hill-Valley      100                40.1 ± 1.264    74.2              61.3 ± 2.0          72.3
Landsat          36                 13 ± 0.668      95.44             15.1 ± 0.975        93.4
Sonar            60                 25.1 ± 0.977    93.7              35.2 ± 1.23         90.8
WDBC             30                 13 ± 1.33       99.4              16.9 ± 1.652        98.2

5 Conclusion

In this study, a novel multi-swarm MSPSO algorithm is proposed to solve discrete problems, with an efficient objective function designed to take into account both the classification accuracy rate and the F-score. In order to describe the competition among the swarms, we introduced the mechanism of survival of the fittest. To further settle the feature selection problem, we put forward the IFS approach, in which both the SVM parameter optimization and the feature selection are dynamically executed by the MSPSO algorithm; the SVM model then performs the classification tasks using the optimal parameter values and the selected subset of features. The evaluation on the 10 benchmark problems, by comparison with the standard PSO based, genetic algorithm based, and grid search based methods, indicates that the proposed approach performs significantly better than the others in terms of classification accuracy rates.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 60971089), the National Electronic Development Foundation of China (Grant no. 2009537), and the Jilin Province Science and Technology Department Project of China (Grant no. 20090502).

References

[1] Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3, 1157–1182.
[2] Kennedy J, Eberhart R. Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, 1995, 1942–1948.
[3] Eberhart R, Kennedy J. A new optimizer using particle swarm theory. Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, 39–43.
[4] Lin S W, Ying K C, Chen S C, Lee Z J. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 2008, 35, 1817–1824.
[5] Huang C L, Dun J F. A distributed PSO-SVM hybrid system with feature selection and parameter optimization. Applied Soft Computing, 2008, 8, 1381–1391.
[6] Blackwell T. Particle swarms and population diversity. Soft Computing, 2005, 9, 793–802.
[7] Blackwell T, Branke J. Multiswarms, exclusion, and anti-convergence in dynamic environments. IEEE Transactions on Evolutionary Computation, 2006, 10, 459–472.
[8] Parrott D, Li X D. Locating and tracking multiple dynamic optima by a particle swarm model using speciation. IEEE Transactions on Evolutionary Computation, 2006, 10, 440–458.
[9] Niu B, Zhu Y L, He X X, Wu H. MCPSO: A multi-swarm cooperative particle swarm optimizer. Applied Mathematics and Computation, 2007, 185, 1050–1062.
[10] Chen Y W, Lin C J. Combination of feature selection approaches with SVM in credit scoring. Expert Systems with Applications, 2006, 37, 315–324.
[11] Kennedy J, Eberhart R. A discrete binary version of the particle swarm algorithm. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Orlando, USA, 1997, 4104–4108.
[12] Vapnik V N. The Nature of Statistical Learning Theory, 2nd ed, Springer, New York, 1999.
[13] Boser B E, Guyon I M, Vapnik V N. A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, USA, 1992, 144–152.
[14] Clerc M, Kennedy J. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 2002, 6, 58–73.
[15] Trelea I C. The particle swarm optimization algorithm: convergence analysis and parameter selection. Information Processing Letters, 2003, 85, 317–325.
[16] Kadirkamanathan V, Selvarajah K, Fleming P J. Stability analysis of the particle dynamics in particle swarm optimizer. IEEE Transactions on Evolutionary Computation, 2006, 10, 245–255.
[17] Jiang M, Luo Y P, Yang S Y. Stochastic convergence analysis and parameter selection of the standard particle swarm optimization algorithm. Information Processing Letters, 2007, 102, 8–16.
[18] Shi Y, Eberhart R. A modified particle swarm optimizer. Proceedings of the IEEE International Conference on Evolutionary Computation, Anchorage, USA, 1998, 69–73.
[19] Zhan Z H, Zhang J, Li Y. Adaptive particle swarm optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39, 1362–1381.
[20] Xie J Y, Wang C X. Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases. Expert Systems with Applications, 2011, 38, 5809–5815.