Image Feature Selection Based on Ant Colony Optimization

Ling Chen 1,2, Bolun Chen 1, Yixin Chen 3

1 Department of Computer Science, Yangzhou University, Yangzhou, China
2 State Key Lab of Novel Software Tech, Nanjing University, Nanjing, China
3 Department of Computer Science, Washington University in St Louis, USA
lchen@yzu.edu.cn, chenbolun1986@gmail.com, chen@cse.wustl.edu

Abstract. Image feature selection (FS) is an important task which can affect the performance of image classification and recognition. In this paper, we present a feature selection algorithm based on ant colony optimization (ACO). For n features, most ACO-based feature selection methods use a complete graph with O(n^2) edges. However, the artificial ants in the proposed algorithm traverse a digraph with only 2n arcs. The algorithm adopts classifier performance and the number of selected features as heuristic information, and selects the optimal feature subset in terms of feature set size and classification performance. Experimental results on various images show that our algorithm can obtain better classification accuracy with a smaller feature set compared to other algorithms.

Keywords: ant colony optimization; dimensionality reduction; feature selection; image classification

1 Introduction

Reduction of pattern dimensionality via feature extraction is one of the most important tasks for pattern recognition and classification. Feature selection is of considerable importance in areas such as bioinformatics [1], signal processing [2], image processing [3], text categorization [4], data mining [5], pattern recognition [6], medical diagnosis [7], and remote sensor image recognition [8]. The goal of feature selection is to choose a subset of the available features by eliminating unnecessary ones. To extract as much information as possible from a given image set while using the smallest number of features, we should eliminate the features with little or no predictive information, and discard redundant features that are strongly correlated. Many feature selection algorithms employ heuristic or random search strategies in order to reduce the computing time.
For a large number of features, heuristic search is often used to find the best subset of features. However, the classification accuracy achieved with the final feature subset is often decreased [6]. More recently, nature-inspired algorithms have been used for feature selection. Population-based optimization algorithms for feature selection, such as the genetic algorithm (GA) [3,9], ant colony optimization (ACO) [1,4], and particle swarm optimization (PSO) [10], have been proposed. These
methods are stochastic optimization techniques that attempt to achieve better solutions by exploiting feedback and heuristic information. Ant colony optimization (ACO) is an evolution-simulation algorithm proposed by M. Dorigo et al. [11]. It has been successfully applied to system fault detection, job-shop scheduling, network load balancing, graph coloring, robotics and other combinatorial optimization problems. In this paper, we present an ACO-based feature selection algorithm, ACOFS, to reduce the memory requirement and computation time. In this algorithm, the artificial ants traverse a digraph with only 2n arcs. The algorithm adopts classifier performance and the number of selected features as heuristic information, and selects the optimal feature subset in terms of feature set size and classifier performance. Experimental results on image data sets show that the proposed algorithm has superior performance. Compared with other existing algorithms, our algorithm can obtain better classification accuracy with a smaller feature set from images.

2 The ACO Algorithm for Image Feature Selection

Given a feature set of size n, the feature selection problem is to find a minimal feature subset of size s (s < n) while maintaining a fairly high classification accuracy in representing the original features. Most ACO-based algorithms for feature selection use a complete graph, on which the ants try to construct a path through part of the nodes. Since a solution of feature selection is a subset of the selected features, there is no ordering among the components of the solution, so it is not necessary to use a path in a complete graph to represent such a solution. In the ACO on such a complete graph, an ant at one node (feature) selects an edge connecting to another node (feature) based on the pheromone and heuristic information assigned to the edge between the two nodes (features). But in the feature selection problem, the choice of the next feature is independent of the last feature added to the partial solution.
Therefore, it is unnecessary to use a complete graph with O(n^2) edges in the ACO algorithm. To efficiently apply an ACO algorithm to feature selection, we must redefine the way the representation graph is used. We propose an ant optimization algorithm on a discrete search space represented by a digraph with only O(n) arcs, as shown in Figure 1, where the nodes represent features, and the arcs connecting two adjacent nodes indicate the choice of the next feature.

Fig. 1. The digraph.

Denote the n features as f_1, f_2, ..., f_n; the i-th node v_i is used to represent feature f_i. An additional node v_0 is placed at the beginning of the graph, where each ant starts its
search. As shown in Figure 1, the ants travel on the digraph from v_0 to v_1, then to v_2, and so on. An ant terminates its tour and outputs its feature subset when it reaches the last node v_n. When an ant completes the search from v_0 to v_n, the arcs on its trace form a solution. There are two arcs, c_i^1 and c_i^0, linking the two adjacent nodes v_{i-1} and v_i. If an artificial ant at v_{i-1} selects arc c_i^1 (or c_i^0), the i-th feature is selected (or not selected). On each arc c_i^j, a virtual pheromone value is assigned as the feedback information to direct the ants searching on the graph. We initialize the pheromone matrix as tau_i^j = 1 for all i = 1, 2, ..., n and j = 0, 1. The search for the optimal feature subset is the procedure of the ants traversing the graph. Suppose an ant is currently at node v_{i-1} and has to choose one of the arcs connecting to v_i to pass through. A probabilistic transition function, denoting the probability of an ant at node v_{i-1} choosing arc c_i^j to reach v_i, is designed by combining the heuristic desirability and the pheromone density of the arc. The probability of an ant at node v_{i-1} choosing arc c_i^j at time t is:

p_i^j(t) = [tau_i^j(t)]^alpha [eta_i^j]^beta / Sum_{l=0,1} [tau_i^l(t)]^alpha [eta_i^l]^beta   (i = 1, 2, ..., n; j = 0, 1)   (1)

Here, tau_i^j(t) is the pheromone on arc c_i^j (j = 0, 1), which reflects the potential tendency for ants to follow arc c_i^j between nodes v_{i-1} and v_i at time t. eta_i^j is the heuristic information reflecting the desirability of choosing arc c_i^j. alpha and beta are two parameters that determine the relative importance of the pheromone and the heuristic information. From (1) we can see that the transition probability used by ACO depends on the pheromone intensity tau_i^j(t) and the heuristic information eta_i^j. To effectively balance the influence of positive feedback from previous high-quality solutions against the desirability of the arc, we should choose proper values of the parameters alpha and beta. When alpha = 0, no positive feedback information is used; since the previous search experience is lost, the search degrades to a stochastic greedy search. When beta = 0, the potential benefit of the arcs is neglected, and the search becomes entirely random.
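As a concrete illustration, the two-arc transition rule of Eq. (1) can be sketched as follows. This is a minimal sketch, not the authors' implementation; the function names, the 2-by-n array layout for tau and eta, and the toy values are our own assumptions.

```python
import numpy as np

def transition_prob(tau, eta, i, alpha=1.0, beta=0.5):
    """Probabilities of an ant at node v_{i-1} taking arc c_i^0 (skip
    feature i, row 0) or c_i^1 (select it, row 1), following Eq. (1)."""
    w = (tau[:, i] ** alpha) * (eta[:, i] ** beta)  # shape (2,)
    return w / w.sum()

def choose_arc(tau, eta, i, rng, alpha=1.0, beta=0.5):
    """Sample one of the two arcs according to Eq. (1)."""
    p = transition_prob(tau, eta, i, alpha, beta)
    return rng.choice(2, p=p)  # 1 means feature f_i is selected

# One ant's walk from v_0 to v_n over a toy 5-feature digraph.
n = 5
tau = np.ones((2, n))   # pheromone tau_i^j, initialised to 1
eta = np.ones((2, n))   # heuristic information eta_i^j
rng = np.random.default_rng(0)
subset = [i for i in range(n) if choose_arc(tau, eta, i, rng) == 1]
```

Because each node has exactly two outgoing arcs, the whole pheromone state fits in a 2-by-n array, which is where the O(n) memory advantage over the complete-graph formulation comes from.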
The heuristic information eta_i^1 is the desirability of choosing the arc c_i^1 between nodes v_{i-1} and v_i, which represents the preference of the ant for choosing feature f_i. There are many ways to define a suitable value of eta_i^1. It could be any evaluation function of the discrimination ability of a feature f_i, such as a rough-set dependency measure or an entropy-based measure. We set the value of eta_i^1 using the F-score, which is a simple measurement of the discrimination ability of feature f_i, defined as follows:
F_i = Sum_{j=1}^{m} (xbar_i^j - xbar_i)^2 / Sum_{j=1}^{m} [1/(N_j - 1)] Sum_{k=1}^{N_j} (x_{k,i}^j - xbar_i^j)^2   (i = 1, ..., n)   (2)

Here, m is the number of classes of the image set; n is the number of features; N_j is the number of samples in class j (j = 1, 2, ..., m; i = 1, 2, ..., n); x_{k,i}^j is the value of feature f_i of the k-th training sample in class j (k = 1, 2, ..., N_j); xbar_i is the mean value of feature f_i over all images; and xbar_i^j is the mean of feature f_i of the images in class j. In (2), the numerator indicates the discrimination between the classes of the image set, and the denominator specifies the discrimination within each class. A larger value implies that feature f_i has a greater discriminative ability. For the value of eta_i^0, we simply set eta_i^0 = xi * (Sum_{j=1}^{n} eta_j^1) / n, where xi in (0, 1) is a constant.

3 Implementation of the Algorithm

In an ACO-based optimization method, the design of the pheromone update strategy and the measurement of the quality of the solutions are critical.

3.1 Pheromone updating

In each iteration, the algorithm ACOFS updates the pheromone value on each arc according to the pheromone and heuristic information on the arc. Obviously, if an ant chooses arc c_i^j, the pheromone on this arc should be given a larger increment, so that ants select arc c_i^j with higher probability in the next iteration. This forms the positive feedback of the pheromone system. In each iteration, the pheromone on each arc is updated according to formulas (3), (4) and (5):

tau_i^j(t+1) = rho * tau_i^j(t) + Delta tau_i^j(t) + Q_i^j(t)   (3)

where
Delta tau_i^j(t) = Sum_{s in S, c_i^j in s} f(s)   (4)

and

Q_i^j(t) = Q if c_i^j in S_best; 0 otherwise   (5)

In (4), S is the set of solutions generated in the t-th iteration, and the sum is taken over the solutions passing through c_i^j. In (5), S_best is the best solution found so far, and Q is a positive constant. To emphasize the influence of the best-so-far solution, we add an extra pheromone increment on the arcs included in S_best.

3.2 The fitness function

Based on the ant's solution, which is a selected feature subset, the solution quality in terms of classification accuracy is evaluated by classifying the training data sets using the selected features. The test accuracy measures the number of examples that are correctly classified. In addition, the number of features in the subset is also considered in the quality function: a subset with fewer features gets a higher quality value. The quality function f(s) of a solution s is defined as follows:

f(s) = N_corr / (N_feat)^lambda   (6)

where N_corr is the number of examples that are correctly classified, N_feat is the number of features selected in s, and lambda is a constant that adjusts the relative importance of the accuracy and the number of features selected. A scheme obtaining higher accuracy with fewer features gets a greater quality value.

4 Experimental Results

To test the effectiveness and performance of our proposed feature selection algorithm ACOFS, we ran a series of experiments. All experiments were run on a Pentium IV 1.7 GHz PC under Windows XP, using VC++ 6.0, and the results were visualized in Matlab 6.0. A set of images was tested to demonstrate the classification accuracy and to determine whether the proposed algorithm can correctly select the relevant features. The data set contains 80 images in 4 classes. The data set has 19 features including first- and second-order origin moments, first- and second-order central moments, twist
degree, peak values, entropy of the moments, and statistics of the gray-level difference, such as contrast, angular second moment (ASM), mean value, entropy, etc. On the image set, the ACOFS algorithm is applied to select the relevant features, and is compared with the GA-based approach GAs [12] and the modified ACO algorithm for feature selection presented in [26], denoted as maco. For the GA-based feature selector GAs, we set the length of a chromosome to the number of features. In a chromosome, each gene g_i corresponds to the i-th feature: g_i = 1 means the i-th feature is selected; otherwise g_i = 0, meaning the i-th feature is ignored. Through iterations of producing chromosomes for the new generation by crossover and mutation, the algorithm tries to find a chromosome with the smallest number of 1s for which the classifier accuracy is maximized. To select the individuals for the next generation, GA's roulette-wheel selection method was used. We set the parameters of GAs as follows: the crossover and mutation probabilities are P_cross = 0.9 and P_mutation = 0.25, the population size is m = 50, and the maximum number of iterations is 50. For the ACO-based algorithms ACOFS and maco, we used the same population size as the GA-based algorithm GAs. Various parameter settings leading to better convergence were tested, and the best parameters obtained by simulation are as follows: alpha = 1, beta = 0.5, evaporation rate rho = 0.95, initial pheromone intensity of each arc equal to 1, number of ants in each iteration m = 50, and maximum number of iterations 50. These values were chosen to allow a fair comparison with GAs. For each feature subset obtained, its quality is measured by classifying the training image sets using an SVM classifier. The number of selected features and the quality of the classification results are considered for performance evaluation. To evaluate the average classification accuracy of the selected feature subsets, 10-fold and 5-fold cross-validation (CV) is used.
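For reference, the F-score heuristic of Eq. (2), used above to set eta_i^1, can be computed as in the following sketch. The function name and array layout are our own assumptions, not the authors' code.

```python
import numpy as np

def f_score(X, y):
    """F-score of each feature, per Eq. (2): scatter of the class means
    around the overall mean, divided by the pooled within-class variance.
    X: (samples, features) array; y: class label per sample."""
    X, y = np.asarray(X, float), np.asarray(y)
    overall = X.mean(axis=0)                      # xbar_i
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]                            # samples of class j
        mc = Xc.mean(axis=0)                      # xbar_i^j
        num += (mc - overall) ** 2                # between-class term
        den += ((Xc - mc) ** 2).sum(axis=0) / (len(Xc) - 1)  # within-class term
    return num / den
```

A feature whose per-class means are far apart relative to its within-class spread gets a large F-score, so it is a natural per-feature desirability value for eta_i^1.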
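The pheromone update of Eqs. (3)-(5) and the quality function of Eq. (6) can likewise be sketched in a few lines. This is an illustrative sketch under our reading of the garbled originals: in particular the exact functional form N_corr / N_feat^lambda in `fitness` is our reconstruction of Eq. (6), and the 0/1 solution encoding is our own convention.

```python
import numpy as np

def fitness(n_correct, n_feat, lam=0.1):
    """Quality function per our reconstruction of Eq. (6): more correct
    classifications and fewer selected features give a larger value;
    lam sets the trade-off."""
    return n_correct / (n_feat ** lam)

def update_pheromone(tau, solutions, qualities, best, rho=0.95, Q=1.0):
    """Eqs. (3)-(5): evaporate, deposit f(s) on every arc used by each
    ant's solution, then add a bonus Q on the arcs of the best-so-far
    solution. Each solution is a 0/1 vector: sol[i] names arc c_i^{sol[i]}."""
    n = tau.shape[1]
    tau = rho * tau                              # evaporation term rho * tau(t)
    for sol, f in zip(solutions, qualities):
        for i in range(n):
            tau[sol[i], i] += f                  # Delta tau_i^j(t), Eq. (4)
    for i in range(n):
        tau[best[i], i] += Q                     # Q_i^j(t), Eq. (5)
    return tau
```

Note that every arc evaporates, but only arcs actually traversed receive deposits, which is what concentrates the search around previously good subsets.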
For the three algorithms, the CV accuracy on the training and testing data of the best-so-far solution at each iteration was computed and recorded. Table 1 shows the number of features selected in the best solution obtained by the three algorithms. From the table we can see that ACOFS selects the smallest number of features while maintaining high classification accuracy.

Table 1. Number of features selected by the three algorithms

Algorithm    ACOFS   GAs   maco
5-fold CV      7      8     10
10-fold CV     9     10     12

We measure the quality of the classification results by two criteria, namely the recall and precision of each class. The average recall and precision of the classification of the j-th class are defined as follows:

recall(j) = N_TP(j) / N_c(j)
precision(j) = N_TP(j) / (N_TP(j) + N_FP(j))   (7)

Here, N_c(j) is the number of images in the j-th class, N_TP(j) is the number of images correctly classified into the j-th class, and N_FP(j) is the number of images incorrectly classified into the j-th class.
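Eq. (7) translates directly into code; the following sketch (with hypothetical label arrays) shows both quantities for one class:

```python
import numpy as np

def recall_precision(y_true, y_pred, cls):
    """Per-class recall and precision as in Eq. (7)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))   # N_TP(j)
    fp = np.sum((y_pred == cls) & (y_true != cls))   # N_FP(j)
    n_c = np.sum(y_true == cls)                      # N_c(j)
    recall = tp / n_c if n_c else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```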
To obtain a more reliable result, 10 runs were conducted with 10-fold and 5-fold cross-validation on the image set. Tables 2 to 5 present the average recall on the training and testing image sets in the 5-fold and 10-fold CV tests.

Table 2. Recall of the results in 5-fold CV on training sets

Class     ACOFS     GAs       maco
1         100%      87.5%     93.75%
2         100%      100%      100%
3         100%      87.5%     81.25%
4         87.5%     81.25%    87.5%
Average   96.88%    89.06%    90.63%

Table 3. Recall of the results in 5-fold CV on testing sets

Class     ACOFS     GAs       maco
1         75%       75%       75%
2         100%      100%      100%
3         100%      100%      100%
4         100%      75%       75%
Average   93.75%    87.5%     87.5%

Table 4. Recall of the results in 10-fold CV on training sets

Class     ACOFS     GAs       maco
1         100%      100%      94.44%
2         100%      100%      100%
3         94.44%    94.44%    100%
4         88.89%    88.89%    83.33%
Average   95.83%    95.83%    94.44%

Table 5. Recall of the results in 10-fold CV on testing sets

Class     ACOFS     GAs       maco
1         100%      100%      94.44%
2         100%      100%      100%
3         100%      50%       100%
4         100%      100%      100%
Average   100%      87.5%     98.61%

Comparing the recall and the number of selected features, we can see from the tables that the proposed ACOFS algorithm outperforms the maco and GAs algorithms. The number of features selected by ACOFS is 7 and 9 in the 5-fold and 10-fold CV tests respectively, against 8 and 10 for GAs, and 10 and 12 for maco. Furthermore, while using fewer features, ACOFS achieves higher recall than maco and GAs. For instance, in the 10-fold CV test on the testing data, the average recall of ACOFS is 100%, compared with 87.5% for GAs and
98.61% for maco. Thus the recall of the results by ACOFS is consistently no worse than that of the maco and GAs algorithms. Tables 6 to 9 list the average precision on the training and testing image sets in the 5-fold and 10-fold CV tests.

Table 6. Precision of the results in 5-fold CV on training sets

Class     ACOFS     GAs       maco
1         88.89%    82.35%    78.95%
2         100%      94.12%    88.89%
3         100%      82.35%    100%
4         100%      100%      100%
Average   97.22%    89.71%    91.96%

Table 7. Precision of the results in 5-fold CV on testing sets

Class     ACOFS     GAs       maco
1         100%      75%       75%
2         100%      100%      100%
3         80%       80%       100%
4         100%      100%      75%
Average   95%       88.75%    87.5%

Table 8. Precision of the results in 10-fold CV on training sets

Class     ACOFS     GAs       maco
1         85.71%    85.71%    85%
2         100%      100%      100%
3         100%      100%      100%
4         100%      100%      93.75%
Average   96.43%    96.43%    94.69%

Table 9. Precision of the results in 10-fold CV on testing sets

Class     ACOFS     GAs       maco
1         100%      66.67%    100%
2         100%      100%      100%
3         100%      100%      100%
4         100%      100%      94.74%
Average   100%      91.66%    98.69%

We can see from the tables that the proposed ACOFS algorithm has better precision than the maco and GAs algorithms. Even using fewer features, ACOFS still obtains higher precision than maco and GAs. For instance, in the 5-fold CV test on the testing sets, the average precision of ACOFS is 95%, compared with 88.75% for GAs and 87.5% for maco. Thus the precision of the results by ACOFS is consistently no worse than that of the maco and GAs algorithms.
Comparisons of the average recall and precision of the three algorithms are displayed in Figures 2 and 3, respectively. We can conclude from the figures and tables that the proposed ACOFS algorithm successfully selects feature subsets that achieve high classification accuracy. Compared with the algorithms GAs and maco on the same image set, ACOFS obtains better classification accuracy with a smaller feature set.

Fig. 2. Comparison of the average recall and precision of the three algorithms in 5-fold CV

Fig. 3. Comparison of the average recall and precision of the three algorithms in 10-fold CV

5 Conclusions

We proposed an ACO-based feature selection algorithm, ACOFS. The algorithm adopts classifier performance and the number of selected features as heuristic
information, and selects the optimal feature subset in terms of the smallest number of features and the best classifier performance. Experimental results on image data sets show that ACOFS can obtain better classification accuracy with a smaller feature set than other similar methods.

Acknowledgments. This research was supported in part by the Chinese National Natural Science Foundation under grants No. 6070047, 607033 and 6077303, the Natural Science Foundation of Jiangsu Province under contract BK20034, and the Natural Science Foundation of the Education Department of Jiangsu Province under contracts 08KJB52002 and 09KJB2003.

References

1. Basiri, M. E., Ghasem-Aghaee, N., Aghdam, M. H.: Using ant colony optimization-based selected features for predicting post-synaptic activity in proteins. In: EvoBIO 2008, LNCS 4973, pp. 12-23. Springer-Verlag, Berlin, Heidelberg (2008)
2. Yeh, Y.-C., Wang, W.-J., Chou, C. W.: Feature selection algorithm for ECG signals using Range-Overlaps Method. Expert Systems with Applications 37(4), 3499-3512 (2010)
3. Lu, J., Zhao, T., Zhang, Y.: Feature selection based on genetic algorithm for image annotation. Knowledge-Based Systems 21(8), 887-891 (2008)
4. Aghdam, M. H., Ghasem-Aghaee, N., Basiri, M. E.: Application of ant colony optimization for feature selection in text categorization. In: Proc. CEC 2008, the Fifth IEEE Congress on Evolutionary Computation. IEEE Press, Hong Kong (2008)
5. Lutu, P. E. N., Engelbrecht, A. P.: A decision rule-based method for feature selection in predictive data mining. Expert Systems with Applications 37(1), 602-609 (2010)
6. Jensen, R.: Combining rough and fuzzy sets for feature selection. PhD thesis, University of Edinburgh (2005)
7. Polat, K., Güneş, S.: Medical decision support system based on artificial immune recognition immune system (AIRS), fuzzy weighted pre-processing and feature selection. Expert Systems with Applications 33(2), 484-490 (2007)
8. Gundimada, S., Asari, V. K., Gudur, N.: Face recognition in multi-sensor images based on a novel modular feature selection technique. Information Fusion 11(2), 124-132 (2010)
9. Oliveira, L. S., Sabourin, R., Bortolozzi, F., Suen, C. Y.: A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. International Journal of Pattern Recognition and Artificial Intelligence 17(6), 903-929 (2003)
10. Wang, X., Yang, J., Teng, X., Xia, W., Jensen, R.: Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters 28, 459-471 (2007)
11. Dorigo, M., Birattari, M., Stützle, T.: Ant colony optimization: artificial ants as a computational intelligence technique. IEEE Computational Intelligence Magazine 1(4), 28-39 (2006)