Investigating methods for improving Bagged k-nn classifiers


Fuad M. Alkoot
Telecommunication & Navigation Institute, P.A.A.E.T., P.O. Box 4575, Alsalmia, 22046 Kuwait

Abstract: We experiment with bagging k-NN classifiers using an optimal distance metric. The aim is to establish whether bagging k-NN is useful when a better metric is used. We experiment on real-world data sets at different training set sizes. Our experiments also involve Modified bagging, which we proposed previously, to see the effect of prior knowledge on bagging performance under the new distance measure. Results indicate that the optimal metric improves the performance of bagging as well as the single classifier.

Key-Words: bagging, fusion, classifier combining, nearest neighbor

1 Introduction
In order to improve the accuracy of classifiers, researchers have found that fusing (combining) the decisions of more than one classifier can yield superior results over the best single classifier. The papers [1,2,12,13,14,15,17,20,21,22] are a sample showing the use of classifier fusion to improve classification accuracy. Bagging [6], proposed by Breiman, is a method of generating multiple versions of a predictor or classifier via bootstrapping and then using these to obtain an aggregated decision. The multiple versions of classifiers are formed by making bootstrap [11] replicas of the training set, and these are then used to train additional classifiers. Breiman postulates that the necessary precondition for bagging to improve accuracy is classifier instability. By instability we mean that a perturbation of the learning set causes significant changes in the classifier output. Bagging has been successfully applied to practical cases to improve the performance of unstable classifiers; a sample of such papers includes [10, 8]. Many authors have investigated its merits and compared bagging to boosting or other methods [4,7,9,16,18].

In [6] Breiman argued that bagging would not improve a nearest neighbor (NN) classifier, because it is stable. He confirmed this experimentally by showing that, when bagging NN classifiers, good results were achieved on only 3 data sets out of 6, whereas when bagging decision trees a better error rate was observed on all 6 data sets. Most research to date has therefore been directed towards applying bagging to unstable classifiers, such as decision trees and neural networks.

Our previous work [3] on bagging k-NN under small sample sizes showed that although regular bagging could not outperform the single classifier, modified bagging, which is implemented with some modifications to the bagging method, may outperform the single classifier. The improvements were mostly at medium training set sizes, while at large training set sizes it performed similarly to regular bagging and the single classifier. Even at the medium set size, for a few data types, we did not always achieve a significant improvement. More investigation is required to find the reasons behind this variable performance. We also aim to further improve the performance of bagged k-NN classifiers, beyond what we achieved through modified bagging, and to find the methods, conditions and data properties that lead to successful performance.

In that work we used the Euclidean distance as the metric to find the closest k samples. However, the limitation of the Euclidean metric is that it does not ensure that the k nearest neighbors are drawn from a region with a constant a posteriori class probability distribution. This may lead to suboptimal performance. If a more robust distance metric is used, the single classifier may outperform all bagging methods, or, on the contrary, bagging may gain more from the optimal metric than the single classifier.
However, if we use a distance metric that improves the single classifier but is data dependent, we will be using a method that creates diverse and hence unstable classifiers.

And since bagging works best on unstable classifiers, we may end up with a k-NN that can be improved via bagging. We experiment with bagging using the optimal distance metric proposed by Short and Fukunaga [19]. In the next section we overview the new metric. In section 3 we present the different bagging methods. Section 4 contains the experimental methodology. The results are presented in section 5. The paper is brought to a conclusion in section 6.

2 The optimal distance metric
Short and Fukunaga [19] showed that the variance of finite sample size estimates may be minimized by a proper choice of the metric. The use of such a metric should lead to a performance improvement. The optimal distance metric is defined as follows. For a given test sample x_0, the distance to a point X_e is defined as

    d(x_0, X_e) = V_{x_0}^T (x_0 - X_e)                                   (1)

where V_{x_0} is the average gradient direction of the a posteriori probability of class \omega_i at point x_0. Its estimate is

    V_{x_0} = (2/n) \sum_{i=1}^{m} n_i ( M_i(x_0) - M_0(x_0) )            (2)

where n_i is the number of samples out of n that belong to class i and m is the number of classes. The local sample class-i mean M_i(x_0) and the local sample mixture mean M_0(x_0) are given as

    M_i(x_0) = (1/n_i) \sum_{X_j \in \omega_i} ( X_j - x_0 )              (3)

    M_0(x_0) = (1/n) \sum_{j=1}^{n} ( X_j - x_0 )                         (4)

For k-nearest neighbors we find the k closest samples to x_0, then using their labels we decide which class x_0 belongs to by majority voting. The necessary precondition is that this distance measure is applied only to the samples close to x_0. Hence, initially the closest n samples are found using the Euclidean distance, and then from among these samples the closest k are found according to equation (1). We automatically select n to be 3k rounded to the closest integer.

3 Bagging and modified bagging procedures
When a data set is small, the proportions of training patterns from the different classes may be unrepresentative. The probability of drawing a training set with samples from some class completely missing becomes non-negligible. When this occurs, bagging may even become counterproductive. For this reason we set out to investigate the effect of bootstrap control as suggested in [3]. Three modifications of the standard bagging method are considered. We name the standard procedure method 1 and its modified versions methods 2-4. The methods, which exploit increasing amounts of prior knowledge, can be summarized as follows.

Method 1: Given a learning set, each bootstrap set is created by sampling from the learning set randomly with replacement. The cardinality of each boot set is the same as the size of the training set.

Method 2: When bootstrap sets are created from the learning set, we check the ratio of the number of samples per class in the bootstrap set. This ratio is compared to the ratio of samples per class in the learning set. If the difference between the compared ratios is larger than a certain class population bias tolerance threshold, we reject the bootstrap set. The bias tolerance threshold used is set at 10%.

Method 3: This method is similar to method 2 except that the bootstrap set ratio is compared to the ratio in the full set. By full set we mean the set containing all samples, learning and test samples. This full set ratio simulates prior knowledge of the class distribution in the sample space.

Method 4: Here we only require that all classes be represented in the bootstrap set, without enforcing a certain ratio of samples per class. This is done by rejecting any bootstrap set that does not represent all classes.
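The following is a minimal Python sketch (not the authors' Matlab code) of the neighbor re-ranking described in Section 2: the n = 3k Euclidean-nearest samples are found first, a local projection direction is estimated from their class means, and the k samples closest under the projected distance cast a majority vote. One simplifying assumption is made: the direction V is taken as the difference of the two local class means, the classical two-class form of the Short-Fukunaga measure, rather than the multiclass combination of equation (2). All function and variable names are illustrative.

```python
import numpy as np
from collections import Counter

def optimal_metric_knn(x0, X, y, k):
    """k-NN vote for test point x0 using a locally estimated linear metric.

    Sketch only: the projection direction below is the classical two-class
    Short-Fukunaga choice (difference of the local class means); the paper's
    multiclass estimate (eq. 2) combines (M_i - M_0) terms instead.
    """
    n = 3 * k                                    # local pre-selection size, n = 3k
    # Step 1: the n Euclidean-nearest training samples around x0.
    d_euc = np.linalg.norm(X - x0, axis=1)
    local = np.argsort(d_euc)[:n]
    Xl, yl = X[local], y[local]
    # Step 2: local class means (cf. eq. 3) and a projection direction V.
    classes = np.unique(yl)
    if len(classes) < 2:                         # only one class locally: trivial vote
        return classes[0]
    means = {c: Xl[yl == c].mean(axis=0) for c in classes}
    c1, c2 = classes[:2]                         # two-class form assumed here
    V = means[c1] - means[c2]
    # Step 3: re-rank the local samples by the magnitude of the projection
    # V^T (X - x0) (cf. eq. 1; absolute value assumed so distances are non-negative).
    d_opt = np.abs((Xl - x0) @ V)
    nearest_k = np.argsort(d_opt)[:k]
    # Step 4: majority vote among the k re-ranked neighbours.
    return Counter(yl[nearest_k].tolist()).most_common(1)[0][0]
```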

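A minimal sketch of the bootstrap-set screening behind Methods 1-4 above, assuming the 10% class-population bias tolerance stated for Method 2. The rejection-by-resampling loop and all names are illustrative, not the authors' implementation.

```python
import numpy as np

def class_ratios(labels, classes):
    """Fraction of samples per class, in a fixed class order."""
    return np.array([np.mean(labels == c) for c in classes])

def draw_bootstrap(X, y, method, ref_ratios=None, classes=None,
                   tolerance=0.10, rng=np.random.default_rng(0)):
    """Draw one bootstrap set, re-drawing until it passes the check for the
    given method (1 = plain bagging, 2/3 = ratio check against the learning
    or full set, 4 = all classes present). Illustrative sketch; for very
    small or very skewed sets the rejection loop may re-draw many times."""
    n = len(y)
    if classes is None:
        classes = np.unique(y)
    while True:
        idx = rng.integers(0, n, size=n)         # sample with replacement
        yb = y[idx]
        if method == 1:
            break                                # Method 1: accept any bootstrap set
        if method in (2, 3):
            # Methods 2/3: per-class ratios must stay within the bias tolerance
            # of the reference ratios (learning set for 2, full set for 3).
            if np.all(np.abs(class_ratios(yb, classes) - ref_ratios) <= tolerance):
                break
        if method == 4:
            # Method 4: every class must simply be represented.
            if len(np.unique(yb)) == len(classes):
                break
    return X[idx], y[idx]
```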
4 Methodology
In this project we experiment with bagging [6] k-NN classifiers. The experiments are simulated using Matlab on several Pentium-based personal computers.

Table 1: Characteristics of the data sets used
Data name                        No. of samples   No. of features   No. of classes
Wisconsin Breast Cancer (BCW)    699              9                 2
Ionosphere                       351              34                2
Iris                             150              4                 3
Wine                             178              13                3

Four real data sets from the UCI repository [5] are used. Training and test sets are generated from the original data set. For each of the data sets, a single training set is randomly taken from the original sample space, i.e. the full data set. The k-NN classifier built using this original learning set is referred to as the single classifier. The remaining samples are used as a test set. N boot sets are generated using the training set, and the decisions of the N boot sets are aggregated to classify the test set. As suggested by Breiman, we may start with N = 25 as the number of aggregated k-NN classifiers. The experiments are then repeated for varying N, the number of component k-NN classifiers, to get an overview of the effect of the number of classifiers on system performance. These results are referred to as the bagged classifier results. We compare these results to those obtained from the single classifier, and to those obtained from the other bagging methods of creating bootstrap sets, as outlined in section 3, for two types of learning sets. In the first case the learning set is created by randomly taking samples from the full data set. This results in a set that may contain some but not all classes. The second type of learning set is referred to as a modified learning set. It is constructed using Method 3, which was mentioned as a technique to create unbiased bootstrap sets. This results in a set that is representative of all the classes, with class population ratios similar to those of the full set. The modified learning set simulates an unbiased sample space.

To obtain unstable k-NN classifiers, we propose using the optimal distance metric proposed by Short and Fukunaga [19], as explained in section 2. We repeat the experiments using the Euclidean metric, in order to make a direct comparison and assess the merits of bagging in conjunction with the optimal metric. In our previous experiments, for each set size, the number of nearest neighbors k was found automatically as the square root of the number of training samples, rounded to the closest integer. In this paper 10 values of k are used for each training set size, to see the effect of k on the bagging performance under the new metric.

The above is repeated for four training set sizes: 10, 20, 40 and 80 samples. The experiments are repeated for different data sets to find which leads to improved performance. The data sets used are obtained from the UCI repository, and their characteristics are shown in Table 1. We also repeat the experiments for varying feature set sizes, where we randomly choose a subset of the full feature set. We experiment with three feature set sizes: 100, 80 and 50 percent of the features. The randomly selected features are the same for the learning set and the bootstrap sets. All experiments are repeated 100 times and then averaged to achieve statistically reliable results.
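A compact sketch of one repetition of this protocol, reusing the illustrative helpers from the two sketches above. Majority-vote fusion is used here for brevity; a Sum-rule variant is sketched further below. The split sizes, loop structure and names are assumptions, not the authors' code.

```python
import numpy as np

def run_trial(X, y, train_size, k, n_bags, bag_method, rng):
    """One repetition: draw a learning set, build the single k-NN expert and
    n_bags bootstrap experts, and score both on the held-out test samples.
    Reuses class_ratios, draw_bootstrap and optimal_metric_knn from above."""
    idx = rng.permutation(len(y))
    tr, te = idx[:train_size], idx[train_size:]        # random learning/test split
    Xtr, ytr, Xte, yte = X[tr], y[tr], X[te], y[te]
    classes = np.unique(y)
    ref = class_ratios(ytr, classes)                   # reference ratios (Methods 2/3)
    bootsets = [draw_bootstrap(Xtr, ytr, bag_method, ref_ratios=ref,
                               classes=classes, rng=rng) for _ in range(n_bags)]
    single_ok, bagged_ok = 0, 0
    for x0, true in zip(Xte, yte):
        single_ok += optimal_metric_knn(x0, Xtr, ytr, k) == true
        votes = [optimal_metric_knn(x0, Xb, yb, k) for Xb, yb in bootsets]
        bagged = max(set(votes), key=votes.count)      # majority-vote fusion
        bagged_ok += bagged == true
    return single_ok / len(yte), bagged_ok / len(yte)

# Averaging run_trial over 100 random splits, for training sizes 10/20/40/80,
# 10 values of k and N = 3..25 bootstrap sets, mirrors the protocol above.
```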
5 Results
We experiment with bagging k-NN classifiers using an optimal distance metric. The aim is to establish whether bagging k-NN is useful when the proposed metric is used. Our results, on 4 data sets, indicate that both the single classifier and the bagging results improve due to this metric, with the exception of small learning sets. The figures show bagging results for the BCW data set.

Each figure shows the classification rate of a bagging method in comparison to the single classifier using the Euclidean and optimal metrics. For each value of k we can compare the performances at different numbers of fused classifiers, ranging from 3 to 25 bootstrap sets. The values on the x-axis represent the number of fused bootstrap sets when multiplied by 2 and added to 1. Therefore, 12 represents 25 bootstrap sets, while 5 represents 11.
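To make the aggregation concrete, here is a small illustrative sketch of the two fusion rules referred to in this paper: majority Vote and the Sum rule over the experts' soft class supports (for k-NN, the fraction of each expert's k neighbours per class). The supports array and the numbers in the example are hypothetical.

```python
import numpy as np

def fuse(supports, rule="sum"):
    """Fuse component classifier outputs.

    supports: array of shape (n_classifiers, n_classes); row j holds classifier
    j's estimated class supports for one test sample.
    Returns the index of the winning class under the Sum or majority-Vote rule.
    """
    if rule == "sum":
        # Sum rule: add the soft supports of all experts, pick the largest total.
        return int(np.argmax(supports.sum(axis=0)))
    # Majority vote: each expert votes for its own top class.
    votes = np.bincount(np.argmax(supports, axis=1), minlength=supports.shape[1])
    return int(np.argmax(votes))

# Example with N = 3 component k-NN experts, 2 classes, k = 5 neighbours each:
supports = np.array([[3/5, 2/5],      # expert 1: 3 of 5 neighbours in class 0
                     [2/5, 3/5],      # expert 2
                     [4/5, 1/5]])     # expert 3
print(fuse(supports, "sum"), fuse(supports, "vote"))   # both print 0 here

# The plotted x values map to the number of fused bootstrap sets as N = 2x + 1,
# e.g. x = 12 corresponds to 25 bootstrap sets and x = 5 to 11.
```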

Results using regular bagging method 1 show that at the smallest set size, optimal metric bagging is almost never superior. At set size 2, optimal bagging improves but is still not superior, with the exception of large values of k (9 and 10) when the regular learning set is used, and k greater than 4 when the modified learning set is used. At set sizes 3 and 4, optimal metric bagging is superior at all k except at k=5 and 9 using size 3, and k=5, 7 and 9 using size 4, when the regular learning set is used. However, if modified learning is used, the optimal metric single classifier outperforms optimal metric bagging at all odd values of k. At the largest two sizes the optimal metric outperforms the Euclidean regardless of the type of classifier system. For all sizes the bagging performance improves as the number of fused classifiers increases.

Figure 1: BCW data results when bagging method 3 is used with the regular learning set, using the Sum fusion strategy, at set size 2.

Using bagging method 2, at the smallest set size, Euclidean is best at k=4, 8 and 9; otherwise, optimal is either better than or equal to Euclidean. The optimal metric single classifier outperforms optimal bagging only at k=1, 3 and 10 for the regular learning set, and at k=7 and 10 for the modified learning set. At set size 2, Euclidean is best at k=7 and 8 using the regular learning set. However, at k=8 using regular learning, and at k=7 and 8 using modified learning, when the number of fused classifiers increases, optimal bagging outperforms Euclidean bagging and outperforms all. At this size, the single classifier outperforms bagging at odd values of k. At size 3, the optimal metric single classifier outperforms bagging at k=5 and 9 only; otherwise, optimal bagging outperforms all. It is at this size that we begin to see the advantage of bagging when method 2 is used in conjunction with an optimal metric. This is most obvious at k=7, where the optimal metric yields a superior bagging performance while bagging using the Euclidean metric yields worse performance than the single classifier. Also, at size 4, the optimal metric always outperforms the Euclidean. However, at odd k, the single classifier outperforms bagging as k increases; the difference reaches a significant level at k=9.

Figure 2: BCW data results when bagging method 4 is used with the modified learning set, using the Sum fusion strategy, at set size 2.

Using bagging method 3, at the smallest set size, bagging outperforms the single classifier at small values of k, and it improves as the number of fused classifiers increases. The superiority of the optimal metric is not obvious at this size, where the two metrics show an equal performance. At set size 2, the optimal metric outperforms the Euclidean, and optimal metric bagging shows the best performance. An exception is at k=3 using the regular learning set, where bagging reaches the single classifier only at a higher number of fused classifiers. At the third set size, optimal metric bagging outperforms all, especially at larger numbers of fused classifiers. However, using the modified learning set, the single classifier using an optimal metric outperforms bagging, although insignificantly, at k=5 and 9. This is also true at the fourth set size.

Figure 3: BCW data results when bagging method 3 is used with the modified learning set, using the Sum fusion strategy, at set size 2.

With the fourth bagging method, the performance at the smallest set size is similar to that of bagging method 3. However, using a modified learning set the single classifier outperforms bagging. This is because bootstrap sets created using bagging method 4 do not contain a correct representation of the sample space. At set sizes 2, 3 and 4, bagging and the single classifier show a mixed performance at odd values of k. At even values of k, bagging clearly outperforms the single classifier, especially as the number of fused classifiers increases.

We can conclude that bagging method 2 using an optimal metric outperforms the single classifier, using either learning set type, at large training set sizes. An exception is when k is the closest odd value to the square root of the number of training samples. Bagging method 3 outperforms the single classifier, as method 2 did, in addition to the few cases where method 2 failed. At the smallest set sizes 1 and 2 the two metrics show a mixed performance, and the optimal metric loses its clear advantage over the Euclidean.

6 Conclusions
Experiments on combining k-NN classifiers using bagging were conducted. Four different bagging methods were tested under two types of learning sets. In our experiments we varied the number of combined classifiers, the number of nearest neighbors, the fusion method and the learning set size. The experiments included four data sets obtained from the UCI repository. Results on all data sets show the advantage of the optimal metric over the Euclidean for the single classifier at most set sizes and values of the number of nearest neighbors k. Bagging gained from the optimal metric, and on three of the data types it was able to outperform the optimal metric single classifier. Where bagging using the Euclidean metric was not able to outperform the single classifier, optimal metric bagging was able to outperform the single classifier. We found that at small training set sizes bagging with the Euclidean metric may outperform the optimal metric. Therefore, we recommend the optimal metric at set sizes above 20 samples. We can conclude that modified bagging methods 2 and 3, using the optimal metric, have the highest probability of achieving the best performance.

Acknowledgments
This research was supported by the Public Authority for Applied Education and Training of Kuwait under grant number TR-06-01.

References
1. F. M. Alkoot and J. Kittler. Moderating k-NN classifiers. Pattern Analysis and Applications, 5(3):326-332, 2002.
2. F. M. Alkoot and J. Kittler. Modified product fusion. Pattern Recognition Letters, 23(8):957-965, 2002.
3. F. M. Alkoot and J. Kittler. Population bias control for bagging k-NN experts. In Proceedings of Sensor Fusion: Architectures, Algorithms, and Applications V, Orlando, Florida, 17-20 April 2001. SPIE Volume 4385, p. 36.
4. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, pages 1-38, 1998.
5. C. L. Blake, E. Keogh and C. J. Merz. UCI repository of machine learning databases, http://www.ics.uci.edu/mlearn/mlrepository.html, Department of Information and Computer Science, University of California, Irvine, CA, 1998.

6. L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.
7. L. Breiman. Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, 1996.
8. P. de Chazal and B. Celler. Improving ECG diagnostic classification by combining multiple neural networks. In Computers in Cardiology, volume 24, pages 473-476. IEEE, 1997.
9. T. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, pages 1-22, 1998.
10. B. Draper and K. Baek. Bagging in computer vision. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 144-149. IEEE Computer Society, Los Alamitos, CA, USA, 1998.
11. B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, 1993.
12. T. K. Ho, J. J. Hull, and S. N. Srihari. Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1):66-75, 1994.
13. J. Kittler and F. M. Alkoot. Sum versus vote fusion in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1):110-115, 2003.
14. J. Kittler, M. Hatef, R. Duin, and J. Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226-239, 1998.
15. J. Kittler. Combining classifiers: A theoretical framework. Pattern Analysis and Applications, 1:18-27, 1998.
16. J. Quinlan. Bagging, boosting and C4.5. In Proceedings of the 13th National Conference on Artificial Intelligence, volume 1, pages 725-730, Portland, OR, USA, 1996. AAAI Press, Menlo Park, CA, USA.
17. G. Rogova. Combining the results of several neural network classifiers. Neural Networks, 7(5):777-781, 1994.
18. R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.
19. R. Short and K. Fukunaga. A new nearest neighbor distance measure. In IEEE Conference, pages 81-86, 1980.
20. D. H. Wolpert. Stacked generalization. Neural Networks, 5(2):241-260, 1992.
21. K. Woods, W. P. Kegelmeyer, and K. Bowyer. Combination of multiple experts using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:405-410, 1997.
22. L. Xu, A. Krzyzak, and C. Y. Suen. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems, Man, and Cybernetics, 22(3):418-435, 1992.