ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012

Performance Evolution of Different Coding Methods with β-density Decoding Using Error Correcting Output Codes Based on Multiclass Classification

Devangini Dave, M. Samvatsar, P. K. Bhanodia

Abstract: A common way to address a multi-class classification problem is to design a model that consists of hand-picked binary classifiers and to combine them so as to solve the problem. Error-Correcting Output Codes (ECOC) is one such framework for dealing with multi-class classification problems. The ECOC framework is a powerful tool for multi-class classification: recent work in the ECOC domain has shown promising results demonstrating improved performance, and the error-correcting ability improves and enhances the generalization ability of the base classifiers. The main goal of this paper is to compare the Discriminant ECOC coding method with the one-versus-one, one-versus-all, dense random, and sparse random coding methods, using pessimistic β-density decoding, which shows a significant performance improvement. The tool we have used for performing the experiments is MATLAB.

Index Terms: Error Correcting Output Codes (ECOC), Coding, Decoding, Discriminant ECOC (DECOC)

I. INTRODUCTION

The task of supervised machine learning can be seen as the problem of finding an unknown function C(x) given a training set of example pairs <x, C(x)>. C(x) is usually a set of discrete labels. For example, in face detection C(x) is a binary function, c(x) ∈ {face, non-face}; in optical digit recognition, c(x) ∈ {0, ..., 9}. Many techniques and algorithms have been proposed to address the binary classification task: decision trees, neural networks, large margin classification techniques, etc. Some of those methods can be easily extended to multiclass problems. However, other powerful and popular classifiers, such as AdaBoost [15] and Support Vector Machines [14], do not extend to multiclass easily. In those situations, the usual way to proceed is to reduce the complexity of the multiclass problem into multiple simpler binary classification problems.

There are many different approaches for reducing a multiclass problem to binary classification problems. The simplest approach compares each class against all the others, producing $N_c$ binary problems, where $N_c$ is the number of classes. Other researchers suggested comparing all possible pairs of classes [13], resulting in $N_c(N_c - 1)/2$ binary problems. In general, the problem is transformed into $n$ binary classification subproblems, where $n$ is the error-correcting output code length, $n \in \{N_c, \dots\}$. Then the outputs of all the classifiers must be combined, traditionally using the Hamming distance.

In the literature, one can find several powerful binary classifiers. However, when one needs to deal with multiclass classification problems, many learning techniques fail to manage this information directly. Instead, it is common to construct classifiers that distinguish between just two classes and to combine them in some way. In this sense, Error-Correcting Output Codes (ECOC) were born as a general framework to combine binary problems to address the multiclass problem. The strategy was introduced by Dietterich and Bakiri in 1995 [12]. Based on the error-correcting principles, and because of its ability to correct the bias and variance errors of the base classifiers, ECOC has been successfully applied to a wide range of applications such as face recognition, face verification, text recognition, and manuscript digit classification. The coding step received special attention when Allwein et al. [2] introduced a third symbol (the zero symbol) into the coding process.
This symbol increases the number of class partitions to be considered in a ternary ECOC framework by allowing some classes to be ignored; the ternary coding matrix then becomes $M \in \{-1, 0, +1\}^{N_c \times n}$. In this case, the symbol zero means that a particular class is not considered by a certain binary classifier.

The rest of the paper is organized as follows: Section II describes the basic framework of Error-Correcting Output Codes, Section III covers results and a comparison of different coding methods on various datasets, and Section IV concludes.

II. THE BASIC FRAMEWORK OF ERROR CORRECTING OUTPUT CODES

1. Given a set of $N_c$ classes, the basis of the ECOC framework consists of designing a codeword for each of the classes.
2. These codewords encode the membership information of each class for a given binary problem.
3. Arranging the codewords as rows of a matrix, we obtain a coding matrix $M_c \in \{-1, 0, +1\}^{N_c \times n}$, where $n$ is the length of the codewords codifying each class.

4. From the point of view of learning, $M_c$ is constructed by considering $n$ binary problems, each one corresponding to a column of the matrix $M_c$.
5. Each of these binary problems splits the set of classes into two partitions (coded by +1 or -1 in $M_c$ according to their class set membership, or 0 if the class is not considered by the current binary problem).
6. Then, at the decoding step, applying the $n$ trained binary classifiers, a code is obtained for each data point in the test set.
7. This code is compared to the base codewords of each class defined in the matrix $M_c$, and the data point is assigned to the class with the closest codeword.

Figure 1 shows an ECOC coding design for a 4-class problem. White, black, and grey positions correspond to the symbols +1, -1, and 0, respectively. Once the four binary problems are learnt, at the decoding step a new test sample X is tested by the $n$ classifiers. The new codeword $x = \{x_1, \dots, x_n\}$ is then compared with the class codewords $\{C_1, \dots, C_4\}$, classifying the new sample by the class $C_i$ whose codeword minimizes the decoding measure.

Fig. 1. ECOC Coding Design for a 4-class Problem.

A. Coding Design

The ECOC coding design covers the state of the art of coding strategies, mainly divided into two groups: problem-independent approaches, which do not take into account the distribution of the data to define the coding matrix, and problem-dependent designs, where information about the particular domain is used to guide the coding design [1].

Problem-Independent ECOC Coding Designs:

One-versus-all (Rifkin and Klautau, 2004): $N_c$ dichotomizers are learnt for $N_c$ classes, where each one splits one class from the rest of the classes.

One-versus-one (Nilsson, 1965): $n = N_c(N_c - 1)/2$ dichotomizers are learnt for $N_c$ classes, splitting each possible pair of classes.

Dense Random (Allwein et al., 2002): $n = 10 \log N_c$ dichotomizers are suggested to be learnt for $N_c$ classes, where $P(-1) = 1 - P(+1)$, with $P(-1)$ and $P(+1)$ the probabilities of the symbols -1 and +1 appearing, respectively. Then, from a set of defined random matrices, the one which maximizes a decoding measure among all possible rows of $M_c$ is selected.

Sparse Random (Escalera et al., 2009): $n = 15 \log N_c$ dichotomizers are suggested to be learnt for $N_c$ classes, where $P(0) = 1 - P(-1) - P(+1)$, defining a set of random matrices $M_c$ and selecting the one which maximizes a decoding measure among all possible rows of $M_c$.

Problem-Dependent ECOC Coding Designs:

DECOC (Pujol et al., 2006): problem-dependent design that uses $n = N_c - 1$ dichotomizers. The partitions of the problem are learnt by means of a binary tree structure using exhaustive search or an SFFS criterion. Finally, each internal node of the tree is embedded as a column in $M_c$.

Forest-ECOC (Escalera et al., 2007): problem-dependent design that uses $n = (N_c - 1) \cdot T$ dichotomizers, where $T$ stands for the number of binary tree structures to be embedded. This approach extends the variability of the classifiers of the DECOC design by including extra dichotomizers.

ECOC-ONE (Pujol et al., 2008): problem-dependent design that uses $n = 2N_c$ suggested dichotomizers. A validation subset is used to extend any initial matrix $M_c$ and to increase its generalization by including new dichotomizers that focus on difficult-to-split classes. (A code sketch of the problem-independent constructions follows.)
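To make the problem-independent designs concrete, here is a minimal Python sketch of the three constructions. It is ours, not the paper's: the logarithm base in the dense-random column count is an assumption, and picking the candidate with the largest minimum pairwise row Hamming distance is one reasonable reading of "maximizes a decoding measure among all possible rows of Mc".

```python
import numpy as np

def one_vs_all(nc):
    # N_c columns: class i coded +1 against the rest coded -1
    M = -np.ones((nc, nc), dtype=int)
    np.fill_diagonal(M, 1)
    return M

def one_vs_one(nc):
    # N_c(N_c - 1)/2 columns: one per pair, zero (class ignored) elsewhere
    cols = []
    for i in range(nc):
        for j in range(i + 1, nc):
            col = np.zeros(nc, dtype=int)
            col[i], col[j] = 1, -1
            cols.append(col)
    return np.stack(cols, axis=1)

def min_row_hamming(M):
    # smallest pairwise Hamming distance between class codewords (rows)
    return min(np.sum(M[a] != M[b])
               for a in range(len(M)) for b in range(a + 1, len(M)))

def dense_random(nc, trials=1000, seed=0):
    # n = 10*log(Nc) columns (natural log assumed); keep the candidate
    # whose rows are best separated in Hamming distance
    rng = np.random.default_rng(seed)
    n = max(1, int(round(10 * np.log(nc))))
    return max((rng.choice([-1, 1], size=(nc, n)) for _ in range(trials)),
               key=min_row_hamming)
```

For a 4-class problem, one_vs_all reproduces the codewords of Figure 2(a), e.g. {1, -1, -1, -1} for C1, and one_vs_one yields the 4x6 ternary matrix of Figure 2(b).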
In this paper we have analyzed performance using the problem-dependent Discriminant ECOC (DECOC) design. The goal of this work is to find a matrix M that is compact in terms of codeword length and has high discriminative power. DECOC was born as an answer to three demands: first, a heuristic for the design of the ECOC matrix; second, the search for high-performance classification using the minimum number of classifiers; and third, a tool to describe the classification domain in terms of class dependences. The proposed method reduces the design of each column of the output code matrix to the problem of finding the binary partition that divides the whole set of classes so that the discriminability between both sets is maximal. The criterion used for achieving this goal is based on the mutual information between the feature data and its class label. Since the problem is defined as a discrete optimization process, we propose using the floating search method as a suboptimal search procedure for finding the partition that maximizes the mutual information. The whole ECOC matrix is created with the aid of an intermediate step formulated as a binary tree. With this formulation, we ensure that the multiclass problem is decomposed into $N_c - 1$ binary problems.

B. Decoding Design

Hamming decoding: $HD(x, y^j) = \sum_{i=1}^{n} (1 - \mathrm{sign}(x^i \cdot y_j^i)) / 2$, where $x$ is a test codeword and $y^j$ is the codeword from $M_c$ corresponding to class $C_j$.

Inverse Hamming decoding: $IHD(x, y^j) = \max(\Delta^{-1} D^T)$, where $\Delta(i_1, i_2) = HD(y^{i_1}, y^{i_2})$ and $D$ is the vector of Hamming decoding values of the test codeword $x$ for each of the base codewords $y^j$.

Euclidean decoding: $ED(x, y^j) = \sqrt{\sum_{i=1}^{n} (x^i - y_j^i)^2}$.

Attenuated Euclidean decoding: $AED(x, y^j) = \sqrt{\sum_{i=1}^{n} |y_j^i| (x^i - y_j^i)^2}$, so positions coded by zero are ignored.

Loss-based decoding: $LB(\rho, y^j) = \sum_{i=1}^{n} L(y_j^i \cdot f^i(\rho))$, where $\rho$ is a test sample, $L$ is a loss function, and $f$ is a real-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$.

Probabilistic-based decoding: $PD(y^j, x) = -\log\big( \prod_{i \in [1,\dots,n]:\, M_c(j,i) \neq 0} P(x^i = M_c(j,i) \mid f^i) + K \big)$, where $K$ is a constant factor that collects the probability mass dispersed on the invalid codes, and the probability $P(x^i = M_c(j,i) \mid f^i)$ is estimated by means of $P(x^i = y_j^i \mid f^i) = 1 / (1 + e^{y_j^i (\upsilon^i f^i + \omega^i)})$, where the vectors $\upsilon$ and $\omega$ are obtained by solving an optimization problem (Passerini et al., 2004).

Laplacian decoding: $LAP(x, y^j) = (\alpha + 1) / (\alpha + \beta + K)$, where $\alpha$ is the number of matched positions between $x$ and $y^j$, $\beta$ is the number of mismatches without considering the positions coded by 0, and $K$ is an integer value that codifies the number of classes considered by the classifier.

Pessimistic β-density distribution decoding: the accuracy $s_j$ satisfies $\int_{\nu_j - s_j}^{\nu_j} \psi_j(\nu, \alpha_j, \beta_j)\, d\nu = \frac{1}{3}$, where $\psi_j(\nu, \alpha_j, \beta_j) = \frac{1}{K} \nu^{\alpha_j} (1 - \nu)^{\beta_j}$ is the β-density distribution between a codeword $x$ and a class codeword $y^j$ for class $c_j$, and $\nu \in [0, 1]$.

Loss-Weighted decoding: $LW(\rho, j) = \sum_{i=1}^{n} M_W(j,i)\, L(y_j^i \cdot f(\rho, i))$, where $M_W(j,i) = H(j,i) / \sum_{k=1}^{n} H(j,k)$, $H(j,i) = \frac{1}{m_j} \sum_{k=1}^{m_j} \varphi(h^i(\rho_j^k), j, i)$, and $\varphi(x^i, j, i) = 1$ if $x^i = y_j^i$ and $0$ otherwise; $m_j$ is the number of training samples from class $C_j$, and $\rho_j^k$ is the $k$-th sample from class $C_j$. (A code sketch of the distance-based decodings is given after Figure 2 below.)

C. Discussion of Some ECOC Coding Methods

One-Versus-All Strategy. The most well-known binary coding strategy is one-versus-all, where each class is discriminated against the rest of the classes. In Figure 2(a), the one-versus-all ECOC design for a four-class problem is shown. The white regions of the coding matrix M correspond to the positions coded by +1 and the black regions to -1. Thus, the codeword for class C1 is {1, -1, -1, -1}. Each column of the coding matrix codifies a binary problem learned by its corresponding dichotomizer $h_i$. For instance, dichotomizer $h_1$ learns C1 against classes C2, C3, and C4; dichotomizer $h_2$ learns C2 against classes C1, C3, and C4; etc.

The Dense Random Strategy. In the dense random strategy, a random matrix M is generated, maximizing the row and column separability in terms of the Hamming distance. An example of a dense random matrix for a four-class problem is shown in Figure 2(c).

One-Versus-One and Random Sparse Strategies. The coding step received special attention when Allwein et al. introduced a third symbol (the zero symbol) into the coding process. This symbol increases the number of class partitions to be considered in a ternary ECOC framework by allowing some classes to be ignored; the ternary coding matrix becomes $M \in \{-1, 0, +1\}^{N_c \times n}$. In this case, the symbol zero means that a particular class is not considered by a certain binary classifier. Thanks to this, strategies such as one-versus-one and random sparse coding can be formulated in the framework. Figure 2(b) shows the one-versus-one ECOC configuration for a four-class problem; in this case, the gray positions correspond to the zero symbol. A possible sparse random matrix for a four-class problem is shown in Figure 2(d).

Fig. 2. (a) one-versus-all, (b) one-versus-one, (c) dense random, and (d) sparse random ECOC designs.
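The distance-based decodings of Section II.B translate directly into code. A minimal sketch (ours; codewords are assumed to be NumPy vectors over {-1, 0, +1}, with the test codeword x holding the n binary predictions):

```python
import numpy as np

def hamming_decode(x, M):
    # HD(x, y_j) = sum_i (1 - sign(x_i * y_ji)) / 2; sign(0) = 0, so a
    # zero-coded position contributes 1/2 to the distance
    return ((1 - np.sign(x * M)) / 2).sum(axis=1)

def euclidean_decode(x, M):
    # ED(x, y_j) = sqrt(sum_i (x_i - y_ji)^2)
    return np.sqrt(((x - M) ** 2).sum(axis=1))

def attenuated_euclidean_decode(x, M):
    # AED weights each term by |y_ji|, so zero-coded positions are ignored
    return np.sqrt((np.abs(M) * (x - M) ** 2).sum(axis=1))

def classify(x, M, decode=attenuated_euclidean_decode):
    # assign the test codeword x to the class whose row of M is closest
    return int(np.argmin(decode(x, M)))
```

Note the different handling of the ternary zero symbol: Hamming decoding counts a zero-coded position as half a disagreement, while the attenuated Euclidean weight discards those positions entirely, which is the behavior the zero symbol calls for.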

D. An Algorithm for Problem-Dependent ECOC Based on the DECOC Method

Step 0. Create the trivial partition of the set of classes: initialize the list of class sets to be split with the full set {C_1, ..., C_Nc}.
Step 1. Let S_k be the first element of the list, and remove it from the list.
Step 2. Find the optimal binary partition BP(S_k): {ƍ_k^1, ƍ_k^2} = argmax over BP(S_k) of I(x, d(BP(S_k))), where I is the mutual information criterion, x is the random variable associated with the features, and d is the discrete random variable of the dichotomy labels. [Use the SFFS algorithm as the maximization procedure and fast quadratic mutual information to estimate I.]
Step 3. Add ƍ_k^i back to the list for each i with |ƍ_k^i| > 1.
Step 4. If the list is not empty, go to Step 1.

Table-I. Terms Used
C      set of classes
I      mutual information criterion
J      data matrix of the problem
L      set of class labels
ξ      error function
J_i    data of class C_i
h_i    i-th dichotomizer
M      coding matrix
m      number of object instances
n      number of dichotomizers
P      total number of experiments
N_c    number of classes
ƍ_i    {ƍ_i^1, ƍ_i^2}, set of positive and negative subsets of the i-th binary problem

The goal of this work is to find a matrix M that is compact in terms of codeword length and has high discriminative power. The general algorithm can be described as follows:

General procedure: create the Column Code Binary Tree recursively; find the most discriminant binary partition {ƍ_k^1, ƍ_k^2} of each parent node's class set using floating search with the fast quadratic mutual information criterion; assign to column k of matrix M the code obtained from the partition {ƍ_k^1, ƍ_k^2}.

The first step is the creation of the Column Code Binary Tree (CCBT), where each node of the tree defines a partition of the classes. The partition at each node must satisfy the condition of being highly separable in terms of discrimination. This division is obtained as the result of the maximization of the quadratic mutual information between the data x and the labels d created for such a partition. The algorithm used for the discrete maximization is the floating search method, introduced in the next subsection. In the algorithm above, d is a discrete random variable: given a binary partition {ƍ_k^1, ƍ_k^2} = BP(S_k) of the set S_k, d is defined as d = d(x, BP(S_k)) = +1 if the class label of x belongs to ƍ_k^1, and -1 if it belongs to ƍ_k^2.

The tree must be seen as a means of finding the codewords. The second step is the process of filling the ECOC matrix. The final matrix M is composed of the codes obtained at each node except for the leaves; those codes are placed as columns in the coding matrix M(., k). In order to create each column code, we use the relationship between a parent node and its children. Therefore, given a certain class C_r and the class set associated with node k (where ƍ_k^1 and ƍ_k^2 are the sets of classes of the two children of node k), matrix M is filled as follows: M(r, k) = +1 if C_r ∈ ƍ_k^1, -1 if C_r ∈ ƍ_k^2, and 0 if C_r is not considered at node k.

Note that the number of columns coincides with the number of internal nodes. It is easy to see that, in any binary tree, the number of internal nodes is N_c - 1 given that the number of leaves is N_c. Therefore, by means of the CCBT, we can assure that the codeword will have length N_c - 1. Figure 3 shows an example of a CCBT for eight classes; on the right side of the figure, the resulting discriminant ECOC matrix is shown. The white squares are +1, black squares are -1, and gray squares have value 0. Each column Nk corresponds to the partition at internal node Nk; looking at the rows of the matrix, the codeword associated with class 6 (c6) is {+1, 0, -1, 0, -1, 0, +1}. From a more general point of view, the creation of the ECOC matrix is only one of the parts involved in the multiclass classification technique; the other two remaining parts to be defined are the dichotomy learning technique and the decoding strategy. Here the chosen classifier is AdaBoost for each dichotomy. A maximization process is needed to obtain the division of the classes into two sets; a small sketch of the CCBT construction follows.
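Here is a minimal sketch of the CCBT construction and matrix filling. It is ours, under stated assumptions: partition_score is a hypothetical callable standing in for the estimated mutual information I(x, d) between the data and the dichotomy labels, and the exhaustive partition enumeration is only workable for small N_c (the paper replaces it with floating search, see the next subsection).

```python
import numpy as np
from itertools import combinations

def binary_partitions(classes):
    # every split of `classes` into two non-empty subsets, no mirrored duplicates
    for r in range(1, len(classes)):
        for left in combinations(classes, r):
            if classes[0] in left:
                yield list(left), [c for c in classes if c not in left]

def build_decoc_matrix(classes, partition_score):
    """Column-Code Binary Tree: recursively split each node's class set with
    the partition maximizing partition_score; each internal node contributes
    one column, so the matrix has N_c - 1 columns."""
    nc = len(classes)
    columns = []

    def split(node_classes):
        if len(node_classes) < 2:
            return                      # leaves contribute no column
        left, right = max(binary_partitions(node_classes),
                          key=lambda p: partition_score(p[0], p[1]))
        col = np.zeros(nc, dtype=int)   # classes outside this node stay 0
        col[[classes.index(c) for c in left]] = 1
        col[[classes.index(c) for c in right]] = -1
        columns.append(col)
        split(left)
        split(right)

    split(list(classes))
    return np.stack(columns, axis=1)    # shape (N_c, N_c - 1)
```

For eight classes this yields exactly the seven-column layout of Figure 3, with zeros in every column whose node does not contain the row's class.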
Although looking for the best partition set would require an exhaustive search among all possible partitions, the impracticability of this endeavor means a suboptimal strategy must be used. The strategy chosen is the floating search method, which is detailed in the following subsection and makes the problem computationally feasible.
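Ahead of that description, a rough sketch of the floating (forward/backward) dynamic applied to the partition search. This is our own simplified rendering, not the authors' procedure: score(S) stands for the criterion value of the partition {S, classes \ S}, which in the paper would be the fast quadratic mutual information between the data and the induced dichotomy labels.

```python
def sffs_partition(classes, score):
    """Sequential Forward Floating Search over the subset S defining the
    positive side of a binary partition {S, classes \\ S}."""
    best = {}                       # best (score, subset) seen per subset size
    current, pool = [], list(classes)

    def record(subset):
        s = score(subset)
        if s > best.get(len(subset), (float("-inf"), None))[0]:
            best[len(subset)] = (s, list(subset))
            return True
        return False

    while pool and len(current) < len(classes) - 1:
        # forward step: add the single class that helps the criterion most
        nxt = max(pool, key=lambda c: score(current + [c]))
        current.append(nxt)
        pool.remove(nxt)
        record(current)
        # backward (floating) steps: drop a class while that strictly improves
        # on the best subset previously found at the smaller size
        while len(current) > 1:
            drop = max(current,
                       key=lambda c: score([d for d in current if d != c]))
            reduced = [d for d in current if d != drop]
            if record(reduced):
                current = reduced
                pool.append(drop)
            else:
                break
    _, subset = max(best.values(), key=lambda t: t[0])
    return subset, [c for c in classes if c not in subset]
```

The backward phase is what lets the search undo an earlier greedy addition, avoiding the nesting effect described below.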

Fig. 3. Example of conversion from a binary tree to the ECOC matrix.

The floating search method [9] was born as a suboptimal sequential search method for alleviating the prohibitive computational cost of exhaustive search methods in feature selection. Furthermore, these methods allow the search criterion to be non-monotonic, thus removing the main constraint of many sequential methods. Floating search methods can be described as a dynamically changing number of forward and backward steps, continued as long as the resulting subsets are better than the previously evaluated ones at that level. In this sense, the method avoids the nesting effects that are typical of sequential forward and backward selection while being equally step-optimal, since the best (worst) item is always added to (discarded from) the set. Our goal is to maximize the mutual information between the data in the sets and the class labels created for each subset.

Fast Quadratic Mutual Information. Mutual information (MI) is a well-known criterion for computing the amount of information that one random variable conveys about another. In classification theory, this measure has been shown to be optimal in terms of class separation [6], [4], allowing high-order statistics to be taken into account. MI also bounds the optimal Bayes error rate.

Finally, to decode the problem-dependent Discriminant ECOC design, we take advantage of the recently proposed pessimistic β-density distribution decoding (β-den), which is based on estimating the probability density functions between codewords. (A numerical sketch of this decoding is given at the end of this section.)

III. RESULT AND COMPARISON

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. In this section the proposed method is compared with the different ECOC coding methods [17] which are already available. For comparison, the One-Vs-One, One-Vs-All, and Random (Dense Random/Sparse Random) coding strategies have been used alongside the DECOC coding strategy. Table-II shows the characteristics of the datasets used to perform the experiments.

Table-II. Dataset Characteristics
Problem        #Attributes   #Classes   Attribute Types   #Instances
Vehicle        18            4          Numeric           15000
Segment        19            7          Numeric           1500
Two Patterns   128           4          Numeric           5000
Iris           4             3          Numeric           150

The accuracy comparison and results are shown in Table-III and Figure 4. Table-IV and Figure 5 show the TP_rate, FP_rate, Precision, and Recall calculations for all the datasets with the various coding methods, which indicates that by using DECOC with pessimistic β-density distribution decoding for multiclass classification, the maximum number of data instances have been correctly classified on most of the datasets. Table-IV is shown in the Appendix.

Table-III. Classification Accuracy on the Datasets
Dataset        OneVsOne   OneVsAll   Random    DECOC
Vehicle        74.1467    92.7333    97.5467   99.0733
Segment        68.3333    93.4000    85.0667   96.9333
Iris           96.6667    96.6667    98.6667   99.3333
Two Patterns   61         54.3800    65.3400   55.0600

Fig. 4. Accuracy chart of the various coding methods.

Fig. 5. TP_rate, FP_rate, Precision, and Recall chart.
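The pessimistic β-density decoding referenced above can be sketched numerically as follows. This is our own reading of the description in Section II.B, not the implementation of the ECOC library [17]: α_j and β_j count the matched and mismatched non-zero positions (as in Laplacian decoding), ψ_j is normalized over [0, 1], and the class maximizing the pessimistic score ν_j − s_j is chosen.

```python
import numpy as np

def pessimistic_beta_decode(x, M, grid_size=2001):
    """For each class codeword y_j: psi_j(v) ~ v^alpha_j (1 - v)^beta_j on
    [0, 1]; s_j is grown until the mass on [v_j - s_j, v_j] reaches 1/3."""
    grid = np.linspace(0.0, 1.0, grid_size)
    dv = grid[1] - grid[0]
    scores = []
    for y in M:
        active = y != 0
        alpha = int(np.sum((x == y) & active))   # matched positions
        beta = int(np.sum((x != y) & active))    # mismatches, zeros ignored
        psi = grid ** alpha * (1.0 - grid) ** beta
        # cumulative mass by the trapezoid rule, normalized (the 1/K factor)
        cum = np.concatenate(([0.0], np.cumsum((psi[1:] + psi[:-1]) * dv / 2)))
        mass = cum / cum[-1]
        v_peak = alpha / max(alpha + beta, 1)    # mode of the density
        k_peak = int(round(v_peak * (grid_size - 1)))
        target = mass[k_peak] - 1.0 / 3.0
        if target <= 0.0:
            s = v_peak                           # not enough mass below the peak
        else:
            # largest lower limit that still keeps 1/3 of the mass
            k_lo = int(np.searchsorted(mass, target, side="right")) - 1
            s = (k_peak - k_lo) * dv
        scores.append(v_peak - s)                # pessimistic score
    return int(np.argmax(scores))
```

The α and β counts come straight from the codeword comparison, so this plugs into the same nearest-codeword loop as the decoding sketch in Section II.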

IV. CONCLUSION

The Discriminant ECOC design with β-density decoding is a very promising alternative to other ECOC methods, frequently outperforming most of them. This ECOC strategy is a novel way to model complex multiclass classification problems. The method is based on embedding dichotomizers in a problem-dependent ECOC design. The results are even more significant when the training size is sufficiently large. The zero symbol produces serious inconsistencies when using the traditional decoding strategies, so pessimistic β-density distribution decoding is used, which gives a significant performance improvement. From the various experiments performed on the selected datasets, it is observed that accuracy can be increased by 5 to 6% compared to the other strategies.

REFERENCES
[1] Oriol Pujol, Petia Radeva, and Jordi Vitrià, "Discriminant ECOC: A Heuristic Method for Application Dependent Design of Error Correcting Output Codes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, June 2006.
[2] E. Allwein, R. Schapire, and Y. Singer, "Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers," J. Machine Learning Research, vol. 1, pp. 113-141, 2002.
[3] E.L. Allwein, R.E. Schapire, and Y. Singer, "Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers," J. Machine Learning Research, vol. 1, pp. 113-141, 2000.
[4] J. Principe, D. Xu, and J. Fisher III, "Information Theoretic Learning," in Unsupervised Adaptive Filtering, Wiley, 2000.
[5] K. Crammer and Y. Singer, "On the Learnability and Design of Output Codes for Multiclass Problems," Machine Learning, 47(2-3):201-233, 2002.
[6] K. Torkkola, "Feature Extraction by Non-Parametric Mutual Information Maximization," J. Machine Learning Research, vol. 3, pp. 1415-1438, 2003.
[7] N.J. Nilsson, Learning Machines. McGraw-Hill, 1965.
[8] O. Pujol and P. Radeva, "Discriminant ECOC: A Heuristic Method for Application Dependent Design of Error Correcting Output Codes," PAMI, 28(6):1007-1012, 2006.
[9] P. Pudil, F. Ferri, J. Novovicova, and J. Kittler, "Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions," Proc. Int'l Conf. Pattern Recognition, pp. 279-283, 1994.
[10] R. Ghaderi and T. Windeatt, "Circular ECOC: A Theoretical and Experimental Analysis," in ICPR, pp. 2203-2206, 2000.
[11] R. Schapire and Y. Singer, "Solving Multiclass Learning Problems via Error-Correcting Output Codes," Journal of Artificial Intelligence Research, 2:263-286, 1995.
[12] T.G. Dietterich and G. Bakiri, "Solving Multiclass Learning Problems via Error-Correcting Output Codes," J. Artificial Intelligence Research, vol. 2, pp. 263-286, 1995.
[13] T. Hastie and R. Tibshirani, "Classification by Pairwise Coupling," Annals of Statistics, vol. 26, no. 2, pp. 451-471, 1998.
[14] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[15] Y. Freund and R.E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," J. Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
[16] Y. Weiss, A. Torralba, and R. Fergus, "Spectral Hashing," in NIPS, 2008.
[17] Sergio Escalera, Oriol Pujol, and Petia Radeva, "Error-Correcting Output Codes Library," Journal of Machine Learning Research, 11 (2010) 661-664.

APPENDIX

Table-IV. TP_rate, FP_rate, Precision, and Recall Calculation for All the Datasets
Dataset        Measure     OneVsOne   OneVsAll   Random    DECOC
Vehicle        TP_rate     0.7415     0.9273     0.9755    0.9907
               FP_rate     0.0862     0.0242     0.0082    0.0031
               Precision   0.7875     0.9381     0.9776    0.9909
               Recall      0.6719     0.9265     0.9765    0.9907
Segment        TP_rate     0.6794     0.9253     0.8457    0.9682
               FP_rate     0.0524     0.0061     0.0248    0.0051
               Precision   0.6770     0.9251     0.8330    0.9731
               Recall      0.6798     0.9257     0.8451    0.9689
Iris           TP_rate     0.9667     0.9667     0.9867    0.9933
               FP_rate     0.0167     0.0167     0.0067    0.0033
               Precision   0.9668     0.9668     0.9872    0.9935
               Recall      0.9667     0.9667     0.9867    0.9933
Two Patterns   TP_rate     0.6029     0.5478     0.6536    0.5473
               FP_rate     0.1309     0.1507     0.1154    0.1510
               Precision   0.6097     0.5439     0.6553    0.5369
               Recall      0.5714     0.5456     0.6588    0.5437