Improving Naive Bayes classifier by dividing its decision regions*
Yan et al. / J Zhejiang Univ-Sci C (Comput & Electron), in press
Journal of Zhejiang University-SCIENCE C (Computers & Electronics)
E-mail: jzus@zju.edu.cn

Improving Naive Bayes classifier by dividing its decision regions*

Zhi-yong YAN, Cong-fu XU, Yun-he PAN
(Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027, China)
E-mail: xucongfu@zju.edu.cn
Received Dec. 2, 2010; Revision accepted Apr. 8, 2011; Crosschecked

Abstract: Classifiers can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. A decision tree can be regarded as a classifier tree, in which each classifier on a non-root node is trained in the decision regions of the classifier on the parent node. The NBTree algorithm generates a classifier tree with the C4.5 algorithm and the Naive Bayes classifier as the root and leaf classifiers respectively, and can thus be regarded as training Naive Bayes classifiers in the decision regions of the C4.5 algorithm. We propose the second division (SD) algorithm and three soft second division (SD-soft) algorithms to train classifiers in the decision regions of the Naive Bayes classifier. These four new algorithms all generate two-level classifier trees with the Naive Bayes classifier as the root classifier. The SD and three SD-soft algorithms can make good use of the information contained in instances near decision boundaries, information that may be ignored by the Naive Bayes classifier. Experimental results on 30 UCI data sets indicate that when using the C4.5 algorithm and a support vector machine as leaf classifiers, the SD algorithm can obtain better generalization ability than the NBTree and AODE algorithms. Experimental results also indicate that, when appropriate argument values are selected, the three SD-soft algorithms can obtain better generalization ability than the SD algorithm.
Key words: Naive Bayes classifier, Decision region, NBTree, C4.5 algorithm, SVM
Document code: A

1 Introduction

The Naive Bayes classifier (Domingos and Pazzani, 1997) is an example of global learning (Huang et al., 2008), which obtains a distribution estimation of the whole data set. The Naive Bayes classifier assumes that the attributes of instances are independent given the class (Domingos and Pazzani, 1997). Although this assumption is very naive, the Naive Bayes classifier has good generalization ability, and it is one of the top 10 algorithms in data mining voted by IEEE ICDM 2006 (Wu et al., 2008). There are many studies on improving the generalization ability of the Naive Bayes classifier, amongst which the NBTree algorithm (Kohavi, 1996) is typical. The NBTree algorithm trains Naive Bayes classifiers on the leaf nodes of a decision tree.

* Corresponding author. This paper is supported by the National Natural Science Foundation of China, and partially by the National Basic Research Program of China under Grant No. 2010CB. (c) Zhejiang University and Springer-Verlag Berlin Heidelberg 2011

Some researchers regard classification as dividing data space X into decision regions separated by decision boundaries (Bishop, 2006), although most researchers regard classification as finding a mapping from data space X to label set Y (Mitchell, 1997). We call the former the dividing perspective. In this paper, we analyze decision tree algorithms and the NBTree algorithm from the dividing perspective. A decision tree can be regarded as a classifier tree composed of two types of classifiers. An NBTree can be regarded as a two-level classifier tree with a decision tree classifier as the root node and several Naive Bayes classifiers as leaf nodes. The NBTree algorithm can then be regarded as training Naive Bayes classifiers in the decision regions of the C4.5 algorithm (Quinlan, 1993). The NBTree algorithm can utilize the advantages of both the C4.5 algorithm and the Naive Bayes classifier, and it outperforms both of these two classifiers (Kohavi, 1996).

Contrary to the NBTree algorithm, which trains Naive Bayes classifiers in the decision regions of the C4.5 algorithm, we propose the second division (SD) algorithm to train classifiers in the decision regions of the Naive Bayes classifier. The SD algorithm generates a two-level classifier tree, with the Naive Bayes classifier as the root node and other classifiers as leaf nodes. We also propose three soft versions of the SD algorithm (SD-soft) to deal with the overlapped regions generated by the Naive Bayes classifier. The leaf classifiers used in this paper are the Naive Bayes classifier, the C4.5 algorithm and the support vector machine (SVM) (Vapnik, 1995). We perform experiments on 30 UCI data sets (Murphy and Aha, 1998) to compare the SD algorithm with the NBTree and AODE algorithms (Webb et al., 2005). We also compare the three SD-soft algorithms with the SD algorithm, and apply the SD algorithm to the AODE algorithm, the C4.5 algorithm and SVM. Finally, we adopt the global/local learning theory of Huang et al. (2008) to discuss why the SD algorithm works.

2 Analysis of decision tree and NBTree algorithms from the dividing perspective

The dividing perspective regards classifiers as dividing data space X into decision regions (Bishop, 2006). The definition of a decision region is as follows.

Definition 1 (Decision region) If region R in data space X satisfies the following two conditions for classifier C, then it is called the decision region of label Y_i under classifier C, denoted as DR(C, Y_i):

∀x ∈ R, C(x) = Y_i; (1)
∀x ∉ R, C(x) ≠ Y_i. (2)

For decision regions, the following formulas hold:

DR(C, Y_i) ≠ ∅, ∀Y_i ∈ Y; (3)
DR(C, Y_i) ∩ DR(C, Y_j) = ∅, Y_i ≠ Y_j; (4)
∪_{Y_i ∈ Y} DR(C, Y_i) = X. (5)

From Eqs. (3)-(5), it is clear that the decision regions of classifier C constitute a partition of data space X. From the dividing perspective, a classifier can be regarded as a divider, whose job is to obtain a partition of data space X.

2.1 Analysis of decision tree algorithms

The decision tree is a knowledge representation method.
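Before analyzing decision trees, the partition property of Eqs. (3)-(5) can be checked numerically with a toy sketch. The classifier, labels and the sampled grid below are our own illustration, not from the paper: any deterministic classifier groups the sampled points into disjoint regions whose union covers the space.

```python
# Group grid points by predicted label and verify Eqs. (4) and (5):
# regions are pairwise disjoint and their union covers the sampled space.
def classifier(x):
    # a toy two-attribute classifier with three labels (our invention)
    if x[0] + x[1] < 1.0:
        return 'Y0'
    return 'Y1' if x[0] > x[1] else 'Y2'

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
regions = {}
for x in grid:
    regions.setdefault(classifier(x), set()).add(x)

labels = list(regions)
# Eq. (4): pairwise disjoint
assert all(regions[a].isdisjoint(regions[b])
           for a in labels for b in labels if a != b)
# Eq. (5): the union of the regions is the whole sampled space
assert set().union(*regions.values()) == set(grid)
print(sorted(labels))
```

Eq. (3) corresponds to every label actually appearing in the output above; a label the classifier never predicts would have an empty region.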
Non-leaf nodes of a decision tree are attributes, and leaf nodes of a decision tree are labels. On each non-leaf node, branches are generated according to the values of the attribute on that node. An example of a decision tree is shown in Fig. 1.

Fig. 1 Structure of a decision tree and the regions corresponding to its nodes (root node N with children N1 and N2; regions R1-R6; labels Y0-Y3)

Each node of a decision tree corresponds to a region of data space X. For example, the root node N of the decision tree in Fig. 1 corresponds to data space X. A leaf node of a decision tree corresponds to a part of a decision region. Decision tree algorithms adopt a majority voting method to determine the label for the region corresponding to a leaf node. On each non-leaf node, decision tree algorithms divide the region to which the node corresponds into the several regions to which the node's children correspond. The relationship among the regions of parent and children nodes of the decision tree in Fig. 1 is as follows:

X = R1 ∪ R2, R1 ∩ R2 = ∅, R1, R2 ≠ ∅; (6)
R1 = R3 ∪ R4, R3 ∩ R4 = ∅, R3, R4 ≠ ∅; (7)
R2 = R5 ∪ R6, R5 ∩ R6 = ∅, R5, R6 ≠ ∅. (8)

It is clear that each non-leaf node corresponds to a divider. From the dividing perspective, a classifier
can be regarded as a divider. If the regions generated by a divider are associated with labels, then the regions can be regarded as decision regions, and after this process a divider can be regarded as a classifier. For example, if nodes N1 and N2 are associated with labels Y1 and Y2 respectively, then the root node N can be regarded as corresponding to a classifier whose output is Y1 or Y2. Each non-leaf node corresponds to a piece-wise classifier (PWC) as in Eq. (9). Obviously, the PWC is a very simple classifier:

PWC(x) = { Y_0, x[i] < v_1;  Y_1, x[i] ∈ [v_1, v_2);  ...;  Y_{k-1}, x[i] ≥ v_{k-1}. } (9)

The region corresponding to a node can be regarded as a decision region of the classifier corresponding to the parent node. For a non-leaf node, decision tree algorithms train a PWC in the decision region of the classifier corresponding to its parent node. In Fig. 1, the PWC corresponding to the root node N divides data space X into R1 and R2, which correspond to nodes N1 and N2 respectively. For node N1, PWC1 is trained in decision region R1 of the PWC. Similarly, for node N2, PWC2 is trained in decision region R2 of the PWC.

Each leaf node corresponds to a majority voting classifier (MVC) as in Eq. (10). Like the PWC, the MVC is also a very simple classifier:

MVC(x) = argmax_{Y_i} |{(x, y) | y = Y_i}|. (10)

Decision tree algorithms train an MVC in the region corresponding to a leaf node. The label predicted by a leaf node is the one predicted by the MVC trained in the region corresponding to that leaf node. Each non-leaf node corresponds to a PWC, and each leaf node corresponds to an MVC; thus a decision tree can be regarded as a classifier tree. The classifier tree corresponding to the decision tree in Fig. 1 is shown in Fig. 2. Although both the PWC and the MVC are very simple, we can obtain very good generalization ability by organizing these two classifiers as a tree. For example, the C4.5 algorithm is one of the top 10 algorithms in data mining voted by ICDM 2006 (Wu et al., 2008).

Fig. 2 The classifier tree corresponding to the decision tree in Fig. 1

2.2 Analysis of the NBTree algorithm

The NBTree algorithm also generates a decision tree, whose leaf nodes are Naive Bayes classifiers instead of labels. An NBTree is also a classifier tree, whose non-leaf and leaf nodes are PWCs and Naive Bayes classifiers respectively. The structure of an NBTree is shown in Fig. 3.

Fig. 3 An NBTree as a classifier tree (a root PWC dividing X into R1 and R2, and PWC1 and PWC2 dividing these into R3-R6, each holding a Naive Bayes classifier)

In Fig. 3, if nodes NB0, NB1, NB2 and NB3 are associated with labels Y0, Y1, Y2 and Y3, then regions R3, R4, R5 and R6 can be regarded as decision regions of PWC1 and PWC2. The sub-tree composed of the PWC, PWC1 and PWC2 can then be regarded as a decision tree generated by the C4.5 algorithm, whose objective is to generate decision regions for training Naive Bayes classifiers. So an NBTree can also be regarded as a classifier tree composed of one C4.5 classifier and several Naive Bayes classifiers. These Naive Bayes classifiers are trained in the decision regions of the C4.5 classifier. The two-level classifier tree corresponding to the classifier tree in Fig. 3 is shown in Fig. 4.
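The two building blocks of Eqs. (9) and (10) are simple enough to sketch directly. The function names, attribute index, thresholds and toy instances below are our own illustration, not the paper's code.

```python
from collections import Counter

def make_pwc(attr_index, thresholds, labels):
    """Piece-wise classifier, Eq. (9): pick a label by which threshold
    interval x[attr_index] falls into."""
    assert len(labels) == len(thresholds) + 1
    def pwc(x):
        for t, lab in zip(thresholds, labels):
            if x[attr_index] < t:
                return lab
        return labels[-1]
    return pwc

def make_mvc(region_instances):
    """Majority-voting classifier, Eq. (10): always predict the most
    common label among the instances in the region."""
    majority = Counter(y for _, y in region_instances).most_common(1)[0][0]
    return lambda x: majority

pwc = make_pwc(0, [1.0, 2.0], ['Y0', 'Y1', 'Y2'])
print(pwc((0.5,)), pwc((1.5,)), pwc((9.0,)))   # -> Y0 Y1 Y2

mvc = make_mvc([((0.2,), 'Y0'), ((0.3,), 'Y0'), ((0.9,), 'Y1')])
print(mvc((0.42,)))                            # -> Y0
```

Chaining PWCs at non-leaf nodes and MVCs at leaves, as described above, reproduces a decision tree's behaviour.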
Fig. 4 An NBTree is a classifier tree composed of one C4.5 classifier and several NBs

3 Training classifiers in the decision regions of the Naive Bayes classifier

Training Naive Bayes classifiers in the decision regions of the C4.5 algorithm can improve the generalization abilities of both the Naive Bayes classifier and the C4.5 algorithm (Kohavi, 1996). In this section, we study the method of training classifiers in the decision regions of the Naive Bayes classifier.

3.1 The SD algorithm

The simplest method is to train classifiers directly in the decision regions of the Naive Bayes classifier, as shown in Fig. 5.

Fig. 5 Training classifiers in the decision regions of the Naive Bayes classifier

There are two questions when training classifiers in the decision regions of the Naive Bayes classifier: (1) When should we train classifier C in a decision region of the Naive Bayes classifier? (2) When classifier C is trained in a decision region of the Naive Bayes classifier, there are two classifiers in this decision region; which classifier should be selected?

For question 1, it is clear that classifier C should be trained when it can improve the generalization ability of the Naive Bayes classifier in the decision region. However, it is very difficult to determine whether classifier C can do this, so we only exclude one impossible case: when the training accuracy of the Naive Bayes classifier in the decision region is 100%. In this case, all instances in the decision region have the same class Y_i. If classifier C is trained in the decision region, only a classification model with Y_i as its unique output can be obtained, and this model cannot improve the generalization ability of the Naive Bayes classifier in this decision region.

For question 2, between the Naive Bayes classifier and classifier C, the one with the better generalization ability should be selected, but this is very difficult to determine in the training phase. There are at least three methods for selecting the better classifier.
The first method is to divide training data set D into a training sub-set and a validation sub-set, and then choose the classifier with the better test accuracy on the validation sub-set. The second method adopts cross validation to select the better classifier. The third method is to select the classifier with the better training accuracy. Amongst these three methods, the former two may make a more accurate selection, but they also need more time. We adopt the last method in this paper.

Training the Naive Bayes classifier can be regarded as dividing data space X once. Training classifier C in the decision regions of the Naive Bayes classifier can then be regarded as a second division of data space X. The procedure of the second division (SD) algorithm is shown in Fig. 6. Like the NBTree algorithm, the SD algorithm generates a two-level classifier tree. In this classifier tree, the Naive Bayes classifier is the root classifier, and classifier C is the leaf classifier.

3.2 The SD-soft algorithms

The SD algorithm trains classifiers in the decision regions of the Naive Bayes classifier. The decision regions constitute a partition of data space X, so this division is a hard division. But the Naive Bayes classifier can generate overlapped regions in data space X, which means instance (x, y) may belong to several decision regions with conditional probabilities. In this sub-section we propose a soft version of the SD algorithm to deal with the overlapped regions generated by the Naive Bayes classifier.

Training phase:
Input: training data set D, the Naive Bayes classifier NB, classifier C
Output: a two-level classifier tree named SD tree (composed of NB, C_i, indicator vector is_used)
1. Train the Naive Bayes classifier NB on training data set D.
2. Obtain the data sets {D_i} in the decision regions of NB using the following formula: D_i = {(x, y) | NB(x) = Y_i}.
3. For each D_i, compute the training accuracy of NB. If it is 100%, is_used[i] = false. Otherwise, train C on D_i to obtain C_i. If the training accuracy of C_i on D_i is greater than that of NB, then is_used[i] = true; otherwise is_used[i] = false.
Test phase:
Input: an SD tree, test instance x
Output: label y
1. Obtain the prediction y_0 of NB for x.
2. If is_used[y_0] == false, y = y_0. Otherwise, y = C_{y_0}(x).

Fig. 6 Procedure of the SD algorithm

There are many methods for generating regions with overlaps. In this sub-section, we adopt a method similar to the one the divide-and-conquer algorithm (Frosyniotis et al., 2003) uses to generate overlapped regions and combine classifiers. The divide-and-conquer algorithm first uses a clustering algorithm to obtain data sub-sets {D_i}, and then trains several multi-layered perceptron classifiers (Pal and Mitra, 1992) on these data sub-sets. The clustering algorithm used is the fuzzy C-means algorithm (Bezdek, 1981) or the greedy EM algorithm (Vlassis and Likas, 2002). In the test phase, the predictions of these classifiers are combined to make the final prediction. The membership degree of instance (x, y) belonging to cluster_i is denoted as u(cluster_i, x). Data sub-set D_i is obtained by Eq. (11):

D_i = {(x, y) | u(cluster_i, x) > q}. (11)

The divide-and-conquer algorithm then uses Eq. (12) to obtain the probability distribution p(Y|x):

p(Y|x) = Σ_j u(cluster_j, x) I(C_j(x) == Y). (12)

The indicator function I(z) is defined in Eq. (13):

I(z) = 1 if z is true, 0 otherwise. (13)

We adopt the above sub-set generating method to obtain regions with overlaps.
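Before turning to the soft sub-sets, the hard SD procedure of Fig. 6 can be condensed into a short sketch. It assumes both the root NB and the leaf classifier C are supplied as generic `train(D) -> predict` functions; every name below (`train_sd`, `is_used`, the toy stand-in trainers) is our own illustration, not the paper's code.

```python
def train_sd(D, train_nb, train_c):
    nb = train_nb(D)
    regions = {}                       # step 2: D_i = {(x, y) | NB(x) = Y_i}
    for x, lab in D:
        regions.setdefault(nb(x), []).append((x, lab))
    leaf, is_used = {}, {}
    for yi, Di in regions.items():     # step 3
        acc_nb = sum(nb(x) == lab for x, lab in Di) / len(Di)
        if acc_nb == 1.0:              # nothing left for C to improve here
            is_used[yi] = False
            continue
        c = train_c(Di)
        acc_c = sum(c(x) == lab for x, lab in Di) / len(Di)
        is_used[yi] = acc_c > acc_nb
        if is_used[yi]:
            leaf[yi] = c

    def predict(x):                    # test phase of Fig. 6
        y0 = nb(x)
        return leaf[y0](x) if is_used.get(y0) else y0
    return predict

# Toy stand-ins: a fixed-threshold "NB" and a memorizing leaf classifier.
train_nb = lambda D: (lambda x: 'A' if x[0] < 5 else 'B')
train_c = lambda D: (lambda table: (lambda x: table.get(x, 'A')))(dict(D))

D = [((1,), 'A'), ((2,), 'A'), ((3,), 'B'), ((6,), 'B')]
sd = train_sd(D, train_nb, train_c)
# the leaf classifier corrects region A's mistake; region B is left to NB
print(sd((3,)), sd((6,)))
```

Here region A has NB training accuracy 2/3, so a leaf classifier is trained and used there, while region B is already 100% accurate and keeps the root's prediction, exactly mirroring steps 2-3 and the test phase above.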
The output of the Naive Bayes classifier is a probability distribution vector [p_NB(Y_0|x), ..., p_NB(Y_{k-1}|x)]. The data sub-set D_i^s is obtained by Eq. (14):

D_i^s = {(x, y) | p_NB(Y_i|x) > q}. (14)

From Eq. (14), it is clear that instance (x, y) may be contained in several data sub-sets. Since classifier C may itself output a probability distribution, we improve the combining method of the divide-and-conquer algorithm. Our combining method is given in Eq. (15):

p(Y|x) = Σ_j p_NB(Y_j|x) p_{C_j}(Y|x). (15)

Taking the indicator vector is_used into consideration, the output p(Y_i|x) is calculated by Eq. (16):

p(Y_i|x) = Σ_j p_NB(Y_j|x) p_{C_j}(Y_i|x) I(j, T) + p_NB(Y_i|x) I(i, F). (16)

For simplicity, we use I(j, T) and I(j, F) to represent I(is_used[j]==true) and I(is_used[j]==false) respectively. Then we have Eq. (17):

I(j, T) + I(j, F) = 1. (17)

Eq. (18) is true for every classifier that outputs a probability distribution:

Σ_{i=0}^{k-1} p_C(Y_i|x) = 1. (18)

Then we have the following derivation.
Σ_{i=0}^{k-1} p(Y_i|x)
  = Σ_i [ Σ_j p_NB(Y_j|x) p_{C_j}(Y_i|x) I(j, T) + p_NB(Y_i|x) I(i, F) ]
  = Σ_j p_NB(Y_j|x) I(j, T) Σ_i p_{C_j}(Y_i|x) + Σ_i p_NB(Y_i|x) I(i, F)
  = Σ_i p_NB(Y_i|x) I(i, T) + Σ_i p_NB(Y_i|x) I(i, F)
  = Σ_i p_NB(Y_i|x) (I(i, T) + I(i, F))
  = Σ_i p_NB(Y_i|x) = 1.

The final output is determined by Eq. (19):

y = argmax_{Y_i ∈ Y} p(Y_i|x). (19)

Finally, the soft version of the SD algorithm (SD-soft) is shown in Fig. 7.

3.3 Time complexity of the SD algorithm

We assume the training data have m attributes, n instances and k classes, and discuss the time complexity of the SD algorithm. It is clear that the SD algorithm increases the computational cost of the Naive Bayes classifier. The training and test time complexities of the Naive Bayes classifier are O(mn) and O(km) respectively (Webb et al., 2005). In the training phase, after training the Naive Bayes classifier, the SD algorithm first divides the training data set into data sub-sets, with time complexity O(kmn). The SD algorithm then trains classifier C at most k times, with time complexity O(k·C_train(n/k)), in which O(C_train(n/k)) is the time complexity of training classifier C on a data set of size n/k. Thus the training time complexity of the SD algorithm is O(mn)+O(kmn)+O(k·C_train(n/k)). In the test phase, the SD algorithm first uses NB to obtain the prediction y_0, with time complexity O(km). The SD algorithm may then use classifier C_{y_0} to make the final prediction, with time complexity O(C_test(n/k)), in which O(C_test(n/k)) is the time complexity of using classifier C to make a prediction. Thus the test time complexity of the SD algorithm is O(km)+O(C_test(n/k)).

Training phase:
Input: training data set D, the Naive Bayes classifier NB, classifier C, probability threshold q
Output: a two-level classifier tree named SD tree (composed of NB, C_i, indicator vector is_used)
1. Train the Naive Bayes classifier NB on training data set D.
2. Obtain the data sets {D_i^s} and {D_i} in the decision regions of NB using the following formulas: D_i^s = {(x, y) | p_NB(Y_i|x) > q}; D_i = {(x, y) | NB(x) = Y_i}.
3.
For each D_i, compute the training accuracy of NB. If it is 100%, is_used[i] = false. Otherwise, train C on D_i^s to obtain C_i. If the training accuracy of C_i on D_i is greater than that of NB, then is_used[i] = true; otherwise is_used[i] = false.
Test phase:
Input: an SD tree, test instance x
Output: label y
1. Obtain the prediction y_0 of NB for x.
2. If is_used[y_0] == false, y = y_0. Otherwise obtain the probability distribution of NB for x, and then use the following two equations to obtain the final prediction y:
p(Y_i|x) = Σ_j p_NB(Y_j|x) p_{C_j}(Y_i|x) I(j, T) + p_NB(Y_i|x) I(i, F);
y = argmax_{Y_i ∈ Y} p(Y_i|x).

Fig. 7 Procedure of the SD-soft algorithm

In this paper, we use the Naive Bayes classifier, the C4.5 algorithm and SVM as leaf classifiers, and SVM adopts the sequential minimal optimization (SMO) training algorithm (Platt, 1999). These three classifiers are eager classifiers, whose test time complexity is very small. Thus we only compare the training time complexity of the SD algorithm with those of the Naive Bayes classifier, the NBTree and the AODE algorithms. Training time complexities are listed in Table 1. From Table 1, it is clear that the training time complexities of SD(C=NB) and SD(C=C4.5) are smaller than that of the NBTree algorithm, and the training time complexity of SD(C=SVM) may be comparable with that of the NBTree algorithm. The training time
complexity of SD(C=C4.5) may be comparable with that of the AODE algorithm. The training time complexity of SD(C=SVM) is larger than that of the AODE algorithm. The computational cost of the SD-soft algorithm is larger than that of the SD algorithm, because data set D_i^s usually has more instances than data set D_i.

Table 1 Training time complexities of eight classifiers

Classifier | Time complexity and literature
NB | O(mn) (Webb et al., 2005)
NBTree | O(n^2 k m^2 / v) (Zheng and Webb, 2005)
AODE | O(m^2 n) (Webb et al., 2005)
C4.5 | O(mn log n) (Frank and Witten, 1998)
SMO | O(n^d), d ∈ [1, 2.2] (Platt, 1999)
SD(C=NB) | O(mn)+O(kmn)+O(mn)
SD(C=C4.5) | O(mn)+O(kmn)+O(mn log(n/k))
SD(C=SMO) | O(mn)+O(kmn)+O(n^d / k^{d-1})

k is the number of classes; m is the number of attributes; v is the average number of values for an attribute; n is the number of training instances.

4 Experimental results

In this section, we perform experiments using the SD and SD-soft algorithms to train the Naive Bayes classifier, the C4.5 algorithm and SVM in the decision regions of the Naive Bayes classifier. We first compare the SD algorithm with the NBTree and AODE algorithms, and then compare the SD-soft algorithm with the SD algorithm. The experimental environment is WEKA (Witten and Frank, 2005). The Naive Bayes classifier, the NBTree algorithm and the C4.5 algorithm adopt the NaiveBayes, NBTree and J48 methods of WEKA respectively, with default settings. The J48 method is an implementation of C4.5 release 8 (Quinlan, 1996). The AODE algorithm adopts the AODE method of WEKA, with the supervised discretization method to discretize continuous attributes. SVM adopts the SMO method of WEKA, with a linear kernel function. The experimental data sets are 30 UCI data sets, from which we remove instances with unknown values. After this preprocessing, the data sets are as listed in Table 2.

Table 2 The 30 UCI data sets after preprocessing: anneal, balance-scale, breast-cancer, bridges, car, credit-g, dermatology, diabetes, ecoli, haberman, heart-c, heart-statlog, hepatitis, ionosphere, iris, liver-disorders, lung-cancer, molecular, mushroom, postoperative, segment, solar, sonar, soybean, sponge, tic-tac-toe, vehicle, vote, vowel, zoo

We adopt 10 times 10-fold cross validation to obtain test accuracies, and adopt a two-tailed t-test with 95% confidence to compare different classifiers and obtain win-loss-tie (w-l-t) counts. Due to space limitations, only t-test results are listed.

4.1 The SD algorithm

We first use the Naive Bayes classifier as the leaf classifier. The SD algorithm can be applied twice and three times, denoted as 2SD and 3SD respectively. We train the Naive Bayes (NB) classifier, SD, 2SD and 3SD, and then compare these classifiers by t-test. The t-test results are listed in Table 3.
Table 3 The t-test results (w-l-t) of applying the SD algorithm once, twice and three times: SD vs. NB, 2SD vs. NB, 3SD vs. NB, 2SD vs. SD, 3SD vs. 2SD

From Table 3, SD, 2SD and 3SD can all improve the generalization ability of the Naive Bayes classifier. Compared with the Naive Bayes classifier, 2SD obtains the best t-test result; thus it is worthwhile to apply the SD algorithm twice. We then compare 2SD with the NBTree and AODE algorithms. Table 4 lists the t-test results.

Table 4 The t-test results (w-l-t) of comparing 2SD with NBTree and AODE

From Table 4, 2SD obtains better generalization ability than the NBTree algorithm, but worse than the AODE algorithm. Another advantage of 2SD over the NBTree algorithm is that the training time of 2SD is much shorter, because the SD algorithm does not adopt cross validation in the training phase.

We then use the C4.5 algorithm and SVM as the leaf classifiers of the SD algorithm. The test accuracies obtained by the SD algorithm are compared with those obtained by the Naive Bayes classifier, the leaf classifiers, the NBTree and the AODE algorithms. The t-test results are listed in Table 5.

Table 5 The t-test results (w-l-t) of comparing the SD algorithm (C=C4.5, C=SVM) with the other four classifiers

From Table 5, it is clear that when using the C4.5 algorithm and SVM as leaf classifiers, the SD algorithm obtains better generalization ability than the Naive Bayes classifier, the leaf classifiers, the NBTree and the AODE algorithms.

4.2 The SD-soft algorithm

In this sub-section, we compare the SD-soft algorithm with the SD algorithm. The selection of the q value is crucial for the SD-soft algorithm. In our experiments, the SD-soft algorithm is trained with the q values of Eq. (20):

q = p/m, p ∈ {0.0, 0.2, 0.4, ..., 2.0}. (20)

We train the SD-soft algorithm with these 11 q values and the SD algorithm on the 30 UCI data sets. The leaf classifiers used are the Naive Bayes classifier, the C4.5 algorithm and SVM.
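Two of the SD-soft ingredients used in the experiments can be sketched in a few lines: the q grid of Eq. (20) and the soft sub-set rule of Eq. (14). The value of m and the example distribution below are invented for illustration.

```python
m = 8                                        # illustrative attribute count
qs = [p / 10 / m for p in range(0, 21, 2)]   # Eq. (20): p in {0.0, 0.2, ..., 2.0}
print(len(qs))                               # 11 candidate thresholds

# Eq. (14): an instance joins every soft sub-set whose label gets p_NB > q,
# so a single instance can land in several overlapping regions.
p_nb = [0.6, 0.3, 0.1]                       # NB's distribution over Y0..Y2
q = qs[4]                                    # q = 0.8 / 8 = 0.1
subsets = [i for i, p in enumerate(p_nb) if p > q]
print(subsets)
```

With the hard rule of Fig. 6 the same instance would belong only to the region of NB's top label; the soft rule is what lets the leaf classifiers see instances near the decision boundary of a neighbouring region.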
We first compare the average test accuracies of the SD algorithm with those of the SD-soft algorithm under different q values. The average test accuracies are shown in Fig. 8. The best q value we select for a data set is the one that gives the SD-soft algorithm the best average test accuracy on that data set. The SD-soft algorithm with the best q value is then compared with the SD algorithm. The t-test results are listed in Table 6.

Fig. 8 Test accuracies of the SD and SD-soft algorithms with different p values, for SD(C=NB), SD-soft(C=NB), SD(C=C4.5), SD-soft(C=C4.5), SD(C=SVM) and SD-soft(C=SVM)

From Fig. 8, when using the Naive Bayes classifier or SVM as the leaf classifier, the SD-soft algorithm obtains worse test accuracies than the SD algorithm. However, when using the C4.5 algorithm as the leaf classifier and when the p value is 0.6, 0.8 or 1.0, the SD-soft algorithm obtains better test accuracies. From Table 6, when the best average test accuracy is selected for each data set, the SD-soft algorithm obtains better generalization ability than the SD algorithm. Thus it is possible for the SD-soft algorithm to obtain better generalization ability than the SD algorithm if an appropriate q value is selected.

Table 6 The t-test results (w-l-t) of comparing the SD-soft algorithm (C=NB, C=C4.5, C=SVM) with the SD algorithm

5 Discussion

In this section, we first propose two other SD-soft algorithms. We then apply the SD algorithm to the AODE algorithm, the C4.5 algorithm and SVM. Finally, we discuss why the SD algorithm works.

5.1 Two other SD-soft algorithms

The SD-soft algorithm was proposed to deal with the overlapped regions generated by the Naive Bayes classifier, adopting the method the divide-and-conquer algorithm uses for overlapped regions. In this sub-section, we investigate two other methods of dealing with the overlapped regions.

We first use Eq. (21) to obtain the data sets {D_i^s} of the SD-soft algorithm; the resulting algorithm is named SD-soft.v2. When q ≥ 0.5, data set D_i^s is the same as data set D_i; thus when q ≥ 0.5, the SD-soft.v2 algorithm makes predictions the same way the SD algorithm does:

D_i^s = D_i ∪ {(x, y) | NB(x) ≠ Y_i, p_NB(Y_i|x) > q}. (21)

The SD-soft and SD-soft.v2 algorithms are both based on the conditional probability p_NB(Y_i|x). Next we propose another SD-soft algorithm based on the prediction rank of label Y_i. The function r(NB, x, Y_i) in Eq. (22) calculates the prediction rank of label Y_i:

r(NB, x, Y_i) = |{Y_j | p_NB(Y_j|x) > p_NB(Y_i|x)}| + 1. (22)

We use Eq. (23) to obtain the data sets {D_i^s} of the SD-soft algorithm; the resulting algorithm is named SD-soft.v3. When k=1, data set D_i^s is the same as data set D_i; thus when k=1, the SD-soft.v3 algorithm makes predictions the same way the SD algorithm does:

D_i^s = {(x, y) | r(NB, x, Y_i) ≤ k, p_NB(Y_i|x) > 0}. (23)

We compare the above two new SD-soft algorithms with the SD algorithm. The SD-soft.v2 algorithm uses the q values of Eq. (20). The k values of the SD-soft.v3 algorithm are 1, 2 and 3.
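The rank-based rule is easy to misread, so a small sketch may help: `rank` implements Eq. (22), and `in_subset_v3` implements the membership test of Eq. (23), with `k_max` playing the role of the rank threshold k above. The example distribution is invented.

```python
def rank(p_nb, i):
    """Eq. (22): 1 + the number of labels NB considers strictly more
    probable than Y_i."""
    return sum(1 for pj in p_nb if pj > p_nb[i]) + 1

def in_subset_v3(p_nb, i, k_max):
    """Eq. (23): the instance joins D_i^s iff Y_i is among NB's top k_max
    labels and has non-zero probability."""
    return rank(p_nb, i) <= k_max and p_nb[i] > 0

p_nb = [0.5, 0.3, 0.2]
print([rank(p_nb, i) for i in range(3)])             # -> [1, 2, 3]
print([in_subset_v3(p_nb, i, 2) for i in range(3)])  # -> [True, True, False]
```

With `k_max = 1` only the top-ranked label passes, which is why SD-soft.v3 degenerates to the hard SD algorithm in that case.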
For these two new algorithms, we select the best average test accuracy amongst the different argument values for each data set, and compare it with the test accuracy of the SD algorithm. The t-test results are listed in Table 7.

Table 7 The t-test results (w-l-t) of comparing the two new SD-soft algorithms (v2, v3; C=NB, C=C4.5, C=SVM) with the SD algorithm

From Table 7, both new algorithms can obtain better generalization ability than the SD algorithm. Comparing Table 7 with Table 6, we find that they cannot obtain better generalization ability than the SD-soft algorithm, except when the SD-soft.v3 algorithm adopts SVM as the leaf classifier. As with the SD-soft algorithm, argument selection is crucial for these two new algorithms.

5.2 Applying the SD algorithm to the AODE algorithm, the C4.5 algorithm and SVM

We first adopt the SD algorithm to train the C4.5 algorithm and SVM in the decision regions of the AODE algorithm. The t-test results are listed in Table 8.

Table 8 The t-test results (w-l-t) of applying the SD algorithm to the AODE algorithm: SD(AODE+C) vs. AODE and vs. C, for C=C4.5 and C=SVM

From Table 8, training C4.5/SVM in the decision regions of the AODE algorithm by the SD algorithm can improve the generalization abilities of both the AODE algorithm and the leaf classifiers. We then adopt the SD algorithm to train the C4.5 algorithm, the Naive Bayes classifier and SVM in the decision regions of the C4.5 algorithm and SVM. The t-test results are listed in Tables 9 and 10 respectively.
Table 9 The t-test results (w-l-t) of applying the SD algorithm to the C4.5 algorithm: SD(C4.5+C) vs. C4.5 and vs. C, for C=C4.5, C=NB and C=SVM

Table 10 The t-test results (w-l-t) of applying the SD algorithm to SVM: SD(SVM+C) vs. SVM and vs. C, for C=SVM, C=C4.5 and C=NB

From Tables 9 and 10, the SD algorithm is applicable to neither the C4.5 algorithm nor SVM. SD(C4.5+C4.5) improves the generalization ability of the C4.5 algorithm on only six data sets. SD(C4.5+NB) and SD(C4.5+SVM) both improve the generalization ability of the C4.5 algorithm, but they reduce the generalization abilities of the Naive Bayes classifier and SVM. SD(SVM+SVM) improves the generalization ability of SVM on eight data sets, but reduces it on five. SD(SVM+C4.5) and SD(SVM+NB) cannot improve the generalization ability of SVM, although they improve the generalization abilities of the C4.5 algorithm and the Naive Bayes classifier on over half the data sets. Thus the SD algorithm is applicable to the AODE algorithm, but to neither the C4.5 algorithm nor SVM.

5.3 Why does the SD algorithm work?

From the experimental results of Section 4 and sub-section 5.2, the SD algorithm is applicable to both the Naive Bayes classifier and the AODE algorithm, but not to the C4.5 algorithm or SVM. In this sub-section we attempt to investigate the reason. Huang et al. (2008) divide classifiers into global learning and local learning. Global learning makes descriptions of the data, whereas local learning captures the locally useful information in the data. Global learning aims to estimate a distribution over the data, whereas local learning aims to obtain decision boundaries for classification. The disadvantage of global learning is that it may not make good use of the information contained in instances near decision boundaries. The disadvantage of local learning is that it may not make good use of the global information contained in the whole data set.
The Naive Bayes classifier and the AODE algorithm belong to global learning, and the C4.5 algorithm and SVM belong to local learning. The Naive Bayes classifier and the AODE algorithm make good use of the global information, but may not make good use of the local information near decision boundaries. The C4.5 algorithm and SVM make good use of the local information, but may not make good use of the global information. For the Naive Bayes classifier and the AODE algorithm, training the C4.5 algorithm and SVM in their decision regions can exploit the local information contained in instances near decision boundaries. For the C4.5 algorithm and SVM, being trained in the decision regions of the Naive Bayes classifier and the AODE algorithm can exploit the global information contained in the whole data set. From sub-section 5.2, training the Naive Bayes classifier, the C4.5 algorithm and SVM in the decision regions of the C4.5 algorithm and SVM does not improve the generalization abilities of the C4.5 algorithm and SVM, which indicates that applying the SD algorithm to train classifiers in the decision regions of local learning classifiers does not obtain better generalization ability.

The NBTree algorithm can be regarded as training the Naive Bayes classifier in the decision regions of the C4.5 algorithm, and it obtains better generalization ability than both classifiers (Kohavi, 1996). However, it differs from using the SD algorithm to train the Naive Bayes classifier in the decision regions of the C4.5 algorithm (SD(C4.5+NB)). First, the NBTree algorithm trains the Naive Bayes classifier in regions generated by the C4.5 algorithm, not in the real decision regions of the C4.5 algorithm. Second, when training the C4.5 algorithm, the NBTree algorithm takes the test accuracy of the Naive Bayes classifier into consideration, whereas the SD algorithm does not consider this factor. Thus the generalization ability of the NBTree algorithm is better than that of SD(C4.5+NB). The success of the NBTree algorithm indicates that much more complex methods will be needed to train classifiers in the decision regions of classifiers belonging to local learning.
6 Conclusions

This paper has investigated methods of training classifiers in decision regions of the Naive Bayes classifier. The SD and three SD-soft algorithms all generate two-level classifier trees with the Naive Bayes classifier as the root node. Experimental results on 30 UCI data sets demonstrate the effectiveness of these four new algorithms. When using the C4.5 algorithm and SVM as leaf classifiers, the SD algorithm can obtain better generalization ability than the Naive Bayes classifier, the leaf classifiers, the NBTree and the AODE algorithms. When applied twice and using the Naive Bayes classifier as the leaf classifier, the SD algorithm can obtain better generalization ability than the Naive Bayes classifier and the NBTree algorithm. When using the C4.5 algorithm and SVM as leaf classifiers, the three SD-soft algorithms can obtain better generalization abilities than the SD algorithm, but argument selection is crucial for these three SD-soft algorithms. The SD algorithm is also applicable for the AODE algorithm, but it is not applicable for the C4.5 algorithm or SVM. The SD and SD-soft algorithms can make good use of the information contained in instances near decision boundaries, information that may be ignored by global learning classifiers such as the Naive Bayes classifier and the AODE algorithm. The SD and SD-soft algorithms can be regarded as a new method of generating a hybrid of global learning and local learning.

References
Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Series: Information Science and Statistics. Springer-Verlag, New York.
Domingos, P., Pazzani, M., 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103-130.
Frank, E., Witten, I.H., 1998. Generating accurate rule sets without global optimization. Fifteenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, USA.
Frosyniotis, D., Stafylopatis, A., Likas, A., 2003.
A divide-and-conquer method for multi-net classifiers. Pattern Analysis and Applications, 6(1):32-40.
Huang, K.Z., Yang, H.Q., King, I., Lyu, M., 2008. Machine Learning: Modeling Data Locally and Globally. Springer-Verlag, New York.
Kohavi, R., 1996. Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, p.202-207.
Mitchell, T.M., 1997. Machine Learning. WCB/McGraw-Hill.
Murphy, P.M., Aha, D.W., UCI Repository of Machine Learning Databases [MLRepository.html], Irvine, CA: University of California, Department of Information and Computer Science.
Pal, S.K., Mitra, S., 1992. Multi-layer perceptron, fuzzy sets and classification. IEEE Transactions on Neural Networks, 3(5):683-697.
Platt, J.C., 1999. Fast training of support vector machines using sequential minimal optimization. In: Scholkopf, B., Burges, C., Smola, A. (Eds.), Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo.
Quinlan, J.R., 1996. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77-90.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer, Berlin Heidelberg.
Vlassis, N., Likas, A., 2002. A greedy EM algorithm for Gaussian mixture learning. Neural Processing Letters, 15(1):77-87.
Webb, G.I., Boughton, J., Wang, Z., 2005. Not so naive Bayes: aggregating one-dependence estimators. Machine Learning, 58(1):5-24.
Witten, I.H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco.
Wu, X., Kumar, V., Quinlan, J.R., et al., 2008. Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1):1-37.
Zheng, F., Webb, G.I., 2005. A comparative study of semi-naive Bayes methods in classification learning. Fourth Australasian Data Mining Workshop, University of Technology Sydney.