Generalized Additive Bayesian Network Classifiers


Jianguo Li, Changshui Zhang, Tao Wang and Yimin Zhang
Intel China Research Center, Beijing, China; Department of Automation, Tsinghua University, China
{jianguo.li, tao.wang, yimin.zhang}@intel.com, zcs@mail.tsinghua.edu.cn

Abstract

Bayesian network classifiers (BNC) have received considerable attention in the machine learning field. Some special-structure BNCs have been proposed and have demonstrated promising performance. However, recent research shows that structure learning in BNs may lead to a non-negligible posterior problem, i.e., there might be many structures with similar posterior scores. In this paper, we propose generalized additive Bayesian network classifiers, which transfer the structure learning problem into a generalized additive models (GAM) learning problem. We first generate a series of very simple BNs and put them in the framework of GAM, then adopt a gradient-based algorithm to learn the combining parameters, and thus construct a more powerful classifier. On a large suite of benchmark data sets, the proposed approach outperforms many traditional BNCs, such as naive Bayes and TAN, and achieves comparable or better performance in comparison to boosted Bayesian network classifiers.

1 Introduction

Bayesian networks (BN), also known as probabilistic graphical models, graphically represent the joint probability distribution of a set of random variables, exploiting the conditional independence among variables to describe them in a compact manner. Generally, a BN is associated with a directed acyclic graph (DAG), in which the nodes correspond to the variables in the domain and the edges correspond to direct probabilistic dependences between them [Pearl, 1988].

Bayesian network classifiers (BNC) characterize the conditional distribution of the class variable given the attributes, and predict the class label with the highest conditional probability. BNCs have been successfully applied in many areas. Naive Bayes (NB) [Langley et al., 1992] is the simplest BN, which considers only the dependence between each feature $x_i$ and the class variable $y$. Since it ignores the dependences between different features, NB may not perform well on data sets which violate the independence assumption.

Many BNCs have been proposed to overcome NB's limitation. [Sahami, 1996] proposed a general framework to describe limited dependence among the feature variables, called k-dependence Bayesian networks (kdB). [Friedman et al., 1997] proposed tree augmented naive Bayes (TAN), a structure learning algorithm which learns a maximum spanning tree (MST) over the attributes. Both TAN and kdB have tree-structured graphs. K2 is an algorithm which learns general BNs for classification purposes [Cooper and Herskovits, 1992]. The key differences between these BNCs lie in their structure learning algorithms.

Structure learning is the task of finding the graph structure that best characterizes the true density of the given data. Many criteria, such as the Bayesian scoring function, minimal description length (MDL) and conditional independence tests [Cheng et al., 2002], have been proposed for this purpose. However, it is inevitable to encounter the following situation: several candidate graph structures have very close scores, and all of them are non-negligible in the posterior sense. This problem was pointed out and analyzed theoretically by [Friedman and Koller, 2003].

Since the candidate BNs are all approximations of the true joint distribution, it is natural to consider aggregating them to yield a more accurate distribution estimation. Several works have been done in this manner. For example, [Thiesson et al., 1998] proposed mixtures of DAGs, and [Jing et al., 2005] proposed boosted Bayesian network classifiers.
In this paper, a new solution is proposed to aggregate candidate BNs. We put a series of simple BNs into the framework of generalized additive models [Hastie and Tibshirani, 1990], and adopt a gradient-based algorithm to learn the combining parameters, thus constructing a more powerful learning machine. Experiments on a large suite of benchmark data sets demonstrate the effectiveness of the proposed approach.

The rest of this paper is organized as follows. In Section 2, we briefly introduce some typical BNCs and point out the non-negligible posterior problem in structure learning. In Section 3, we propose the generalized additive Bayesian network classifiers. To evaluate the effectiveness of the proposed approach, extensive experiments are conducted in Section 4. Finally, concluding remarks are given in Section 5.

2 Bayesian Network Classifiers

A Bayesian network B is a directed acyclic graph that encodes the joint probability distribution over a set of random variables $\mathbf{x} = [x_1, \ldots, x_d]^T$. Denoting the parent nodes of $x_i$ by $\mathrm{Pa}(x_i)$, the joint distribution $P_B(\mathbf{x})$ factorizes over the network structure as follows:

$$P_B(\mathbf{x}) = \prod_{i=1}^{d} P(x_i \mid \mathrm{Pa}(x_i)).$$

Given a data set $D = \{(\mathbf{x}, y)\}$ in which $y$ is the class variable, BNCs characterize $D$ by the joint distribution $P(\mathbf{x}, y)$, and convert it to the conditional distribution $P(y \mid \mathbf{x})$ for predicting the class label.

2.1 Several typical Bayesian network classifiers

The naive Bayes (NB) network assumes that each attribute variable depends only on the class variable, i.e.,

$$P(\mathbf{x}, y) = P(y)P(\mathbf{x} \mid y) = P(y)\prod_{i=1}^{d} P(x_i \mid y).$$

Figure 1(a) illustrates the graph structure of NB. Since NB ignores the dependences among different features, it may not perform well on data sets which violate the attribute independence assumption. Many BNCs have been proposed to take the dependences among features into account.

[Sahami, 1996] presented a more general framework for limited-dependence Bayesian networks, called k-dependence Bayesian classifiers (kdB).

Definition 1: A k-dependence Bayesian classifier is a Bayesian network which allows each feature $x_i$ to have a maximum of $k$ feature variables as parents, i.e., the number of variables in $\mathrm{Pa}(x_i)$ equals $k + 1$ (the +1 reflects that $k$ does not count the class variable $y$).

According to this definition, NB is a 0-dependence BN. The kdB algorithm [Sahami, 1996] adopts the mutual information $I(x_i; y)$ to measure the dependence between the $i$-th feature variable $x_i$ and the class variable $y$, and the conditional mutual information $I(x_i, x_j \mid y)$ to measure the dependence between two feature variables $x_i$ and $x_j$. kdB then employs a heuristic rule to construct the network structure via these two measures. kdB does not maximize any optimality criterion in structure learning; hence, it yields limited performance improvement over NB.

[Keogh and Pazzani, 1999] proposed super-parent Bayesian networks (SPBN), which assume that there is one attribute acting as the public parent (called the super-parent) of all the other attributes. Suppose $x_i$ is the super-parent, and denote the corresponding BN by $P_i(\mathbf{x}, y)$; we have

$$P_i(\mathbf{x}, y) = P(y)P(x_i \mid y)P(\mathbf{x} \setminus x_i \mid x_i, y) = P(y)P(x_i \mid y)\prod_{j=1,\, j\neq i}^{d} P(x_j \mid x_i, y). \quad (1)$$

It is obvious that the SPBN structure is a special case of kdB with $k = 1$. Figure 1(b) illustrates the graph structure of SPBN. The SPBN algorithm adopts classification accuracy as the criterion for selecting the best network structure.

[Friedman et al., 1997] proposed tree augmented naive Bayes (TAN), which is also a special case of kdB with $k = 1$. TAN attempts to add edges to the naive Bayes network in order to improve the posterior estimation. In detail, TAN first computes the conditional mutual information $I(x_i, x_j \mid y)$ between every two feature variables $x_i$ and $x_j$, and thus obtains a full adjacency matrix. TAN then employs the maximum spanning tree (MST) algorithm on the adjacency matrix to obtain a tree-structured BN. Therefore, TAN is optimal in the sense of the MST. Many experiments show that TAN significantly outperforms NB. Figure 1(c) illustrates one possible graph structure of TAN.

Both kdB and TAN generate tree-structured graphs. [Cooper and Herskovits, 1992] proposed the K2 algorithm, which adopts the K2 score and exhaustive search to learn general BN structures.

Figure 1: Typical Bayesian network structures: (a) naive Bayes; (b) super-parent kdB with k = 1; (c) one possible structure of TAN.
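To make the TAN structure step concrete, here is a minimal sketch (our own code and naming, not the authors' implementation): it estimates the pairwise conditional mutual information from discretized data and extracts the maximum spanning tree by negating the weights, so that SciPy's minimum-spanning-tree routine can be reused.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def cmi(xi, xj, y):
    """Conditional mutual information I(xi; xj | y) for discrete 1-D arrays."""
    total = 0.0
    for c in np.unique(y):
        m = y == c
        pc = m.mean()
        a, b = xi[m], xj[m]
        for u in np.unique(a):
            for v in np.unique(b):
                p_uv = np.mean((a == u) & (b == v))
                if p_uv > 0:
                    total += pc * p_uv * np.log(
                        p_uv / (np.mean(a == u) * np.mean(b == v)))
    return total

def tan_tree_edges(X, y):
    """Edges of TAN's maximum spanning tree over the attributes.
    X: (N, d) matrix of discretized attributes; y: (N,) class labels."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = cmi(X[:, i], X[:, j], y)
    # Minimum spanning tree of the negated weights = maximum spanning tree.
    # Caveat of this sketch: pairs with exactly zero CMI are treated as absent edges.
    mst = minimum_spanning_tree(-W)
    return list(zip(*mst.nonzero()))
```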
2.2 The structure learning problem

Given training data $D$, structure learning is the task of finding a set of directed edges $G$ that best characterizes the true density of the data. Generally, structure learning can be considered at two levels: the macro level and the micro level. At the macro level, several candidate graph structures are given, and we need to choose the best one. In order to avoid overfitting, people often use model selection methods such as the Bayesian scoring function, minimum description length (MDL), etc. [Friedman et al., 1997].

At the micro level, structure learning is concerned with whether an individual edge of the graph should exist or not. In this case, people usually employ conditional independence tests to determine the importance of edges [Cheng et al., 2002].

However, in both cases one may face a situation in which several candidates (graphs or edges) have very close scores. For instance, suppose MDL is used as the criterion, and two candidate BN structures $G_1$ and $G_2$ have MDL scores 0.899 and 0.9, respectively. Which one should be chosen? One might argue that it is natural to select $G_1$ since it has a slightly smaller MDL score, but in practice $G_1$ and $G_2$ may have similar performance, and $G_2$ may even perform better in some cases. In fact, both of them are non-negligible in the posterior sense. This problem was pointed out and analyzed theoretically by [Friedman and Koller, 2003], which shows that when there are many models that explain the data reasonably well, model selection makes a somewhat arbitrary choice among them. Besides, the number of possible structures grows super-exponentially with the number of random variables. For these two reasons, we do not want to perform structure learning directly. Instead, we hope to aggregate a series of simpler and weaker BNs to obtain a much more accurate estimation of the underlying distribution. We note that several researchers have proposed schemes for this purpose, for example, learning mixtures of DAGs [Thiesson et al., 1998], or ensembles of Bayesian networks obtained by model averaging [Rosset and Segal, 2002; Webb et al., 2005]. We briefly introduce them in the following.

2.3 Model averaging for Bayesian networks

Since the candidate BNs are all approximations of the true distribution, model averaging is a natural way to combine candidates into a more accurate distribution estimation.

Mixture of DAGs (MDAG)

Definition 2: If $P(\mathbf{x} \mid \theta_c, G_c)$ is a DAG model, the following equation defines a mixture-of-DAGs model:

$$P(\mathbf{x} \mid \theta_s) = \sum_c \pi_c P(\mathbf{x} \mid \theta_c, G_c),$$

where $\pi_c$ is the prior of the $c$-th DAG model $G_c$, with $\pi_c \ge 0$ and $\sum_c \pi_c = 1$, and $\theta_c$ is the parameter set of graph $G_c$.
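Definition 2 is simply a convex combination of component joint densities. As an illustration (a minimal sketch with our own helper names, assuming each component exposes per-sample log-densities), the mixture can be evaluated stably in log space; choosing uniform priors recovers the equal-coefficient averaging used by AODE below.

```python
import numpy as np
from scipy.special import logsumexp

def mdag_log_density(log_pi, log_comp):
    """log P(x | theta_s) of Definition 2.
    log_pi:   (C,)   log mixture priors log(pi_c)
    log_comp: (C, N) log P(x_k | theta_c, G_c) for each component c and sample k
    Returns the (N,) mixture log-density."""
    return logsumexp(log_pi[:, None] + log_comp, axis=0)

# Uniform priors (AODE-style equal mixture coefficients):
# log_pi = np.full(C, -np.log(C))
```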

MDAG learns the mixture model by maximizing the posterior likelihood of the given data set. In detail, MDAG uses the Cheeseman-Stutz approximation and the Expectation-Maximization algorithm for both the structure learning of the mixture components and the parameter learning.

[Webb et al., 2005] presented a special and simple case of MDAG for classification purposes, called averaged one-dependence estimation (AODE). AODE adopts a series of fixed-structure simple BNs as the mixture components, and directly assumes that all mixture components in the MDAG have equal mixture coefficients. Practice shows that AODE outperforms naive Bayes and TAN.

Boosted Bayesian networks

Boosting is another commonly used technique for combining simple BNs. [Rosset and Segal, 2002] employed the gradient boosting algorithm [Friedman, 2001] to combine BNs for density estimation. [Jing et al., 2005] proposed boosted Bayesian network classifiers (BBN), and adopted the general AdaBoost algorithm to learn the weight coefficients. Given a series of simple BNs $P_i(\mathbf{x}, y)$, $i = 1, \ldots, n$, BBN aims to construct the final approximation as a linear additive model:

$$P(\mathbf{x}, y) = \sum_{i=1}^{n} \alpha_i P_i(\mathbf{x}, y),$$

where the $\alpha_i$ are weight coefficients with $\sum_i \alpha_i = 1$. More generally, the constraint on the $\alpha_i$ can be relaxed, keeping only $\alpha_i \ge 0$:

$$F(\mathbf{x}, y) = \sum_{i=1}^{n} \alpha_i P_i(\mathbf{x}, y).$$

In this case, the posterior can be defined as

$$P(y \mid \mathbf{x}) = \frac{\exp\{F(\mathbf{x}, y)\}}{\sum_{y'} \exp\{F(\mathbf{x}, y')\}}. \quad (2)$$

For the general binary classification problem $y \in \{-1, 1\}$, this problem can be solved with the exponential loss function

$$L(\alpha) = \sum_k \exp\{-y_k F(\mathbf{x}_k, y_k)\} \quad (3)$$

via the AdaBoost algorithm [Friedman et al., 2000].

3 Generalized Additive Bayesian Networks

In this section, we present a novel scheme that aggregates a series of simple BNs into a more accurate density estimation of the true process. Suppose $P_i(\mathbf{x}, y)$, $i = 1, \ldots, n$, are the given simple BNs; we consider putting them in the framework of generalized additive models (GAM) [Hastie and Tibshirani, 1990]. The new algorithm is called generalized additive Bayesian network classifiers (GABN). In the GAM framework, the $P_i(\mathbf{x}, y)$ are treated as linear additive variables in the link function space:

$$F(\mathbf{x}, y) = \sum_{i=1}^{n} \lambda_i f_i[P_i(\mathbf{x}, y)]. \quad (4)$$

GABN is an extensible framework since many different link functions can be considered. In this paper, we study one special link function: $f_i(\cdot) = \log(\cdot)$. Defining $z = (\mathbf{x}, y)$ and taking the exponent of both sides of the above equation, we have

$$\exp[F(z)] = \exp\Big[\sum_i \lambda_i f_i(z)\Big] = \prod_{i=1}^{n} P_i^{\lambda_i}(z).$$

This is in fact a potential function. It can also be written as a probability distribution given a normalization factor:

$$P(z) = \frac{1}{S_\lambda(z)} \prod_{i=1}^{n} P_i^{\lambda_i}(z), \quad (5)$$

where $S_\lambda(z)$ is the normalization factor

$$S_\lambda(z) = \sum_z \prod_{i=1}^{n} P_i^{\lambda_i}(z) = \sum_z \exp\Big\{\sum_{i=1}^{n} \lambda_i \log P_i(z)\Big\}. \quad (6)$$

The likelihood of $P(z)$ is called the quasi-likelihood:

$$L(\lambda) = \sum_{k=1}^{N} \log P(z_k) = \sum_{k=1}^{N} \Big\{\sum_{i=1}^{n} \lambda_i \log P_i(z_k) - \log S_\lambda(z_k)\Big\} = \sum_{k=1}^{N} \big\{\lambda^T \mathbf{f}(z_k) - \log S_\lambda(z_k)\big\}, \quad (7)$$

where $\lambda = [\lambda_1, \ldots, \lambda_n]^T$ and $\mathbf{f}(z_k) = [f_1(z_k), \ldots, f_n(z_k)]^T$.

3.1 The quasi-likelihood optimization problem

Maximizing the quasi-likelihood yields the solution for the additive parameters. To make the GAM model meaningful and tractable, we add some constraints on the parameters. The final optimization problem is

$$\max L(\lambda) \quad \text{s.t.} \quad (1)\ \textstyle\sum_i \lambda_i = 1; \quad (2)\ 0 \le \lambda_i \le 1. \quad (8)$$

For the equality constraint, a Lagrange multiplier can be adopted to turn the problem into an unconstrained one; for the inequality constraints, the classical interior point method (IPM) can be employed. In detail, the IPM utilizes barrier functions to transfer the inequality constraints into a series of unconstrained optimization problems [Boyd and Vandenberghe, 2004].
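Before turning to the barrier formulation, here is a concrete reading of the quasi-likelihood in Equations (5)-(7), as a minimal sketch under our own naming (not the authors' implementation). It combines the component log-joints with the weights $\lambda$ and normalizes over the class labels of each sample, which is one way to realize $S_\lambda(z_k)$:

```python
import numpy as np
from scipy.special import logsumexp

def quasi_log_likelihood(lam, log_P, y):
    """Eq. (7): sum_k { lam^T f(z_k) - log S_lam(z_k) }.
    lam:   (n,)      additive weights lambda_i
    log_P: (n, N, C) log P_i(x_k, y=c) for each component BN i
    y:     (N,)      observed class labels
    Assumption: S_lam(z_k) normalizes over the C class labels of sample k."""
    F = np.tensordot(lam, log_P, axes=1)       # (N, C): sum_i lam_i log P_i(x_k, c)
    log_S = logsumexp(F, axis=1)               # (N,):   log S_lam(z_k)
    return float(np.sum(F[np.arange(len(y)), y] - log_S))
```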

Here, we adopt the commonly used logarithmic barrier function, and obtain the following unconstrained optimization problem:

$$L(\lambda, r_k, \alpha) = r_k \sum_{i=1}^{n} \log(\lambda_i) + r_k \sum_{i=1}^{n} \log(1 - \lambda_i) + \alpha\Big(1 - \sum_{i=1}^{n} \lambda_i\Big) + L(\lambda) = r_k \mathbf{1}_n^T \big[\log(\lambda) + \log(\mathbf{1}_n - \lambda)\big] + \alpha(1 - \lambda^T \mathbf{1}_n) + L(\lambda), \quad (9)$$

where $\mathbf{1}_n$ denotes an $n$-dimensional vector with all elements equal to 1, $r_k$ is the barrier factor in the $k$-th step of the IPM iteration, and $\alpha$ is the Lagrange multiplier. Therefore, in the $k$-th IPM iteration step, we need to maximize the unconstrained problem $L(\lambda, r_k, \alpha)$. A quasi-Newton method is adopted for this purpose.

3.2 Quasi-Newton method for the unconstrained optimization problem

To solve the unconstrained problem $\max L(\lambda, r_k, \alpha)$, we need the gradient of $L$ with respect to $\lambda$.

Theorem 1: The gradient of $L(\lambda, r_k, \alpha)$ w.r.t. $\lambda$ is

$$g_\lambda = \frac{\partial L(\lambda, r_k, \alpha)}{\partial \lambda} = \sum_{k=1}^{N} \big\{\mathbf{f}(z_k) - E_{P(z)}[\mathbf{f}(z_k)]\big\} + r_k \Big[\frac{1}{\lambda} - \frac{1}{\mathbf{1}_n - \lambda}\Big] - \alpha \mathbf{1}_n, \quad (10)$$

where the reciprocals are taken element-wise.

Proof: In Equation (9), it is easy to obtain the gradients of the first summation term and of the non-summation terms. Here, we only present the gradient of the remaining summation term, i.e., $\log S_\lambda(z_k)$ in $L(\lambda)$:

$$\frac{\partial \log S_\lambda(z)}{\partial \lambda} = \frac{1}{S_\lambda(z)} \frac{\partial S_\lambda(z)}{\partial \lambda} = \frac{1}{S_\lambda(z)} \frac{\partial}{\partial \lambda} \sum_{z_k} \exp\big\{\lambda^T \mathbf{f}(z_k)\big\} = \frac{1}{S_\lambda(z)} \sum_{z_k} \mathbf{f}(z_k) \exp\big\{\lambda^T \mathbf{f}(z_k)\big\} = \sum_{z_k} P(z_k)\mathbf{f}(z_k) = E_{P(z)}\big[\mathbf{f}(z_k)\big].$$

For computational cost considerations, we did not compute the second-order derivative of $L(\lambda, r_k, \alpha)$, but adopted a quasi-Newton method [Bishop, 1995] to solve the problem. In this paper, the L-BFGS procedure provided by [Liu and Nocedal, 1989] is employed for this task.

3.3 The IPM-based training algorithm

The interior point method starts from a point in the feasible region, sequentially adjusts the barrier factor $r_k$ in each iteration, and solves a series of unconstrained problems $L(\lambda, r_k, \alpha)$, $k = 1, 2, \ldots$. The detailed training algorithm is shown in Table 1.

Table 1: The training algorithm for GABN

Input: training set $D = \{(\mathbf{x}_k, y_k)\}_{k=1}^{N}$
S0: set the convergence precision $\epsilon > 0$ and the maximal number of steps $M$;
S1: initialize the interior point $\lambda = [\lambda_1, \ldots, \lambda_n]^T$ with $\lambda_i = 1/n$;
S2: generate a series of simple BNs: $P_i(\mathbf{x}, y)$, $i = 1, \ldots, n$;
S3: for $k = 1 : M$
S4: select $r_k > 0$ with $r_k < r_{k-1}$, and obtain the $k$-th step optimization problem $L(\lambda, r_k, \alpha)$;
S5: calculate $g_\lambda$ and the quasi-likelihood $L(\lambda)$;
S6: employ the L-BFGS procedure to solve $\max L(\lambda, r_k, \alpha)$;
S7: evaluate the barrier term $a_k = r_k \mathbf{1}_n^T \big[\log(\lambda) + \log(\mathbf{1}_n - \lambda)\big]$;
S8: if $|a_k| < \epsilon$, jump to S9; else continue the loop;
S9: output the optimal parameter $\lambda^*$, and obtain the final generalized additive model $P(z; \lambda^*)$.

3.4 A series of fixed-structure Bayesian networks

One problem remains unresolved in the algorithm of Table 1, namely step S2: how to generate a series of simple BNs as the weak learners. There are many methods for this purpose. In our experiments, we take super-parent BNs as the weak learners; readers may consider other possible strategies for generating simple BNs.

For a d-dimensional data set, setting a different attribute as the public parent node according to Equation (1) generates d different fixed-structure super-parent BNs: $P_i(\mathbf{x}, y)$, $i = 1, \ldots, d$. Figure 1(b) depicts one example of this kind of simple BN. To improve performance, the mutual information $I(x_i, y)$ is computed in order to remove the several BNs with the lowest mutual information scores. In this way, we obtain n very simple BNs and adopt them as the weak learners in GABN. Parameter learning (the conditional probability tables) in BNs is standard, and the details are thus omitted here. Note that for robust parameter estimation, the Laplace correction and the m-estimate [Cestnik, 1990] are adopted.
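For illustration, here is a minimal sketch of one super-parent component of Equation (1) with Laplace-corrected counts (our own naming and a deliberately simple counting implementation, not the authors' code; the m-estimate variant is omitted):

```python
import numpy as np

def superparent_log_joint(X, y, sp):
    """Build log P_sp(x, y) of Eq. (1) for super-parent attribute index `sp`.
    X: (N, d) discretized attributes; y: (N,) class labels in {0, ..., C-1}.
    Returns a scorer log_p(x, c) with Laplace-corrected probability tables."""
    N, d = X.shape
    C = int(y.max()) + 1
    V = X.max(axis=0) + 1                      # cardinality of each attribute
    log_py = np.log(np.bincount(y, minlength=C) + 1.0) - np.log(N + C)

    def log_p(x, c):
        m = y == c
        # P(x_sp | y = c), Laplace-corrected
        s = log_py[c] + np.log((np.sum(X[m, sp] == x[sp]) + 1.0) / (m.sum() + V[sp]))
        mm = m & (X[:, sp] == x[sp])           # condition on (x_sp, y = c)
        for j in range(d):
            if j != sp:                        # P(x_j | x_sp, y = c)
                s += np.log((np.sum(X[mm, j] == x[j]) + 1.0) / (mm.sum() + V[j]))
        return s

    return log_p
```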
3.5 Discussion

GABN has several advantages over the typical linear additive BN model, boosted BN (BBN). First, GABN is much more computationally efficient than BBN. Given a d-dimensional training set with N samples, it is not hard to show that the computational complexity of GABN is $O(Nd^2 + MNd)$, where M is the number of IPM iteration steps. In contrast, BBN requires sequentially learning a BN structure in each boosting step, which leads to a complexity of $O(KNd^2)$, where K is the number of boosting steps and is usually very large (on the order of $10^2$). Therefore, GABN dominates BBN on scalable learning tasks; practice also demonstrates this point.

Furthermore, GABN opens a new direction for combining weak learners since it is a highly extensible framework. We present a solution for the logarithmic link function; it is not hard to adopt other link functions under the GAM framework and thus derive new algorithms. Many existing GAM properties and optimization methods can be seamlessly adopted to aggregate simple BNs into more powerful learning machines.
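Putting Sections 3.1-3.4 together, the following sketch mirrors the Table 1 loop using SciPy's L-BFGS-B. It is our own simplification, not the authors' implementation: the Lagrange multiplier $\alpha$ is kept fixed, the gradient is left to numerical approximation rather than Equation (10), and `quasi_log_likelihood` is the helper sketched after Section 3.1.

```python
import numpy as np
from scipy.optimize import minimize

def train_gabn(log_P, y, max_steps=20, r0=1.0, shrink=0.5, alpha=0.0, eps=1e-6):
    """Barrier-method sketch of Table 1.
    log_P: (n, N, C) component log-joints, e.g., from the super-parent BNs."""
    n = log_P.shape[0]
    lam, r = np.full(n, 1.0 / n), r0                       # S1: interior start

    for _ in range(max_steps):                             # S3: IPM outer loop
        def neg_obj(l, r=r):                               # -L(lambda, r_k, alpha), Eq. (9)
            if np.any(l <= 0.0) or np.any(l >= 1.0):
                return np.inf
            barrier = r * np.sum(np.log(l) + np.log(1.0 - l))
            return -(quasi_log_likelihood(l, log_P, y)
                     + barrier + alpha * (1.0 - l.sum()))

        res = minimize(neg_obj, lam, method="L-BFGS-B",    # S5-S6
                       bounds=[(1e-9, 1.0 - 1e-9)] * n)
        lam = res.x
        a_k = r * np.sum(np.log(lam) + np.log(1.0 - lam))  # S7: barrier term
        if abs(a_k) < eps:                                 # S8: convergence test
            break
        r *= shrink                                        # S4: shrink r_k

    return lam                                             # S9: optimal lambda*
```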

4 Experiments

This section evaluates the performance of the proposed GABN algorithm, comparing it with other BNCs such as NB, TAN, K2, kdB and SPBN, with model averaging methods such as AODE and BBN, and with the decision tree algorithm CART [Breiman et al., 1984]. The benchmark platform consists of 30 data sets from the UCI machine learning repository [Newman et al., 1998]. One point should be noted: for the BNCs, when data sets have continuous features, we first adopted a discretization method to transform them into discrete features [Dougherty et al., 1995]. We employed 5-fold cross-validation for error estimation, and kept the same fold split for all compared algorithms. The final results are shown in Table 2, in which the results of TAN and K2 were obtained with the Java machine learning toolbox Weka [Witten and Frank, 2000].

To present a statistically meaningful evaluation, we conducted paired t-tests comparing GABN with the other algorithms. The last row of Table 2 shows the win/tie/lose summary at the 1% significance level of the test. In addition, Figure 2 shows scatter plots of the comparisons between GABN and the other classifiers. We can see that GABN outperforms most other BNCs, and achieves performance comparable to BBN. Note especially that the SPBN column shows the results of the best individual super-parent BN, which are significantly worse than GABN's; this demonstrates that it is effective and meaningful to use GAM to aggregate simple BNs.

5 Conclusions

In this paper, we propose generalized additive Bayesian network classifiers (GABN). GABN aims to avoid the non-negligible posterior problem in Bayesian network structure learning. In detail, we transfer the structure learning problem into a generalized additive models (GAM) learning problem. We first generate a series of very simple Bayesian networks (BNs), put them in the framework of GAM, and then adopt a gradient-based learning algorithm to combine those simple BNs, thus constructing a more powerful classifier. Experiments on a large suite of benchmark data sets demonstrate that the proposed approach outperforms many traditional BNCs such as naive Bayes and TAN, and achieves comparable or better performance in comparison to boosted Bayesian network classifiers. Future work will focus on other possible extensions within the GABN framework.

References

[Bishop, 1995] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, London, 1995.
[Boyd and Vandenberghe, 2004] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Breiman et al., 1984] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification And Regression Trees. Wadsworth International Group, 1984.
[Cestnik, 1990] B. Cestnik. Estimating probabilities: a crucial task in machine learning. In the 9th European Conf. Artificial Intelligence (ECAI), 1990.
[Cheng et al., 2002] J. Cheng, D. Bell, and W. Liu. Learning belief networks from data: an information theory based approach. Artificial Intelligence, 137:43-90, 2002.
[Cooper and Herskovits, 1992] G. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347, 1992.
[Dougherty et al., 1995] J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In the 12th Intl. Conf. Machine Learning (ICML), Morgan Kaufmann, San Francisco, 1995.
[Friedman and Koller, 2003] N. Friedman and D. Koller. Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50:95-126, 2003.
[Friedman et al., 1997] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2):131-163, 1997.
[Friedman et al., 2000] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337-407, 2000.
[Friedman, 2001] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5), 2001.
[Hastie and Tibshirani, 1990] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990.
[Jing et al., 2005] Y. Jing, V. Pavlović, and J. Rehg. Efficient discriminative learning of Bayesian network classifiers via boosted augmented naive Bayes. In the 22nd Intl. Conf. Machine Learning (ICML), 2005.
[Keogh and Pazzani, 1999] E. Keogh and M. Pazzani. Learning augmented Bayesian classifiers: a comparison of distribution-based and classification-based approaches. In the 7th Intl. Workshop on Artificial Intelligence and Statistics, 1999.
[Langley et al., 1992] P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In the 10th National Conf. Artificial Intelligence (AAAI), 1992.
[Liu and Nocedal, 1989] D. Liu and J. Nocedal. On the limited memory BFGS method for large-scale optimization. Mathematical Programming, 45:503-528, 1989.
[Newman et al., 1998] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases, 1998.
[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[Rosset and Segal, 2002] S. Rosset and E. Segal. Boosting density estimation. In Advances in Neural Information Processing Systems (NIPS), 2002.
[Sahami, 1996] M. Sahami. Learning limited dependence Bayesian classifiers. In the 2nd Intl. Conf. Knowledge Discovery and Data Mining (KDD), AAAI Press, 1996.
[Thiesson et al., 1998] B. Thiesson, C. Meek, D. Heckerman, et al. Learning mixtures of DAG models. In Conf. Uncertainty in Artificial Intelligence (UAI), 1998.
[Webb et al., 2005] G. Webb, J. R. Boughton, and Zhihai Wang. Not so naive Bayes: aggregating one-dependence estimators. Machine Learning, 58(1):5-24, 2005.
[Witten and Frank, 2000] I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.

Table 2: Testing error on 30 UCI data sets. Columns: GABN, BBN, AODE, TAN, K2, kdB, SPBN, NB, CART. Rows: australia, autos, breast-cancer, breast-w, cmc, cylinder-band, diabetes, german, glass, glass2, heart-c, heart-stat, ionosphere, iris, letter, liver, lymph, page-blocks, post-operative, satimage, segment, sonar, soybean-big, tae, vehicle, vowel, waveform, waveform+noise, wdbc, yeast, plus an average row. (The per-dataset error values are not recoverable from this copy.) The last row gives the win/tie/lose summary of GABN versus BBN, AODE, TAN, K2, kdB, SPBN, NB and CART: 11/1/9, 17/9/4, 19/6/5, 21/6/3, 29//1, 27/1/2, 26/1/3, 23/4/.

Figure 2: Scatter plots of the experimental results on 30 UCI data sets, one panel per comparison: (a) GABN vs. BBN, (b) vs. AODE, (c) vs. TAN, (d) vs. K2, (e) vs. kdB, (f) vs. SPBN, (g) vs. NB, (h) vs. CART. Each plot shows the relative error rates of GABN and the compared algorithm; points above the diagonal line correspond to data sets where GABN performs better than the compared algorithm.
