GenSVM: A Generalized Multiclass Support Vector Machine


Journal of Machine Learning Research 17 (2016) 1-42. Submitted 12/14; Revised 11/16; Published 12/16.

GenSVM: A Generalized Multiclass Support Vector Machine

Gerrit J.J. van den Burg (burg@ese.eur.nl)
Patrick J.F. Groenen (groenen@ese.eur.nl)
Econometric Institute, Erasmus University Rotterdam, The Netherlands

Editor: Sathiya Keerthi

Abstract

Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM is proposed, called GenSVM. In this method classification boundaries for a K-class problem are constructed in a (K - 1)-dimensional space using a simplex encoding. Additionally, several different weightings of the misclassification errors are incorporated in the loss function, such that it generalizes three existing multiclass SVMs through a single optimization problem. An iterative majorization algorithm is derived that solves the optimization problem without the need of a dual formulation. This algorithm has the advantage that it can use warm starts during cross validation and during a grid search, which significantly speeds up the training phase. Rigorous numerical experiments compare linear GenSVM with seven existing multiclass SVMs on both small and large data sets. These comparisons show that the proposed method is competitive with existing methods in both predictive accuracy and training time, and that it significantly outperforms several existing methods on these criteria.

Keywords: support vector machines, SVM, multiclass classification, iterative majorization, MM algorithm, classifier comparison

1. Introduction

For binary classification, the support vector machine has shown to be very successful (Cortes and Vapnik, 1995). The SVM efficiently constructs linear or nonlinear classification boundaries and is able to yield a sparse solution through the so-called support vectors, that is, through those observations that are either not perfectly classified or are on the classification boundary. In addition, by regularizing the loss function the overfitting of the training data set is curbed. Due to its desirable characteristics several attempts have been made to extend the SVM to classification problems where the number of classes K is larger than two. Overall, these extensions differ considerably in the approach taken to include multiple classes. Three types of approaches for multiclass SVMs (MSVMs) can be distinguished.

First, there are heuristic approaches that use the binary SVM as an underlying classifier and decompose the K-class problem into multiple binary problems. The most commonly used heuristic is the one-vs-one (OvO) method where decision boundaries are constructed between each pair of classes (Kreßel, 1999).

[Figure 1: three panels, (a) One vs. One, (b) One vs. All, (c) Non-heuristic, each plotted in the (x1, x2) plane.]

Figure 1: Illustration of ambiguity regions for common heuristic multiclass SVMs. In the shaded regions ties occur for which no classification rule has been explicitly trained. Figure (c) corresponds to an SVM where all classes are considered simultaneously, which eliminates any possible ties. Figures inspired by Statnikov et al. (2011).

OvO requires solving K(K - 1)/2 binary SVM problems, which can be substantial if the number of classes is large. An advantage of OvO is that the problems to be solved are smaller in size. On the other hand, the one-vs-all (OvA) heuristic constructs K classification boundaries, one separating each class from all the other classes (Vapnik, 1998). Although OvA requires fewer binary SVMs to be estimated, the complete data set is used for each classifier, which can create a high computational burden. Another heuristic approach is the directed acyclic graph (DAG) SVM proposed by Platt et al. (2000). DAGSVM is similar to the OvO approach except that the class prediction is done by successively voting away unlikely classes until only one remains. One problem with the OvO and OvA methods is that there are regions of the space for which class predictions are ambiguous, as illustrated in Figures 1a and 1b.

In practice, heuristic methods such as the OvO and OvA approaches are used more often than other multiclass SVM implementations. One of the reasons for this is that there are several software packages that efficiently solve the binary SVM, such as LibSVM (Chang and Lin, 2011). This package implements a variation of the sequential minimal optimization algorithm of Platt (1999). Implementations of other multiclass SVMs in high-level (statistical) programming languages are lacking, which reduces their use in practice. [1]

[1] An exception to this is the method of Lee et al. (2004), for which an R implementation exists.

The second type of extension of the binary SVM uses error correcting codes. In these methods the problem is decomposed into multiple binary classification problems based on a constructed coding matrix that determines the grouping of the classes in a specific binary subproblem (Dietterich and Bakiri, 1995; Allwein et al., 2001; Crammer and Singer, 2002b). Error correcting code SVMs can thus be seen as a generalization of OvO and OvA. In Dietterich and Bakiri (1995) and Allwein et al. (2001), a coding matrix is constructed that determines which class instances are paired against each other for each binary SVM. Both approaches require that the coding matrix is determined beforehand. However, it is a priori unclear how such a coding matrix should be chosen. In fact, as Crammer and Singer (2002b) show, finding the optimal coding matrix is an NP-complete problem.
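To make the OvO and OvA heuristics discussed above concrete, the following sketch fits both decompositions with scikit-learn's meta-estimators. The library choice, the synthetic data, and all parameter values are illustrative assumptions and played no role in the experiments reported in this paper.

    # Sketch: OvO fits K(K-1)/2 pairwise binary SVMs, OvA fits K one-vs-rest SVMs.
    # scikit-learn and the toy data are assumptions used for illustration only.
    from sklearn.datasets import make_classification
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                               n_redundant=0, n_classes=4, random_state=1)

    ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)   # K(K-1)/2 = 6 binary SVMs
    ova = OneVsRestClassifier(LinearSVC()).fit(X, y)  # K = 4 binary SVMs
    print(len(ovo.estimators_), len(ova.estimators_))  # 6 4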

The third type of approaches are those that optimize one loss function to estimate all class boundaries simultaneously, the so-called single machine approaches (Rifkin and Klautau, 2004). In the literature, such methods have been proposed by, among others, Weston and Watkins (1998), Bredensteiner and Bennett (1999), Crammer and Singer (2002a), Lee et al. (2004), and Guermeur and Monfrini (2011). The method of Weston and Watkins (1998) yields a fairly large quadratic problem with a large number of slack variables, that is, K - 1 slack variables for each observation. The method of Crammer and Singer (2002a) reduces this number of slack variables by only penalizing the largest misclassification error. In addition, their method does not include a bias term in the decision boundaries, which is advantageous for solving the dual problem. Interestingly, this approach does not reduce parsimoniously to the binary SVM for K = 2. The method of Lee et al. (2004) uses a sum-to-zero constraint on the decision functions to reduce the dimensionality of the problem. This constraint effectively means that the solution of the multiclass SVM lies in a (K - 1)-dimensional subspace of the full K dimensions considered. The size of the margins is reduced according to the number of classes, such that asymptotic convergence is obtained to the Bayes optimal decision boundary when the regularization term is ignored (Rifkin and Klautau, 2004). Finally, the method of Guermeur and Monfrini (2011) is a quadratic extension of the method developed by Lee et al. (2004). This extension keeps the sum-to-zero constraint on the decision functions, drops the nonnegativity constraint on the slack variables, and adds a quadratic function of the slack variables to the loss function. This means that at the optimum the slack variables are only positive on average, which differs from common SVM formulations.

The existing approaches to multiclass SVMs suffer from several problems. All current single machine multiclass extensions of the binary SVM rely on solving a potentially large dual optimization problem. This can be disadvantageous when a solution has to be found in a small amount of time, since iteratively improving the dual solution does not guarantee that the primal solution is improved as well. Thus, stopping early can lead to poor predictive performance. In addition, the dual of such single machine approaches should be solvable quickly in order to compete with existing heuristic approaches. Almost all single machine approaches rely on misclassifications of the observed class with each of the other classes. By simply summing these misclassification errors (as in Lee et al., 2004) observations with multiple errors contribute more than those with a single misclassification do. Consequently, observations with multiple misclassifications have a stronger influence on the solution than those with a single misclassification, which is not a desirable property for a multiclass SVM, as it overemphasizes objects that are misclassified with respect to multiple classes. Here, it is argued that there is no reason to penalize certain misclassification regions more than others.

Single machine approaches are preferred for their ability to capture the multiclass classification problem in a single model. A parallel can be drawn here with multinomial regression and logistic regression.
In this case, multinomial regression reduces exactly to the binary logistic regression method when K = 2, both techniques are single machine approaches, and many of the properties of logistic regression extend to multinomial regression. Therefore, it can be considered natural to use a single machine approach for the multiclass SVM that reduces parsimoniously to the binary SVM when K = 2.

The idea of casting the multiclass SVM problem to K - 1 dimensions is appealing, since it reduces the dimensionality of the problem and is also present in other multiclass classification methods such as multinomial regression and linear discriminant analysis. However, the sum-to-zero constraint employed by Lee et al. (2004) creates an additional burden on the dual optimization problem (Dogan et al., 2011). Therefore, it would be desirable to cast the problem to K - 1 dimensions in another manner. Below a simplex encoding will be introduced to achieve this goal. The simplex encoding for multiclass SVMs has been proposed earlier by Hill and Doucet (2007) and Mroueh et al. (2012), although the method outlined below differs from these two approaches. Note that the simplex coding approach by Mroueh et al. (2012) was shown to be equivalent to that of Lee et al. (2004) by Ávila Pires et al. (2013). An advantage of the simplex encoding is that in contrast to methods such as OvO and OvA, there are no regions of ambiguity in the prediction space (see Figure 1c). In addition, the low dimensional projection also has advantages for understanding the method, since it allows for a geometric interpretation. The geometric interpretation of existing single machine multiclass SVMs is often difficult since most are based on a dual optimization approach with little attention for a primal problem based on hinge errors.

A new flexible and general multiclass SVM is proposed, called GenSVM. This method uses the simplex encoding to formulate the multiclass SVM problem as a single optimization problem that reduces to the binary SVM when K = 2. By using a flexible hinge function and an $\ell_p$ norm of the errors the GenSVM loss function incorporates three existing multiclass SVMs that use the sum of the hinge errors, and extends these methods. In the linear version of GenSVM, K - 1 linear combinations of the features are estimated next to the bias terms. In the nonlinear version, kernels can be used in a similar manner as can be done for binary SVMs. The resulting GenSVM loss function is convex in the parameters to be estimated. For this loss function an iterative majorization (IM) algorithm will be derived with guaranteed descent to the global minimum. By solving the optimization problem in the primal it is possible to use warm starts during a hyperparameter grid search or during cross validation, which makes the resulting algorithm very competitive in total training time, even for large data sets.

To evaluate its performance, GenSVM is compared to seven of the multiclass SVMs described above on several small data sets and one large data set. The smaller data sets are used to assess the classification accuracy of GenSVM, whereas the large data set is used to verify feasibility of GenSVM for large data sets. Due to the computational cost of these rigorous experiments only comparisons of linear multiclass SVMs are performed, and experiments on nonlinear MSVMs are considered outside the scope of this paper. Existing comparisons of multiclass SVMs in the literature do not determine any statistically significant differences in performance between classifiers, and resort to tables of accuracy rates for the comparisons (for instance Hsu and Lin, 2002). Using suggestions from the benchmarking literature predictive performance and training time of all classifiers is compared using performance profiles and rank tests. The rank tests are used to uncover statistically significant differences between classifiers.

This paper is organized as follows. Section 2 introduces the novel generalized multiclass SVM.
In Section 3, features of the iterative majorization theory are reviewed and a number of useful properties are highlighted.

Section 4 derives the IM algorithm for GenSVM, and presents pseudocode for the algorithm. Extensions of GenSVM to nonlinear classification boundaries are discussed in Section 5. A numerical comparison of GenSVM with existing multiclass SVMs on empirical data sets is done in Section 6. Section 7 concludes the paper.

2. GenSVM

Before introducing GenSVM formally, consider a small illustrative example of a hypothetical data set of n = 90 objects with K = 3 classes and m = 2 attributes. Figure 2a shows the data set in the space of these two attributes x1 and x2, with different classes denoted by different symbols. Figure 2b shows the (K - 1)-dimensional simplex encoding of the data after an additional RBF kernel transformation has been applied and the mapping has been optimized to minimize misclassification errors. In this figure, the triangle shown in the center corresponds to a regular K-simplex in K - 1 dimensions, and the solid lines perpendicular to the faces of this simplex are the decision boundaries. This (K - 1)-dimensional space will be referred to as the simplex space throughout this paper. The mapping from the input space to this simplex space is optimized by minimizing the misclassification errors, which are calculated by measuring the distance of an object to the decision boundaries in the simplex space. Prediction of a class label is also done in this simplex space, by finding the nearest simplex vertex for the object. Figure 2c illustrates the decision boundaries in the original space of the input attributes x1 and x2. In Figures 2b and 2c, the support vectors can be identified as the objects that lie on or beyond the dashed margin lines of their associated class. Note that the use of the simplex encoding ensures that for every point in the predictor space a class is predicted, hence no ambiguity regions can exist in the GenSVM solution.

The misclassification errors are formally defined as follows. Let $x_i \in \mathbb{R}^m$ be an object vector corresponding to m attributes, and let $y_i$ denote the class label of object i with $y_i \in \{1, \ldots, K\}$, for $i \in \{1, \ldots, n\}$. Furthermore, let $W \in \mathbb{R}^{m \times (K-1)}$ be a weight matrix, and define a translation vector $t \in \mathbb{R}^{K-1}$ for the bias terms. Then, object i is represented in the (K - 1)-dimensional simplex space by $s_i' = x_i' W + t'$. Note that here the linear version of GenSVM is described; the nonlinear version is described in Section 5.

To obtain the misclassification error of an object, the corresponding simplex space vector $s_i$ is projected on each of the decision boundaries that separate the true class of an object from another class. For the errors to be proportional with the distance to the decision boundaries, a regular K-simplex in $\mathbb{R}^{K-1}$ is used with distance 1 between each pair of vertices. Let $U_K$ be the $K \times (K-1)$ coordinate matrix of this simplex, where a row $u_k'$ of $U_K$ gives the coordinates of a single vertex k. Then, it follows that with $k \in \{1, \ldots, K\}$ and $l \in \{1, \ldots, K-1\}$ the elements of $U_K$ are given by

$$u_{kl} = \begin{cases} -1/\sqrt{2(l^2 + l)} & \text{if } k \leq l \\ l/\sqrt{2(l^2 + l)} & \text{if } k = l + 1 \\ 0 & \text{if } k > l + 1. \end{cases} \qquad (1)$$

See Appendix A for a derivation of this expression.
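As a quick check of (1), the following NumPy sketch builds $U_K$ and verifies that all pairwise vertex distances equal 1. The function name is ours and not part of the GenSVM package.

    import numpy as np

    def simplex_coordinates(K):
        # K x (K-1) coordinate matrix U_K of a regular K-simplex, following (1)
        U = np.zeros((K, K - 1))
        for l in range(1, K):                  # l = 1, ..., K-1 indexes columns
            denom = np.sqrt(2.0 * (l ** 2 + l))
            U[:l, l - 1] = -1.0 / denom        # rows with k <= l
            U[l, l - 1] = l / denom            # row with k = l + 1
        return U

    U = simplex_coordinates(4)
    dists = [np.linalg.norm(U[k] - U[j]) for k in range(4) for j in range(k + 1, 4)]
    print(np.allclose(dists, 1.0))             # True: unit distance between vertices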

[Figure 2: three panels, (a) Input space, (b) Simplex space, (c) Input space with boundaries; axes x1, x2 and s1, s2.]

Figure 2: Illustration of GenSVM for a 2D data set with K = 3 classes. In (a) the original data is shown, with different symbols denoting different classes. Figure (b) shows the mapping of the data to the (K - 1)-dimensional simplex space, after an additional RBF kernel mapping has been applied and the optimal solution has been determined. The decision boundaries in this space are fixed as the perpendicular bisectors of the faces of the simplex, which is shown as the gray triangle. Figure (c) shows the resulting boundaries mapped back to the original input space, as can be seen by comparing with (a). In Figures (b) and (c) the dashed lines show the margins of the SVM solution.

Figure 3 shows an illustration of how the misclassification errors are computed for a single object. Consider object A with true class $y_A = 2$. It is clear that object A is misclassified as it is not located in the shaded area that has vertex $u_2$ as the nearest vertex. The boundaries of the shaded area are given by the perpendicular bisectors of the edges of the simplex between vertices $u_2$ and $u_1$ and between vertices $u_2$ and $u_3$, and form the decision boundaries for class 2. The error for object A is computed by determining the distance from the object to each of these decision boundaries. Let $q_A^{(21)}$ and $q_A^{(23)}$ denote these distances to the class boundaries, which are obtained by projecting $s_A = x_A' W + t'$ on $u_2 - u_1$ and $u_2 - u_3$ respectively, as illustrated in the figure. Generalizing this reasoning, scalars $q_i^{(kj)}$ can be defined to measure the projection distance of object i onto the boundary between class k and j in the simplex space, as

$$q_i^{(kj)} = (x_i' W + t')(u_k - u_j). \qquad (2)$$

It is required that the GenSVM loss function is both general and flexible, such that it can easily be tuned for the specific data set at hand. To achieve this, a loss function is constructed with a number of different weightings, each with a specific effect on the object distances $q_i^{(kj)}$. In the proposed loss function, flexibility is added through the use of the Huber hinge function instead of the absolute hinge function, and by using the $\ell_p$ norm of the hinge errors instead of the sum. The motivation for these choices follows.
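In code, the simplex-space mapping and the projection distances in (2) take only a few lines. The following NumPy sketch uses random placeholder values for X, W, and t, together with the simplex_coordinates helper from the previous sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, K = 90, 2, 3
    X = rng.normal(size=(n, m))          # placeholder data
    W = rng.normal(size=(m, K - 1))      # weight matrix (placeholder values)
    t = rng.normal(size=K - 1)           # translation vector for the bias terms

    S = X @ W + t                        # rows are s_i' = x_i' W + t'
    U = simplex_coordinates(K)
    q_12 = S @ (U[0] - U[1])             # q_i^(kj) for k = 1, j = 2 (0-based rows)
    print(q_12.shape)                    # (90,)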

[Figure 3: the simplex space with object A, vertices u1, u2, u3, and the distances q_A^(21) and q_A^(23).]

Figure 3: Graphical illustration of the calculation of the distances $q_A^{(y_A j)}$ for an object A with $y_A = 2$ and K = 3. The figure shows the situation in the (K - 1)-dimensional space. The distance $q_A^{(21)}$ is calculated by projecting $s_A = x_A' W + t'$ on $u_2 - u_1$, and the distance $q_A^{(23)}$ is found by projecting $s_A$ on $u_2 - u_3$. The boundary between the class 1 and class 3 regions has been omitted for clarity, but lies along $u_2$.

As is customary for SVMs a hinge loss is used to ensure that instances that do not cross their class margin will yield zero error. Here, the flexible and continuous Huber hinge loss is used (after the Huber error in robust statistics, see Huber, 1964), which is defined as

$$h(q) = \begin{cases} 1 - q - \frac{\kappa + 1}{2} & \text{if } q \leq -\kappa \\ \frac{1}{2(\kappa + 1)}(1 - q)^2 & \text{if } q \in (-\kappa, 1] \\ 0 & \text{if } q > 1, \end{cases} \qquad (3)$$

with $\kappa > -1$. The Huber hinge loss has been independently introduced in Chapelle (2007), Rosset and Zhu (2007), and Groenen et al. (2008). This hinge error is zero when an instance is classified correctly with respect to its class margin. However, in contrast to the absolute hinge error, it is continuous due to a quadratic region in the interval $(-\kappa, 1]$. This quadratic region allows for a softer weighting of objects close to the decision boundary. Additionally, the smoothness of the Huber hinge error is a desirable property for the iterative majorization algorithm derived in Section 4.1. Note that the Huber hinge error approaches the absolute hinge for $\kappa \downarrow -1$, and the quadratic hinge for $\kappa \to \infty$.

The Huber hinge error is applied to each of the distances $q_i^{(y_i j)}$, for $j \neq y_i$. Thus, no error is counted when the object is correctly classified. For each of the objects, errors with respect to the other classes are summed using an $\ell_p$ norm to obtain the total object error

$$\left( \sum_{j \neq y_i} h^p\big(q_i^{(y_i j)}\big) \right)^{1/p}.$$
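A direct NumPy transcription of (3) can look as follows (a sketch, vectorized over q; continuity at the two breakpoints follows from the formula itself):

    import numpy as np

    def huber_hinge(q, kappa):
        # Huber hinge loss h(q) of equation (3); requires kappa > -1
        q = np.atleast_1d(np.asarray(q, dtype=float))
        h = np.zeros_like(q)
        linear = q <= -kappa                     # linear region
        quad = (q > -kappa) & (q <= 1.0)         # quadratic region; zero if q > 1
        h[linear] = 1.0 - q[linear] - (kappa + 1.0) / 2.0
        h[quad] = (1.0 - q[quad]) ** 2 / (2.0 * (kappa + 1.0))
        return h

    print(huber_hinge([-2.0, 0.5, 1.5], kappa=0.0))   # [2.5, 0.125, 0.]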

The $\ell_p$ norm is added to provide a form of regularization on Huber weighted errors for instances that are misclassified with respect to multiple classes. As argued in the Introduction, simply summing misclassification errors can lead to overemphasizing of instances with multiple misclassification errors. By adding an $\ell_p$ norm of the hinge errors the influence of such instances on the loss function can be tuned.

With the addition of the $\ell_p$ norm on the hinge errors it is possible to illustrate how GenSVM generalizes existing methods. For instance, with $p = 1$ and $\kappa \downarrow -1$, the loss function solves the same problem as the method of Lee et al. (2004). Next, for $p = 2$ and $\kappa \downarrow -1$ it resembles that of Guermeur and Monfrini (2011). Finally, for $p = \infty$ and $\kappa \downarrow -1$ the $\ell_p$ norm reduces to the max norm of the hinge errors, which corresponds to the method of Crammer and Singer (2002a). Note that in each case the value of $\kappa$ can additionally be varied to include an even broader family of loss functions.

To illustrate the effects of p and $\kappa$ on the total object error, refer to Figure 4. In Figures 4a and 4b, the value of p is set to p = 1 and p = 2 respectively, while maintaining the absolute hinge error using $\kappa = -0.95$. A reference point is plotted at a fixed position in the area of the simplex space where there is a nonzero error with respect to two classes. It can be seen from this reference point that the value of the combined error is higher when p = 1. With p = 2 the combined error at the reference point approximates the Euclidean distance to the margin, when $\kappa \downarrow -1$. Figures 4a, 4c, and 4d show the effect of varying $\kappa$. It can be seen that the error near the margin becomes more quadratic with increasing $\kappa$. In fact, as $\kappa$ increases the error approaches the squared Euclidean distance to the margin, which can be used to obtain a quadratic hinge multiclass SVM. Both of these effects will become stronger when the number of classes increases, as increasingly more objects will have errors with respect to more than one class.

Next, let $\rho_i \geq 0$ denote optional object weights, which are introduced to allow flexibility in the way individual objects contribute to the total loss function. With these individual weights it is possible to correct for different group sizes, or to give additional weights to misclassifications of certain classes. When correcting for group sizes, the weights can be chosen as

$$\rho_i = \frac{n}{n_k K}, \qquad i \in G_k, \qquad (4)$$

where $G_k = \{i : y_i = k\}$ is the set of objects belonging to class k, and $n_k = |G_k|$. The complete GenSVM loss function combining all n objects can now be formulated as

$$L_{\text{MSVM}}(W, t) = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i \left( \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right)^{1/p} + \lambda \,\text{tr}\, W'W, \qquad (5)$$

where $\lambda \,\text{tr}\, W'W$ is the penalty term to avoid overfitting, and $\lambda > 0$ is the regularization parameter. Note that for the case where K = 2, the above loss function reduces to the loss function for the binary SVM given in Groenen et al. (2008), with Huber hinge errors.
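With the helpers above, (5) can be evaluated directly. The following sketch is a plain reimplementation for illustration only, not the C implementation used in the paper; labels are taken as 0, ..., K-1.

    import numpy as np

    def gensvm_loss(X, y, W, t, U, rho, p, kappa, lam):
        # Evaluate L_MSVM(W, t) of equation (5); y[i] in {0, ..., K-1}
        n = X.shape[0]
        S = X @ W + t
        total = 0.0
        for i in range(n):
            k = y[i]
            q = (U[k] - U) @ S[i]                     # q_i^(kj) for every class j
            h = np.delete(huber_hinge(q, kappa), k)   # drop the j = k entry
            total += rho[i] * np.sum(h ** p) ** (1.0 / p)
        return total / n + lam * np.trace(W.T @ W)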

[Figure 4: four panels in the (s1, s2) plane, (a) p = 1 and κ = -0.95, (b) p = 2 and κ = -0.95, (c) p = 1 and κ = 1.0, (d) p = 1 and κ = 5.0.]

Figure 4: Illustration of the $\ell_p$ norm of the Huber weighted errors. Comparing figures (a) and (b) shows the effect of the $\ell_p$ norm. With p = 1 objects that have errors w.r.t. both classes are penalized more strongly than those with only one error, whereas with p = 2 this is not the case. Figures (a), (c), and (d) compare the effect of the $\kappa$ parameter, with p = 1. This shows that with a large value of $\kappa$, the errors close to the boundary are weighted quadratically. Note that s1 and s2 indicate the dimensions of the simplex space.

The outline of a proof for the convexity of the loss function in (5) is given. First, note that the distances $q_i^{(kj)}$ in the loss function are affine in W and t. Hence, if the loss function is convex in $q_i^{(kj)}$ it is convex in W and t as well. Second, the Huber hinge function is trivially convex in $q_i^{(kj)}$, since each separate piece of the function is convex, and the Huber hinge is continuous. Third, the $\ell_p$ norm is a convex function by the Minkowski inequality, and it is monotonically increasing by definition. Thus, it follows that the $\ell_p$ norm of the Huber weighted instance errors is convex (see for instance Rockafellar, 1997). Next, since it is required that the weights $\rho_i$ are non-negative, the sum in the first term of (5) is a convex combination. Finally, the penalty term can also be shown to be convex, since $\text{tr}\, W'W$ is the square of the Frobenius norm of W, and it is required that $\lambda > 0$. Thus, it holds that the loss function in (5) is convex in W and t.

Predicting class labels in GenSVM can be done as follows. Let $(W^*, t^*)$ denote the parameters that minimize the loss function. Predicting the class label of an unseen sample $x_{n+1}$ can then be done by first mapping it to the simplex space, using the optimal projection: $s_{n+1}' = x_{n+1}' W^* + t^{*\prime}$. The predicted class label is then simply the label corresponding to the nearest simplex vertex as measured by the squared Euclidean norm, or

$$\hat{y}_{n+1} = \arg\min_{k} \, \|s_{n+1} - u_k\|^2, \qquad \text{for } k = 1, \ldots, K. \qquad (6)$$
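The prediction rule (6) is a nearest-vertex lookup in the simplex space; a NumPy sketch:

    import numpy as np

    def gensvm_predict(X, W, t, U):
        # Assign each row of X the label of its nearest simplex vertex, as in (6)
        S = X @ W + t                                  # map to the simplex space
        d2 = ((S[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)                       # labels in {0, ..., K-1}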

3. Iterative Majorization

To minimize the loss function given in (5), an iterative majorization (IM) algorithm will be derived. Iterative majorization was first described by Weiszfeld (1937), however the first application of the algorithm in the context of a line search comes from Ortega and Rheinboldt (1970). During the late 1970s, the method was independently developed by De Leeuw (1977) as part of the SMACOF algorithm for multidimensional scaling, and by Voss and Eckhardt (1980) as a general minimization method. For the reader unfamiliar with the iterative majorization algorithm a more detailed description has been included in Appendix B and further examples can be found in for instance Hunter and Lange (2004). The asymptotic convergence rate of the IM algorithm is linear, which is less than that of the Newton-Raphson algorithm (De Leeuw, 1994). However, the largest improvements in the loss function will occur in the first few steps of the iterative majorization algorithm, where the asymptotic linear rate does not apply (Havel, 1991). This property will become very useful for GenSVM as it allows for a quick approximation to the exact SVM solution in few iterations.

There is no straightforward technique for deriving the majorization function for any given function. However, in the next section the derivation of the majorization function for the GenSVM loss function is presented using an outside-in approach. In this approach, each function that constitutes the loss function is majorized separately and the majorization functions are combined. Two properties of majorization functions that are useful for this derivation are now formally defined. In these expressions, $\bar{x}$ is a supporting point, as defined in Appendix B.

P1. Let $f_1 : \mathcal{Y} \to \mathcal{Z}$, $f_2 : \mathcal{X} \to \mathcal{Y}$, and define $f = f_1 \circ f_2 : \mathcal{X} \to \mathcal{Z}$, such that for $x \in \mathcal{X}$, $f(x) = f_1(f_2(x))$. If $g_1 : \mathcal{Y} \times \mathcal{Y} \to \mathcal{Z}$ is a majorization function of $f_1$, then $g : \mathcal{X} \times \mathcal{X} \to \mathcal{Z}$ defined as $g = g_1 \circ f_2$ is a majorization function of $f$. Thus for $x, \bar{x} \in \mathcal{X}$ it holds that $g(x, \bar{x}) = g_1(f_2(x), f_2(\bar{x}))$ is a majorization function of $f(x)$ at $\bar{x}$.

P2. Let $f_i : \mathcal{X} \to \mathcal{Z}$ and define $f : \mathcal{X} \to \mathcal{Z}$ such that $f(x) = \sum_i a_i f_i(x)$ for $x \in \mathcal{X}$, with $a_i \geq 0$ for all i. If $g_i : \mathcal{X} \times \mathcal{X} \to \mathcal{Z}$ is a majorization function for $f_i$ at a point $\bar{x} \in \mathcal{X}$, then $g : \mathcal{X} \times \mathcal{X} \to \mathcal{Z}$ given by $g(x, \bar{x}) = \sum_i a_i g_i(x, \bar{x})$ is a majorization function of $f$.

Proofs of these properties are omitted, as they follow directly from the requirements for a majorization function given in Appendix B. The first property allows for the use of the outside-in approach to majorization, as will be illustrated in the next section.
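The mechanics of IM are easy to see on a toy problem unrelated to GenSVM. A classic quadratic majorizer of the absolute value is $g(x, \bar{x}) = x^2/(2|\bar{x}|) + |\bar{x}|/2$, which touches $|x|$ at $\bar{x}$; the sketch below uses it to minimize $f(x) = |x| + (x - 1)^2$ and shows the guaranteed monotone descent.

    # Sketch: iterative majorization of f(x) = |x| + (x - 1)^2.
    # Each step minimizes the quadratic x^2 / (2|xbar|) + |xbar|/2 + (x - 1)^2,
    # whose closed-form minimizer is x = 2|xbar| / (1 + 2|xbar|).
    def f(x):
        return abs(x) + (x - 1.0) ** 2

    x = 5.0                                     # starting (supporting) point
    for it in range(25):
        x = 2.0 * abs(x) / (1.0 + 2.0 * abs(x))
        print(it, round(x, 6), round(f(x), 6))  # f decreases toward f(0.5) = 0.75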

4. GenSVM Optimization and Implementation

In this section, a quadratic majorization function for GenSVM will be derived. Although it is possible to derive a majorization algorithm for general values of the $\ell_p$ norm parameter, [2] the following derivation will restrict this value to the interval $p \in [1, 2]$ since this simplifies the derivation and avoids the issue that quadratic majorization can become slow for p > 2. Pseudocode for the derived algorithm will be presented, as well as an analysis of the computational complexity of the algorithm. Finally, an important remark on the use of warm starts in the algorithm is given.

[2] For a majorization algorithm of the $\ell_p$ norm with $p \geq 2$, see Groenen et al. (1999).

4.1 Majorization Derivation

To shorten the notation, define $V = [t \; W']'$, $z_i = [1 \; x_i']'$, and $\delta_{kj} = u_k - u_j$, such that $q_i^{(kj)} = z_i' V \delta_{kj}$. With this notation it becomes sufficient to optimize the loss function with respect to V. Formulated in this manner (5) becomes

$$L_{\text{MSVM}}(V) = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i \left( \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right)^{1/p} + \lambda \,\text{tr}\, V'JV, \qquad (7)$$

where J is an $(m+1) \times (m+1)$ diagonal matrix with $J_{ii} = 1$ for $i > 1$ and zero elsewhere. To derive a majorization function for this expression the outside-in approach will be used, together with the properties of majorization functions. In what follows, variables with a bar denote supporting points for the IM algorithm. The goal of the derivation is to find a quadratic majorization function in V such that

$$L_{\text{MSVM}}(V) \leq \text{tr}\, V'Z'AZV - 2\,\text{tr}\, V'Z'B + C, \qquad (8)$$

where A, B, and C are coefficients of the majorization depending on $\bar{V}$. The matrix Z is simply the $n \times (m+1)$ matrix with rows $z_i'$. Property P2 above means that the summation over instances in the loss function can be ignored for now. Moreover, the regularization term is quadratic in V, and thus requires no majorization.

The outermost function for which a majorization function has to be found is thus the $\ell_p$ norm of the Huber hinge errors. Hence it is possible to consider the function $f(x) = \|x\|_p$ for majorization. A majorization function for $f(x)$ can be constructed, but a discontinuity in the derivative at $x = 0$ will remain (Tsutsu and Morikawa, 2012). To avoid the discontinuity in the derivative of the $\ell_p$ norm, the following inequality is needed (Hardy et al., 1934)

$$\left( \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right)^{1/p} \leq \sum_{j \neq k} h\big(q_i^{(kj)}\big).$$

This inequality can be used as a majorization function only if equality holds at the supporting point

$$\left( \sum_{j \neq k} h^p\big(\bar{q}_i^{(kj)}\big) \right)^{1/p} = \sum_{j \neq k} h\big(\bar{q}_i^{(kj)}\big).$$

It is not difficult to see that this only holds if at most one of the $h\big(\bar{q}_i^{(kj)}\big)$ errors is nonzero for $j \neq k$. Thus an indicator variable $\varepsilon_i$ is introduced which is 1 if at most one of these errors is nonzero, and 0 otherwise. Then it follows that

$$L_{\text{MSVM}}(V) \leq \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i \left[ \varepsilon_i \sum_{j \neq k} h\big(q_i^{(kj)}\big) + (1 - \varepsilon_i) \left( \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right)^{1/p} \right] + \lambda \,\text{tr}\, V'JV. \qquad (9)$$

Now, the next function for which a majorization needs to be found is $f_1(x) = x^{1/p}$. From the inequality $a^\alpha b^\beta \leq \alpha a + \beta b$, with $\alpha + \beta = 1$ (Hardy et al., 1934, Theorem 37), a linear majorization inequality can be constructed for this function by substituting $a = x$, $b = \bar{x}$, $\alpha = 1/p$ and $\beta = 1 - 1/p$ (Groenen and Heiser, 1996). This yields

$$f_1(x) = x^{1/p} \leq \frac{1}{p} \bar{x}^{1/p - 1} x + \left( 1 - \frac{1}{p} \right) \bar{x}^{1/p} = g_1(x, \bar{x}).$$

Applying this majorization and using property P1 gives

$$\left( \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right)^{1/p} \leq \frac{1}{p} \left( \sum_{j \neq k} h^p\big(\bar{q}_i^{(kj)}\big) \right)^{1/p - 1} \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) + \left( 1 - \frac{1}{p} \right) \left( \sum_{j \neq k} h^p\big(\bar{q}_i^{(kj)}\big) \right)^{1/p}.$$

Plugging this into (9) and collecting terms yields

$$L_{\text{MSVM}}(V) \leq \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i \left[ \varepsilon_i \sum_{j \neq k} h\big(q_i^{(kj)}\big) + (1 - \varepsilon_i)\, \omega_i \sum_{j \neq k} h^p\big(q_i^{(kj)}\big) \right] + \Gamma^{(1)} + \lambda \,\text{tr}\, V'JV,$$

with

$$\omega_i = \frac{1}{p} \left( \sum_{j \neq k} h^p\big(\bar{q}_i^{(kj)}\big) \right)^{1/p - 1}. \qquad (10)$$

The constant $\Gamma^{(1)}$ contains all terms that only depend on previous errors $\bar{q}_i^{(kj)}$. The next majorization step by the outside-in approach is to find a quadratic majorization function for $f_2(x) = h^p(x)$, of the form

$$f_2(x) = h^p(x) \leq a(\bar{x}, p)\, x^2 - 2 b(\bar{x}, p)\, x + c(\bar{x}, p) = g_2(x, \bar{x}).$$

Since this derivation is mostly an algebraic exercise it has been moved to Appendix C. In the remainder of this derivation, $a_{ijk}^{(p)}$ will be used to abbreviate $a\big(\bar{q}_i^{(kj)}, p\big)$, with similar abbreviations for b and c.

Using these majorizations and making the dependence on V explicit by substituting $q_i^{(kj)} = z_i' V \delta_{kj}$ gives

$$L_{\text{MSVM}}(V) \leq \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i\, \varepsilon_i \sum_{j \neq k} \left[ a_{ijk}^{(1)}\, z_i'V\delta_{kj}\delta_{kj}'V'z_i - 2 b_{ijk}^{(1)}\, z_i'V\delta_{kj} \right] + \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i (1 - \varepsilon_i)\, \omega_i \sum_{j \neq k} \left[ a_{ijk}^{(p)}\, z_i'V\delta_{kj}\delta_{kj}'V'z_i - 2 b_{ijk}^{(p)}\, z_i'V\delta_{kj} \right] + \Gamma^{(2)} + \lambda \,\text{tr}\, V'JV,$$

where $\Gamma^{(2)}$ again contains all constant terms. Due to the dependence on the matrix $\delta_{kj}\delta_{kj}'$, the above majorization function is not yet in the desired quadratic form of (8). However, since the maximum eigenvalue of $\delta_{kj}\delta_{kj}'$ is 1 by definition of the simplex coordinates, it follows that the matrix $\delta_{kj}\delta_{kj}' - I$ is negative semidefinite. Hence, it can be shown that the inequality $z_i'(V - \bar{V})(\delta_{kj}\delta_{kj}' - I)(V - \bar{V})'z_i \leq 0$ holds (Bijleveld and De Leeuw, 1991, Theorem 4). Rewriting this gives the majorization inequality

$$z_i'V\delta_{kj}\delta_{kj}'V'z_i \leq z_i'VV'z_i - 2\, z_i'V(I - \delta_{kj}\delta_{kj}')\bar{V}'z_i + z_i'\bar{V}(I - \delta_{kj}\delta_{kj}')\bar{V}'z_i.$$

With this inequality the majorization function becomes

$$L_{\text{MSVM}}(V) \leq \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i\, z_i'V(V - 2\bar{V})'z_i \sum_{j \neq k} \left[ \varepsilon_i a_{ijk}^{(1)} + (1 - \varepsilon_i)\, \omega_i\, a_{ijk}^{(p)} \right] - \frac{2}{n} \sum_{k=1}^{K} \sum_{i \in G_k} \rho_i\, z_i'V \sum_{j \neq k} \left[ \varepsilon_i \big( b_{ijk}^{(1)} - a_{ijk}^{(1)} \bar{q}_i^{(kj)} \big) + (1 - \varepsilon_i)\, \omega_i \big( b_{ijk}^{(p)} - a_{ijk}^{(p)} \bar{q}_i^{(kj)} \big) \right] \delta_{kj} + \Gamma^{(3)} + \lambda \,\text{tr}\, V'JV, \qquad (11)$$

where $\bar{q}_i^{(kj)} = z_i' \bar{V} \delta_{kj}$. This majorization function is quadratic in V and can thus be used in the IM algorithm. To derive the first-order condition used in the update step of the IM algorithm (step 2 in Appendix B), matrix notation for the above expression is introduced. Let A be an $n \times n$ diagonal matrix with elements $\alpha_i$, and let B be an $n \times (K-1)$ matrix with rows $\beta_i'$, where

$$\alpha_i = \frac{1}{n} \rho_i \sum_{j \neq k} \left[ \varepsilon_i a_{ijk}^{(1)} + (1 - \varepsilon_i)\, \omega_i\, a_{ijk}^{(p)} \right], \qquad (12)$$

$$\beta_i = \frac{1}{n} \rho_i \sum_{j \neq k} \left[ \varepsilon_i \big( b_{ijk}^{(1)} - a_{ijk}^{(1)} \bar{q}_i^{(kj)} \big) + (1 - \varepsilon_i)\, \omega_i \big( b_{ijk}^{(p)} - a_{ijk}^{(p)} \bar{q}_i^{(kj)} \big) \right] \delta_{kj}. \qquad (13)$$

Then the majorization function of $L_{\text{MSVM}}(V)$ given in (11) can be written as

$$L_{\text{MSVM}}(V) \leq \text{tr}\,(V - 2\bar{V})'Z'AZV - 2\,\text{tr}\, B'ZV + \Gamma^{(3)} + \lambda \,\text{tr}\, V'JV = \text{tr}\, V'(Z'AZ + \lambda J)V - 2\,\text{tr}\,(\bar{V}'Z'A + B')ZV + \Gamma^{(3)}.$$

This majorization function has the desired functional form described in (8). Differentiation with respect to V and equating to zero yields the linear system

$$(Z'AZ + \lambda J)V = Z'AZ\bar{V} + Z'B. \qquad (14)$$

The update $V^{+}$ that solves this system can then be calculated efficiently by Gaussian elimination.
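Given the diagonal of A from (12) and the rows of B from (13), a single IM update is one linear solve; a NumPy sketch of (14):

    import numpy as np

    def im_update(Z, alpha, B, Vbar, lam):
        # Solve (Z'AZ + lam*J) V = Z'AZ Vbar + Z'B of (14), with A = diag(alpha)
        J = np.eye(Z.shape[1])
        J[0, 0] = 0.0                      # no penalty on the bias row of V
        ZtAZ = (Z.T * alpha) @ Z           # Z'AZ without forming A explicitly
        return np.linalg.solve(ZtAZ + lam * J, ZtAZ @ Vbar + Z.T @ B)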

4.2 Algorithm Implementation and Complexity

Pseudocode for GenSVM is given in Algorithm 1. As can be seen, the algorithm simply updates all instance weights at each iteration, starting by determining the indicator variable $\varepsilon_i$. In practice, some calculations can be done efficiently for all instances by using matrix algebra. When step doubling (see Appendix B) is applied in the majorization algorithm, line 25 is replaced by $V \leftarrow 2V^{+} - \bar{V}$. In the implementation step doubling is applied after a burn-in of 50 iterations. The implementation used in the experiments described in Section 6 is written in C, using the ATLAS (Whaley and Dongarra, 1998) and LAPACK (Anderson et al., 1999) libraries. The source code for this C library is available under the open source GNU GPL license, through an online repository. A thorough description of the implementation is available in the package documentation.

The complexity of a single iteration of the IM algorithm is $O(n(m+1)^2)$ assuming that $n > m > K$. As noted earlier, the convergence rate of the general IM algorithm is linear. Computational complexity of standard SVM solvers that solve the dual problem through decomposition methods lies between $O(n^2)$ and $O(n^3)$ depending on the value of $\lambda$ (Bottou and Lin, 2007). An efficient algorithm for the method of Crammer and Singer (2002a) developed by Keerthi et al. (2008) has a complexity of $O(n\bar{m}K)$ per iteration, where $\bar{m} \leq m$ is the average number of nonzero features per training instance. In the methods of Lee et al. (2004) and Weston and Watkins (1998), a quadratic programming problem with $n(K-1)$ dual variables needs to be solved, which is typically done using a standard solver. An analysis of the exact convergence of GenSVM, including the expected number of iterations needed to achieve convergence at a factor $\epsilon$, is outside the scope of the current work and a subject for further research.

4.3 Smart Initialization

When training machine learning algorithms to determine the optimal hyperparameters, it is common to use cross validation (CV). With GenSVM it is possible to initialize the matrix V such that the final result of a fold is used as the initial value $V_0$ for the next fold. This same technique can be used when searching for the optimal hyperparameter configuration in a grid search, by initializing the weight matrix with the outcome of the previous configuration. Such warm-start initialization greatly reduces the time needed to perform cross validation with GenSVM. It is important to note here that using warm starts is not easily possible with dual optimization approaches. Therefore, the ability to use warm starts can be seen as an advantage of solving the GenSVM optimization problem in the primal.
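The warm-start idea can be sketched as follows, around a hypothetical solver gensvm_fit(X, y, lam, V0) whose name and signature are assumptions here, not the actual library interface:

    # Sketch of warm starts in a grid search: each configuration is initialized
    # with the solution of the previous one. gensvm_fit is a hypothetical solver
    # that accepts an initial estimate V0 and returns the optimal V.
    # X, y: training data, assumed to be loaded already.
    lambdas = [2.0 ** e for e in range(-18, 19, 2)]

    V = None                                   # cold start for the first lambda
    for lam in lambdas:
        V = gensvm_fit(X, y, lam=lam, V0=V)    # reuse the previous solution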

Algorithm 1: GenSVM Algorithm
Input: X, y, ρ, p, κ, λ, ε
Output: V
 1: K ← max(y)
 2: t ← 1
 3: Z ← [1 X]
 4: Let V ← V_0
 5: Generate J and U_K
 6: L_t ← L_MSVM(V)
 7: L_{t-1} ← (1 + 2ε) L_t
 8: while (L_{t-1} - L_t)/L_t > ε do
 9:   for i ← 1 to n do
10:     Compute q_i^{(y_i j)} = z_i' V δ_{y_i j} for all j ≠ y_i
11:     Compute h(q_i^{(y_i j)}) for all j ≠ y_i by (3)
12:     if ε_i = 1 then
13:       Compute a_{ijy_i}^{(1)} and b_{ijy_i}^{(1)} for all j ≠ y_i according to Table 4 in Appendix C
14:     else
15:       Compute ω_i following (10)
16:       Compute a_{ijy_i}^{(p)} and b_{ijy_i}^{(p)} for all j ≠ y_i according to Table 4 in Appendix C
17:     end
18:     Compute α_i by (12)
19:     Compute β_i by (13)
20:   end
21:   Construct A from the α_i
22:   Construct B from the β_i
23:   Find V^+ that solves (14)
24:   V̄ ← V
25:   V ← V^+
26:   L_{t-1} ← L_t
27:   L_t ← L_MSVM(V)
28:   t ← t + 1
29: end

5. Nonlinearity

One possible method to include nonlinearity in a classifier is through the use of spline transformations (see for instance Hastie et al., 2009). With spline transformations each attribute vector $x_j$ is transformed to a spline basis $N_j$, for $j = 1, \ldots, m$. The transformed input matrix $N = [N_1, \ldots, N_m]$ is then of size $n \times l$, where l depends on the degree of the spline transformation and the number of interior knots chosen. An application of spline transformations to the binary SVM can be found in Groenen et al. (2007).

A more common way to include nonlinearity in machine learning methods is through the use of the kernel trick, attributed to Aizerman et al. (1964). With the kernel trick, the dot product of two instance vectors in the dual optimization problem is replaced by the dot product of the same vectors in a high dimensional feature space. Since no dot products appear in the primal formulation of GenSVM, a different method is used here. By applying a preprocessing step on the kernel matrix, nonlinearity can be included using the same algorithm as the one presented for the linear case. Furthermore, predicting class labels requires a postprocessing step on the obtained matrix $V^*$. A full derivation is given in Appendix D.
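Since Appendix D is not reproduced here, the following sketch only illustrates one standard way such a kernel preprocessing step can be realized, and is an assumption rather than the paper's exact derivation: factor the kernel matrix as K = P Σ² P' and run the linear algorithm on M = P Σ, which satisfies M M' = K.

    import numpy as np

    def rbf_kernel(X, gamma=1.0):
        # n x n RBF kernel matrix (illustrative choice of kernel)
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * sq)

    def kernel_features(Kmat, tol=1e-10):
        # Assumption: train linear GenSVM on M = P * Sigma, where K = P Sigma^2 P'
        vals, vecs = np.linalg.eigh(Kmat)   # K is symmetric positive semidefinite
        keep = vals > tol                   # drop numerically zero eigenvalues
        return vecs[:, keep] * np.sqrt(vals[keep])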

6. Experiments

To assess the performance of the proposed GenSVM classifier, a simulation study was done comparing GenSVM with seven existing multiclass SVMs on 13 small data sets. These experiments are used to precisely measure predictive accuracy and total training time using performance profiles and rank plots. To verify the feasibility of GenSVM for large data sets an additional simulation study is done. The results of this study are presented separately in Section 6.4. Due to the large number of data sets and methods involved, experiments were only done for the linear kernel. Experiments on nonlinear multiclass SVMs would require even more training time than for linear MSVMs and are considered outside the scope of this paper.

6.1 Setup

Implementations of the heuristic multiclass SVMs (OvO, OvA, and DAG) were included through LibSVM (v. 3.16, Chang and Lin, 2011). LibSVM is a popular library for binary SVMs with packages for many programming languages; it is written in C++ and implements a variation of the SMO algorithm of Platt (1999). The OvO and DAG methods are implemented in this package, and a C implementation of OvA using LibSVM was created for these experiments. [3] For the single-machine approaches the MSVMpack package was used (v. 1.3, Lauer and Guermeur, 2011), which is written in C. This package implements the methods of Weston and Watkins (W&W, 1998), Crammer and Singer (C&S, 2002a), Lee et al. (LLW, 2004), and Guermeur and Monfrini (MSVM^2, 2011). Finally, to verify if implementation differences are relevant for algorithm performance the LibLinear (Fan et al., 2008) implementation of the method by Crammer and Singer (2002a) is also included (denoted LL C&S). This implementation uses the optimization algorithm by Keerthi et al. (2008).

[3] The LibSVM code used for DAGSVM is the same code as was used in Hsu and Lin (2002) and is available online.

To compare the classification methods properly, it is desirable to remove any bias that could occur when using cross validation (Cawley and Talbot, 2010). Therefore, nested cross validation is used (Stone, 1974), as illustrated in Figure 5. In nested CV, a data set is randomly split in a number of chunks. Each of these chunks is kept apart from the remaining chunks once, while the remaining chunks are combined to form a single data set. A grid search is then applied to this combined data set to find the optimal hyperparameters with which to predict the test chunk. This process is then repeated for each of the chunks. The predictions of the test chunk will be unbiased since it was not included in the grid search. For this reason, it is argued that this approach is preferred over approaches that simply report maximum accuracy rates obtained during the grid search.
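A compact way to express this nested scheme is with scikit-learn; this is an illustrative sketch only, since LinearSVC's C parameter plays the role of the regularization parameter and differs from the λ used in this paper, and the toy data replaces the UCI sets used below.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, n_classes=3, n_informative=4,
                               random_state=0)
    param_grid = {"C": [2.0 ** e for e in range(-6, 7, 2)]}   # small grid for speed
    inner = GridSearchCV(LinearSVC(), param_grid, cv=10)      # grid search per chunk
    outer_scores = cross_val_score(inner, X, y, cv=5)         # 5 outer chunks
    print(outer_scores)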

[Figure 5: schematic of the nested CV pipeline: combine chunks, keep apart, grid search using 10-fold CV (training phase), train at optimal configuration, test (testing phase).]

Figure 5: An illustration of nested cross validation. A data set is initially split in five chunks. Each chunk is kept apart once, while a grid search using 10-fold CV is applied to the combined data from the remaining 4 chunks. The optimal parameters obtained there are then used to train the model one last time, and predict the chunk that was kept apart.

For the experiments 13 data sets were selected from the UCI repository (Bache and Lichman, 2013). The selected data sets and their relevant statistics are shown in Table 1. All attributes were rescaled to the interval [-1, 1]. The image segmentation and vowel data sets have a predetermined train and test set, and were therefore not used in the nested CV procedure. Instead, a grid search was done on the provided training set for each classifier, and the provided test set was predicted at the optimal hyperparameters obtained. For the data sets without a predetermined train/test split, nested CV was used with 5 initial chunks. Hence, 11 x 5 + 2 = 57 pairs of independent train and test data sets are obtained.

While running the grid search, it is desirable to remove any fluctuations that may result in an unfair comparison. Therefore, it was ensured that all methods had the same CV split of the training data for the same hyperparameter configuration (specifically, the value of the regularization parameter). In practice, it can occur that a specific CV split is advantageous for one classifier but not for others (either in time or performance). Thus, ideally the grid search would be repeated a number of times with different CV splits, to remove this variation. However, due to the size of the grid search this is considered to be infeasible. Finally, it should be noted here that during the grid search 10-fold cross validation was applied in a non-stratified manner, that is, without resampling of small classes.

The following settings were used in the numerical experiments. The regularization parameter was varied on a grid with $\lambda \in \{2^{-18}, 2^{-16}, \ldots, 2^{18}\}$. For GenSVM the grid search was extended with the parameters $\kappa \in \{-0.9, 0.5, 5.0\}$ and $p \in \{1.0, 1.5, 2.0\}$. The stopping parameter for the GenSVM majorization algorithm was set at $\epsilon = 10^{-6}$ during the grid search in the training phase and at $\epsilon = 10^{-8}$ for the final model in the testing phase. In addition, two different weight specifications were used for GenSVM: the unit weights with $\rho_i = 1$ for all i, as well as the group-size correction weights introduced in (4). Thus, the grid search consists of 342 configurations for GenSVM, and 19 configurations for the other methods.
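These counts follow directly from the settings above (a quick Python check):

    lambdas = [2.0 ** e for e in range(-18, 19, 2)]    # 19 values
    kappas = [-0.9, 0.5, 5.0]
    ps = [1.0, 1.5, 2.0]
    weightings = ["unit", "group-size"]                # the two specifications

    print(len(lambdas) * len(kappas) * len(ps) * len(weightings))  # 342 (GenSVM)
    print(len(lambdas))                                            # 19 (others)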

Data set              Instances (n)  Features (m)  Classes (K)  min n_k  max n_k
breast tissue              106             9             6          -        -
iris                       150             4             3          -        -
wine                       178            13             3          -        -
image segmentation*    210/2100            19             7          -        -
glass                      214             9             6          -        -
vertebral                  310             6             3          -        -
ecoli                      336             7             8          -        -
vowel*                   528/462            10            11          -        -
balancescale               625             4             3          -        -
vehicle                    846            18             4          -        -
contraception             1473             9             3          -        -
yeast                     1484             8            10          -        -
car                       1728             6             4          -        -

Table 1: Data set summary statistics. Data sets with an asterisk have a predetermined test data set. For these data sets, the number of instances is denoted for the train and test data sets respectively. The final two columns denote the size of the smallest and the largest class, respectively.

Since nested CV is used for most data sets, it is required to run 10-fold cross validation on a total of 57 x (342 + 8 x 19) = 28,158 hyperparameter configurations. To enhance the reproducibility of these experiments, the exact predictions made by each classifier for each configuration were stored in a text file. To run all computations in a reasonable amount of time, the computations were performed on the Dutch National LISA Compute Cluster. A master-worker program was developed using the message passing interface in Python (Dalcín et al., 2005). This allows for efficient use of multiple nodes by successively sending out tasks to worker threads from a single master thread. Since the total training time of a classifier is also of interest, it was ensured that all computations were done on the exact same core type. [4] Furthermore, training time was measured from within the C programs, to ensure that only the time needed for the cross validation routine was measured. The total computation time needed to obtain the presented results was about 152 days; using the LISA Cluster this was done in five and a half days wall-clock time.

During the training phase it showed that several of the single machine methods implemented through MSVMpack did not converge to an optimal solution within a reasonable amount of time. [5] Instead of limiting the maximum number of iterations of the method, MSVMpack was modified to stop after a maximum of 2 hours of training time per configuration. This results in 12 minutes of training time per cross validation fold. The solution found after this amount of training time was used for prediction during cross validation.

[4] The specific type of core used is the Intel Xeon E5-2650 v2, with 16 threads at a clock speed of 2.6 GHz. At most 14 threads were used simultaneously, reserving one for the master thread and one for system processes.
[5] The default MSVMpack settings were used with a chunk size of 4 for all methods.

Whenever training was stopped prematurely, this was recorded. [6] Of the 57 training sets, 24 configurations had prematurely stopped training in one or more CV splits for the LLW method, versus 19 for W&W, 9 for MSVM^2, and 2 for C&S (MSVMpack). For the LibSVM methods, 13 optimal configurations for OvA reached the default maximum number of iterations in one or more CV folds, versus 9 for DAGSVM, and 3 for OvO. No early stopping was needed for GenSVM or for LL C&S.

Determining the optimal hyperparameters requires a performance measure on the obtained predictions. For binary classifiers it is common to use either the hitrate or the area under the ROC curve as a measure of classifier performance. The hitrate only measures the percentage of correct predictions of a classifier and has the well known problem that no correction is made for group sizes. For instance, if 90% of the observations of a test set belong to one class, a classifier that always predicts this class has a high hitrate, regardless of its discriminatory power. Therefore, the adjusted Rand index (ARI) is used here as a performance measure (Hubert and Arabie, 1985). The ARI corrects for chance and can therefore more accurately measure the discriminatory power of a classifier than the hitrate can. Using the ARI for evaluating supervised learning algorithms has previously been proposed by Santos and Embrechts (2009).

The optimal parameter configurations for each method on each data set were chosen such that the maximum predictive performance was obtained as measured with the ARI. If multiple configurations obtained the highest performance during the grid search, the configuration with the smallest training time was chosen. The results on the training data show that during cross validation GenSVM achieved the highest classification accuracy on 41 out of 57 data sets, compared to 15 and 12 for DAG and OvO, respectively. However, these are results on the training data sets and therefore can contain considerable bias. To accurately assess the out-of-sample prediction accuracy the optimal hyperparameter configurations were determined for each of the 57 training sets, and the test sets were predicted with these parameters. To remove any variations due to random starts, building the classifier and predicting the test set was repeated 5 times for each classifier.

Below the simulation results on the small data sets will be evaluated using performance profiles and rank tests. Performance profiles offer a visual representation of classifier performance, while rank tests allow for identification of statistically significant differences between classifiers. For the sake of completeness tables of performance scores and computation times for each method on each data set are provided in Appendix E. To promote reproducibility of the empirical results, all the code used for the classifier comparisons and all the obtained results will be released through an online repository.

[6] For the classifiers implemented through LibSVM very long training times were only observed for the OvA method; however, due to the nature of this method it is not trivial to stop the calculations after a certain amount of time. This behavior was observed in about 1% of all configurations tested on all data sets, and is therefore considered negligible. Also, for the LibSVM methods it was recorded whenever the maximum number of iterations was reached.
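The contrast between the hitrate and the ARI on the 90% example above is easy to reproduce (a scikit-learn sketch with toy labels):

    from sklearn.metrics import accuracy_score, adjusted_rand_score

    y_true = [0] * 90 + [1] * 10      # 90% of the test set in one class
    y_pred = [0] * 100                # classifier that always predicts that class

    print(accuracy_score(y_true, y_pred))        # 0.9: hitrate looks strong
    print(adjusted_rand_score(y_true, y_pred))   # 0.0: no discriminatory power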

6.2 Performance Profiles

One way to get insight in the performance of different classification methods is through performance profiles (Dolan and Moré, 2002). A performance profile shows the empirical cumulative distribution function of a classifier on a performance metric. Let $D$ denote the set of data sets, and $C$ denote the set of classifiers. Further, let $p_{d,c}$ denote the performance of classifier $c \in C$ on data set $d \in D$ as measured by the ARI. Now define the performance ratio $v_{d,c}$ as the ratio between the best performance on data set d and the performance of classifier c on data set d, that is

$$v_{d,c} = \frac{\max\{p_{d,c'} : c' \in C\}}{p_{d,c}}.$$

Thus the performance ratio is 1 for the best performing classifier on a data set and increases for classifiers with a lower performance. Then, the performance profile for classifier c is given by the function

$$P_c(\eta) = \frac{1}{N_D} \big| \{ d \in D : v_{d,c} \leq \eta \} \big|,$$

where $N_D = |D|$ denotes the number of data sets. Thus, the performance profile estimates the probability that classifier c has a performance ratio below $\eta$. Note that $P_c(1)$ denotes the empirical probability that a classifier achieves the highest performance on a given data set.
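Computing $P_c(\eta)$ from a matrix of ARI scores is a few lines of NumPy (the score matrix below is a random placeholder, not the experimental results):

    import numpy as np

    def performance_profile(P, etas):
        # P[d, c] holds the ARI of classifier c on data set d
        best = P.max(axis=1, keepdims=True)      # best performance per data set
        V = best / P                             # performance ratios v_{d,c}
        # P_c(eta): fraction of data sets with ratio at most eta, per classifier
        return np.array([(V <= eta).mean(axis=0) for eta in etas])

    P = np.random.default_rng(1).uniform(0.5, 1.0, size=(13, 9))  # placeholder
    profile = performance_profile(P, etas=np.linspace(1.0, 2.0, 50))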

[Figure 6: profile curves P_c(eta) for GenSVM, LL C&S, DAG, OvA, OvO, C&S, LLW, MSVM^2, and W&W.]

Figure 6: Performance profiles for classification accuracy created from all repetitions of the test set predictions. The methods OvA, C&S, LL C&S, MSVM^2, W&W, and LLW will always have a smaller probability of being within a factor $\eta$ of the maximum performance than the GenSVM, OvO, or DAG methods.

Figure 6 shows the performance profile for classification accuracy. Estimates of $P_c(1)$ from Figure 6 show that there is a 28.42% probability that OvO achieves the optimal performance, versus 26.32% for both GenSVM and DAGSVM. Note that this includes cases where each of these methods achieves the best performance. Figure 6 also shows that although there is a small difference in the probabilities of GenSVM, OvO, and DAG within a factor of 1.08 of the best predictive performance, for $\eta \geq 1.08$ GenSVM almost always has the highest probability. It can also be concluded that since the performance profiles of the MSVMpack implementation and the LibLinear implementation of the method of Crammer and Singer (2002a) nearly always overlap, implementation differences have a negligible effect on the classification performance of this method. Finally, the figure shows that OvA and the methods of Lee et al. (2004), Crammer and Singer (2002a), Weston and Watkins (1998), and Guermeur and Monfrini (2011) always have a smaller probability of being within a given factor of the optimal performance than GenSVM, OvO, or DAG do.

[Figure 7: profile curves T_c(tau) for the same nine methods.]

Figure 7: Performance profiles for training time. GenSVM has a priori about a 40% chance of requiring the smallest time to perform the grid search on a given data set. The methods implemented through MSVMpack always have a lower chance of being within a factor $\tau$ of the smallest training time than any of the other methods.

Similarly, a performance profile can be constructed for the training time necessary to do the grid search. Let $t_{d,c}$ denote the total training time for classifier c on data set d. Next, define the performance ratio for time as

$$w_{d,c} = \frac{t_{d,c}}{\min\{t_{d,c'} : c' \in C\}}.$$

Note that here the classifier with the smallest training time has preference. Therefore, comparison of classifier computation time is done with the lowest computation time achieved on a given data set d. Again, the ratio is 1 when the lowest training time is reached, and it increases for higher computation time. Hence, the performance profile for time is defined as

$$T_c(\tau) = \frac{1}{N_D} \big| \{ d \in D : w_{d,c} \leq \tau \} \big|.$$


More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems Taxonomy of Large Margn Prncple Algorthms for Ordnal Regresson Problems Amnon Shashua Computer Scence Department Stanford Unversty Stanford, CA 94305 emal: shashua@cs.stanford.edu Anat Levn School of Computer

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming CEE 60 Davd Rosenberg p. LECTURE NOTES Dualty Theory, Senstvty Analyss, and Parametrc Programmng Learnng Objectves. Revew the prmal LP model formulaton 2. Formulate the Dual Problem of an LP problem (TUES)

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Simplification of 3D Meshes

Simplification of 3D Meshes Smplfcaton of 3D Meshes Addy Ngan /4/00 Outlne Motvaton Taxonomy of smplfcaton methods Hoppe et al, Mesh optmzaton Hoppe, Progressve meshes Smplfcaton of 3D Meshes 1 Motvaton Hgh detaled meshes becomng

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers

Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers Effcent Dstrbuted Lnear Classfcaton Algorthms va the Alternatng Drecton Method of Multplers Caoxe Zhang Honglak Lee Kang G. Shn Department of EECS Unversty of Mchgan Ann Arbor, MI 48109, USA caoxezh@umch.edu

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

Learning to Project in Multi-Objective Binary Linear Programming

Learning to Project in Multi-Objective Binary Linear Programming Learnng to Project n Mult-Objectve Bnary Lnear Programmng Alvaro Serra-Altamranda Department of Industral and Management System Engneerng, Unversty of South Florda, Tampa, FL, 33620 USA, amserra@mal.usf.edu,

More information

A Robust LS-SVM Regression

A Robust LS-SVM Regression PROCEEDIGS OF WORLD ACADEMY OF SCIECE, EGIEERIG AD ECHOLOGY VOLUME 7 AUGUS 5 ISS 37- A Robust LS-SVM Regresson József Valyon, and Gábor Horváth Abstract In comparson to the orgnal SVM, whch nvolves a quadratc

More information

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012 Performance Evoluton of Dfferent Codng Methods wth β - densty Decodng Usng Error Correctng Output Code Based on Multclass Classfcaton Devangn Dave, M. Samvatsar, P. K. Bhanoda Abstract A common way to

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Very simple computational domains can be discretized using boundary-fitted structured meshes (also called grids)

Very simple computational domains can be discretized using boundary-fitted structured meshes (also called grids) Structured meshes Very smple computatonal domans can be dscretzed usng boundary-ftted structured meshes (also called grds) The grd lnes of a Cartesan mesh are parallel to one another Structured meshes

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming Optzaton Methods: Integer Prograng Integer Lnear Prograng Module Lecture Notes Integer Lnear Prograng Introducton In all the prevous lectures n lnear prograng dscussed so far, the desgn varables consdered

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Training of Kernel Fuzzy Classifiers by Dynamic Cluster Generation

Training of Kernel Fuzzy Classifiers by Dynamic Cluster Generation Tranng of Kernel Fuzzy Classfers by Dynamc Cluster Generaton Shgeo Abe Graduate School of Scence and Technology Kobe Unversty Nada, Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract We dscuss kernel fuzzy classfers

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information