Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems


Amnon Shashua
Computer Science Department, Stanford University, Stanford, CA

Anat Levin
School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem 91904, Israel

(This manuscript should be referenced as a Technical Report, Leibniz Center for Research, School of Computer Science and Engineering, the Hebrew University of Jerusalem.)

Abstract

We discuss the problem of ranking instances, where an instance is associated with an integer from 1 to k. In other words, we study the specialization of the general multi-class learning problem to the case where there exists a total ordering among the labels, a problem known as ordinal regression or ranking learning. This problem arises in various settings, both in visual recognition and in other information retrieval tasks. In the context of applying a large margin principle to this learning problem, we introduce two main approaches for implementing the large margin optimization criteria for k - 1 margins. The first is the "fixed margin" policy, in which the margin of the closest pair of neighboring classes is maximized; it turns out to be a direct generalization of SVM to ranking learning. The second approach allows for k - 1 different margins, where the sum of the margins is maximized, thus effectively biasing the solution towards the pairs of neighboring classes which are farthest apart from each other. This approach is shown to reduce to ν-SVM when the number of classes is k = 2. Both approaches are optimal in the size of the dual functional, which is 2l where l is the total number of training examples. Experiments performed on visual classification and collaborative filtering show that both approaches outperform existing ordinal regression algorithms applied to ranking, and multi-class SVM applied to general multi-class classification.

1 Introduction

In this paper we investigate the problem of inductive learning from the point of view of predicting variables of ordinal scale [3, 7, 5], a setting referred to as ranking learning or ordinal regression. We consider the problem of applying the large margin principle used in Support Vector methods [11, 2] to the ordinal regression problem while maintaining an (optimal) problem size linear in the number of training examples.

Ordinal regression may be viewed as a problem bridging the two standard machine learning tasks of classification and (metric) regression. Let $x_i \in R^n$, $i = 1, \ldots, l$, be the input vectors (the information upon which prediction takes place), drawn from some unknown probability distribution $D(x)$; let $y_i \in Y$ be the output of the prediction process according to an unknown conditional distribution function $D(y|x)$. The training set, on which the selection of the best predictor is made, consists of independent and identically distributed observations $(x_i, y_i)$ drawn from the joint distribution $D(x, y) = D(x)D(y|x)$. The learning task is to select a prediction function $f(x)$ from a family of possible functions $F$ that minimizes the expected loss weighted by the joint distribution $D(x, y)$, also known as the risk functional. The loss function $c : Y \times Y \to R$ represents the discrepancy between $f(x)$ and $y$. Since the joint distribution is unknown, the risk functional is replaced by the so-called empirical risk functional [11], which is simply the average of the loss function over the training set: $(1/l) \sum_{i=1}^{l} c(f(x_i), y_i)$.

In a standard classification problem the input vectors are associated with one of k classes, thus $y \in Y = \{1, \ldots, k\}$ belongs to an unordered set of labels denoting class membership. Since $Y$ is unordered, and since the metric distance between the prediction $f(x)$ and the correct output $y$ is of no particular value, the loss function relevant for classification is the non-metric 0-1 indicator function: $c(f(x), y) = 0$ if $f(x) = y$, and $c(f(x), y) = 1$ if $f(x) \ne y$. In a standard regression problem $y$ ranges over the reals, therefore the loss function can take into account the full metric structure, for example $c(f(x), y) = (f(x) - y)^2$. In ordinal regression, $Y$ is a finite set (as in classification), but there is an ordering among the elements of $Y$ (as in regression, and unlike classification). On the other hand, the ordering of the labels does not justify a metric loss function, thus casting the ranking learning problem as an ordinary regression (by treating the continuous variable with a coarse scale) may not be realistic [1].
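To make the three loss choices concrete, the following minimal sketch (ours, not the authors'; plain Python with NumPy) evaluates the empirical risk $(1/l)\sum_i c(f(x_i), y_i)$ under the 0-1 loss of classification, the squared loss of metric regression, and the absolute rank difference that plays the role of the ordinal loss later in the paper.

```python
import numpy as np

def empirical_risk(y_pred, y_true, loss):
    """Average loss over the training set: (1/l) * sum_i c(f(x_i), y_i)."""
    return np.mean([loss(p, y) for p, y in zip(y_pred, y_true)])

# 0-1 indicator loss used for (unordered) classification.
zero_one = lambda p, y: 0.0 if p == y else 1.0

# Squared loss used for metric regression.
squared = lambda p, y: float(p - y) ** 2

# Absolute rank difference |f(x) - y|, the ordinal loss used later
# in the paper (an integer between 0 and k - 1).
ordinal = lambda p, y: abs(float(p - y))

y_true = [1, 2, 3, 3]
y_pred = [1, 3, 1, 3]
print(empirical_risk(y_pred, y_true, zero_one))  # 0.5
print(empirical_risk(y_pred, y_true, ordinal))   # 0.75
```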
Settings in which it is natural to rank or rate instances arise in many fields, such as information retrieval, visual recognition, collaborative filtering, econometric models and classical statistics. We will later use applications from collaborative filtering and visual recognition as our running examples. In collaborative filtering, for example, the goal is to predict a person's rating on new items such as movies, given the person's past ratings on similar items and the ratings other people gave to all the items (including the new item). The ratings are ordered, such as "highly recommended", "good", ..., "very bad", thus collaborative filtering falls naturally under the domain of ordinal regression.

In this paper we approach the ordinal regression problem within a classification framework, and in order to take advantage of the non-metric nature of the loss function we wish to embed the problem within the large margin principle used in Support Vector methods [11]. The Support Vector method (SVM) was introduced originally in the context of 2-class classification. The SVM paradigm has a nice geometric interpretation of discriminating one class from the other by a separating hyperplane with maximum margin. The large-margin principle gives rise to the representation of the decision boundary by a small subset of the training examples, called Support Vectors. The SVM approach is advantageous for representing the ordinal regression problem for two reasons. First, the computational machinery for finding the optimal classifier $f(x)$ is based on the non-metric 0-1 loss function; therefore, by adopting the large-margin principle for ordinal regression we would be implementing an appropriate non-metric loss function as well. Second, the SVM approach is not limited to linear classifiers: through the mechanism of kernel inner-products one can draw upon a rich family of learning functions applicable to non-linear decision boundaries.

To use an SVM framework for ranking learning, one may take the approach proposed in [7], which is to reduce the total order into a set of preferences over pairs, which in effect increases the training set from l to l^2. Another approach, inherited from the one-versus-many classifiers used for extending binary SVM to multi-class SVM, is to solve k - 1 binary classification problems. The disadvantage of this approach is that it ignores the total ordering of the class labels (and also the effective size of the training set becomes kl, whereas we will show that ranking learning can be performed with an effective training set of size 2l). Likewise, the multi-class SVMs proposed in [4, 11, 12, 8] would also ignore the ordering of the class labels and use a training set of size kl. In this paper we adopt the notion of maintaining a totally ordered set via projections, in the sense of projecting the instances x onto the reals, $f(x) = w \cdot x$ [7, 5], and show how this can be implemented within a large margin principle with an effective training size of 2l. In fact, we show there is more than one way to implement the large margin principle, as there are k - 1 possible margins. Essentially, there are two strategies in general: a fixed margin strategy, where the large margin principle is applied to the closest pair of neighboring classes, and a multi-margin strategy, where the sum of the k - 1 margins is maximized.

2 The Ordinal Regression Problem

Let $x_i^j$ be the training examples, where $j = 1, \ldots, k$ denotes the class number and $i = 1, \ldots, i_j$ is the index within each class. Let $l = \sum_j i_j$ be the total number of training examples. A straightforward generalization of the 2-class separating hyperplane problem, where a single hyperplane determines the classification rule, is to define k - 1 separating hyperplanes which separate the training data into k ordered classes by modeling the ranks as intervals on the real line, an idea whose origins are with the classical cumulative model [9]; see also [7, 5]. The geometric interpretation of this approach is to look for k - 1 parallel hyperplanes, represented by a vector $w \in R^n$ (where n is the dimension of the input vectors) and scalars $b_1 \le \cdots \le b_{k-1}$ defining the hyperplanes $(w, b_1), \ldots, (w, b_{k-1})$, such that the data are separated by dividing the space into equally ranked regions by the decision rule

$$f(x) = \min_{r \in \{1, \ldots, k\}} \{ r : w \cdot x - b_r < 0 \}. \qquad (1)$$

In other words, all input vectors x satisfying $b_{r-1} < w \cdot x < b_r$ are assigned the rank r (using the convention that $b_k = \infty$). For instance, [5] recently proposed an on-line algorithm (with similar principles to the classic perceptron used for 2-class separation) for finding a set of parallel hyperplanes which complies with the separation rule above.

To continue the analogy to 2-class learning, in addition to the separability constraints on the variables $\alpha = \{w, b_1, \ldots, b_{k-1}\}$, one would like to control the tradeoff between lowering the empirical risk $R_{emp}(\alpha)$ (the error measured on the training set) and lowering the confidence interval $\psi(\alpha, h)$ controlled by the VC-dimension h of the set of loss functions. The structural risk minimization (SRM) principle [11] controls the actual risk $R(\alpha)$ (the error measured on test data) by keeping $R_{emp}(\alpha)$ fixed (in the ideal separable case it would be zero) while minimizing the confidence interval. The geometric interpretation for 2-class learning is to maximize the margin between the boundaries of the two sets [11, 2]. In our setting of ranking learning there are k - 1 margins to consider, thus there are two possible approaches to applying the large margin principle:

Fixed margin strategy: the margin to be maximized is the one defined by the closest (neighboring) pair of classes. Formally, let $(w, b_q)$ be the hyperplane separating the two classes which are the closest among all pairs of neighboring classes. Let $(w, b_q)$ be scaled such that the distance of the boundary points from the hyperplane is 1, i.e., the margin between the classes q, q + 1 is $2/\|w\|$ (see Fig. 1). Thus, the fixed margin policy for ranking learning is to find the direction
$w$ and the scalars $b_1, \ldots, b_{k-1}$ such that $w \cdot w$ is minimized (i.e., the margin between classes q, q + 1 is maximized) subject to the separability constraints (modulo margin errors in the non-separable case).

Figure 1: Fixed-margin policy for ranking learning. The margin to be maximized is associated with the two closest neighboring classes. As in conventional SVM, the margin is pre-scaled to be equal to $2/\|w\|$, thus maximizing the margin is achieved by minimizing $w \cdot w$. The support vectors lie on the boundaries between the two closest classes.

Sum-of-margins strategy: the sum of all k - 1 margins is to be maximized. In this case the margins are not necessarily equal (see Fig. 2). Formally, the ranking rule employs a vector $w$, $\|w\| = 1$, and a set of 2(k - 1) thresholds $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_{k-1} \le b_{k-1}$, such that $w \cdot x_i^j \le a_j$ and $w \cdot x_i^{j+1} \ge b_j$ for $j = 1, \ldots, k-1$. In other words, all the examples of class $1 \le j \le k$ are sandwiched between the two parallel hyperplanes $(w, a_j)$ and $(w, b_{j-1})$, where $b_0 = -\infty$ and $a_k = \infty$. The k - 1 margins are therefore $(b_j - a_j)$, and the large margin principle is to maximize $\sum_j (b_j - a_j)$ subject to the separability constraints above.

It is also fairly straightforward to apply the SRM principle and derive bounds on the actual risk functional by following [11] and making substitutions where necessary. Let the empirical risk be defined as

$$R_{emp}(\alpha) = \frac{1}{l} \sum_{j=1}^{k} \sum_{i=1}^{i_j} \left| f(x_i^j) - y_i^j \right| = \frac{m}{l},$$

where $f(x_i^j)$ is the decision rule (1), $i_j$ is the number of training examples of class j, and l is the total number of training examples. The empirical risk is the average number of mistakes, where the magnitude of a mistake is related to the total ordering: the loss function $Q(z, \alpha) = |f(x) - y|$, where $z = (x, y)$, is an integer between 0 and k - 1 (unlike the 0/1 loss function associated with classification learning). Since the loss function is totally bounded, the VC-dimension of the class of loss functions $0 \le Q(z, \alpha) \le k - 1$ is equal to the VC-dimension h of the class of indicator (0/1) functions

$$I(z, \alpha, \beta) = \begin{cases} 0 & Q(z, \alpha) - \beta < 0, \\ 1 & Q(z, \alpha) - \beta \ge 0, \end{cases}$$

where $\beta \in (0, k - 1)$. Let $\Delta$-margin k-separating hyperplanes be defined by $\|w\| = 1$ and

$$y = \begin{cases} 1 & w \cdot x \le a_1, \\ j & b_{j-1} \le w \cdot x \le a_j, \quad j = 2, \ldots, k-1, \\ k & b_{k-1} \le w \cdot x, \end{cases}$$

where $b_j - a_j = \Delta$ (fixed margin policy) and $\Delta$ is the margin between the closest pair of classes. From the arguments above, the VC-dimension of the set of $\Delta$-margin k-separating hyperplanes is bounded by the inequality (following [11])

$$h \le \min\left( \frac{R^2}{\Delta^2},\; n \right) + 1,$$

where R is the radius of the sphere containing all the examples. Thus we arrive at a bound on the probability that a test example will not be separated correctly (following [11], pp. 77, 133): with probability $1 - \eta$ one can assert that the probability that a test example will not be separated correctly by the $\Delta$-margin k-separating hyperplanes has the bound

$$P_{error} \le \frac{m}{l}(k-1) + \frac{\epsilon}{2}\left( 1 + \sqrt{1 + \frac{4m}{\epsilon\, l\, (k-1)}} \right), \qquad \epsilon = 4\, \frac{h\left( \ln \frac{2l}{h} + 1 \right) - \ln(\eta/4)}{l}.$$

Therefore, the larger the fixed margin, the better the bounds we obtain on the generalization performance of the ranking learning problem with the fixed-margin policy. Likewise, we obtain the same bound under the sum-of-margins principle, where $\Delta$ is defined by the sum of the k - 1 margins.

In the remainder of this paper we introduce the algorithmic implications of these two strategies for implementing the large margin principle for ranking learning. The fixed-margin principle will turn out to be a direct generalization of the Support Vector Machine (SVM) algorithm, in the sense that substituting k = 2 in our proposed algorithm produces the dual functional underlying conventional SVM. It is interesting to note that the sum-of-margins principle reduces to ν-SVM (introduced by [10]) when k = 2.

3 Fixed Margin Strategy

Recall that in the fixed margin policy $(w, b_q)$ is a "canonical" hyperplane normalized such that the margin between the closest classes q, q + 1 is $2/\|w\|$. The index q is of course unknown. The unknown variables $w, b_1 \le \cdots \le b_{k-1}$ (and the index q) can be solved for in a two-stage optimization problem: a Quadratic Linear Programming (QLP) formulation followed by a Linear Programming (LP) formulation. The (primal) QLP formulation of the ("soft margin") fixed-margin policy for ranking learning takes the form:

$$\min_{w,\, b_j,\, \xi_i^j,\, \xi_i^{*\,j+1}} \;\; \frac{1}{2}\, w \cdot w + C \sum_{j=1}^{k-1} \sum_{i=1}^{i_j} \left( \xi_i^j + \xi_i^{*\,j+1} \right) \qquad (2)$$

subject to

$$w \cdot x_i^j - b_j \le -1 + \xi_i^j, \qquad (3)$$
$$w \cdot x_i^{j+1} - b_j \ge 1 - \xi_i^{*\,j+1}, \qquad (4)$$
$$\xi_i^j \ge 0, \qquad \xi_i^{*\,j+1} \ge 0, \qquad (5)$$

where $j = 1, \ldots, k-1$, $i = 1, \ldots, i_j$, and C is some predefined constant. The slack variables $\xi_i^j$ and $\xi_i^{*\,j+1}$ are positive for data points which are inside the margins or placed on the wrong side of the respective hyperplane; if the training data were linearly separable on all the k (ordered) classes, these slack variables would not be needed.
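For concreteness, the primal (2)-(5) can be prototyped directly with a general-purpose convex solver. The sketch below uses the cvxpy modeling library, which is our choice of tooling (the paper does not prescribe a solver), and our own data layout: X is a list of k arrays ordered by rank, X[j] of shape (n_j, d).

```python
import numpy as np
import cvxpy as cp

def fixed_margin_primal(X, C=1.0):
    """Fixed-margin primal QLP (2)-(5): min 0.5*w.w + C*sum(slacks)."""
    k, d = len(X), X[0].shape[1]
    w = cp.Variable(d)
    b = cp.Variable(k - 1)
    cons, slack = [], 0
    for j in range(k - 1):
        xi = cp.Variable(X[j].shape[0], nonneg=True)      # xi_i^j,      (5)
        xs = cp.Variable(X[j + 1].shape[0], nonneg=True)  # xi*_i^{j+1}, (5)
        cons += [X[j] @ w - b[j] <= -1 + xi,              # constraint (3)
                 X[j + 1] @ w - b[j] >= 1 - xs]           # constraint (4)
        slack = slack + cp.sum(xi) + cp.sum(xs)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * slack), cons).solve()
    return w.value, b.value

# Toy usage: three ordered classes along a direction in the plane.
rng = np.random.default_rng(0)
X = [rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 2.0, 4.0)]
w, b = fixed_margin_primal(X, C=10.0)
```

The decision rule (1) then assigns to a new x the smallest rank r with $w \cdot x - b_r < 0$ (rank k if no such threshold exists).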

Figure 2: Sum-of-margins policy for ranking learning. The objective is to maximize the sum of the k - 1 margins. Each class is sandwiched between two hyperplanes; the norm of w is set to unity as a constraint in the optimization problem, and as a result the objective is to maximize $\sum_j (b_j - a_j)$. In this case the support vectors lie on the boundaries among all neighboring classes (unlike the fixed-margin policy). When the number of classes is k = 2, the dual functional is equivalent to that of ν-SVM.

The primal functional (2) implements the fixed-margin principle even though we do not know in advance the index q. In the case of a hard margin (the primal functional above with $\xi_i^j, \xi_i^{*j}$ set to zero) the margin is maximized while maintaining separability, thus the margin is governed by the closest pair of classes, because otherwise the separability conditions would cease to hold. The situation may be slightly different, depending on the choice of C, in the soft margin implementation, but qualitatively the same type of behavior holds.

The solution to this optimization problem is given by the saddle point of the Lagrange functional (Lagrangian)

$$L = \frac{1}{2}\, w \cdot w + C \sum_{j,i} \left( \xi_i^j + \xi_i^{*\,j+1} \right) + \sum_{j,i} \alpha_i^j \left( w \cdot x_i^j - b_j + 1 - \xi_i^j \right) + \sum_{j,i} \alpha_i^{*j} \left( 1 - \xi_i^{*\,j+1} + b_j - w \cdot x_i^{j+1} \right) - \sum_{j,i} \lambda_i^j \xi_i^j - \sum_{j,i} \lambda_i^{*j} \xi_i^{*\,j+1},$$

where $j = 1, \ldots, k-1$, $i = 1, \ldots, i_j$, and $\alpha_i^j, \alpha_i^{*j}, \lambda_i^j, \lambda_i^{*j}$ are all non-negative Lagrange multipliers. Since the primal problem is convex, there is strong duality between the primal and dual optimization functions. By first minimizing the Lagrangian with respect to $w, b_j, \xi_i^j, \xi_i^{*\,j+1}$ we obtain the dual optimization function, which then must be maximized with respect to the Lagrange multipliers. From the minimization of the Lagrangian with respect to w we obtain

$$w = -\sum_{j,i} \alpha_i^j\, x_i^j + \sum_{j,i} \alpha_i^{*j}\, x_i^{j+1}. \qquad (6)$$

That is, the direction w of the parallel hyperplanes is described by a linear combination of the support vectors, the examples x associated with non-vanishing Lagrange multipliers. From the Kuhn-Tucker theorem, the support vectors are those vectors for which equality is achieved in the inequalities (3, 4). These vectors lie on the two boundaries between the adjacent classes q, q + 1 (and any other adjacent classes which attain the same margin). From the minimization of the Lagrangian with respect to $b_j$ we obtain the constraint

$$\sum_i \alpha_i^j = \sum_i \alpha_i^{*j}, \qquad j = 1, \ldots, k-1, \qquad (7)$$

and the minimization with respect to $\xi_i^j$ and $\xi_i^{*\,j+1}$ yields the constraints

$$C - \alpha_i^j - \lambda_i^j = 0, \qquad (8)$$
$$C - \alpha_i^{*j} - \lambda_i^{*j} = 0, \qquad (9)$$

which in turn give rise to the constraints $0 \le \alpha_i^j \le C$, where $\alpha_i^j = C$ if the corresponding data point is a margin error ($\lambda_i^j = 0$, thus from the Kuhn-Tucker theorem $\xi_i^j > 0$), and likewise $0 \le \alpha_i^{*j} \le C$, where equality $\alpha_i^{*j} = C$ holds when the data point is a margin error. Note that a data point can count twice as a margin error: once with respect to the class on its left and once with respect to the class on its right.

For the sake of presenting the dual functional in compact form, we introduce some new notation. Let $X^j$ be the $n \times i_j$ matrix whose columns are the data points $x_i^j$, $i = 1, \ldots, i_j$:

$$X^j = \left[ x_1^j, \ldots, x_{i_j}^j \right]_{n \times i_j}.$$

Let $\alpha^j = (\alpha_1^j, \ldots, \alpha_{i_j}^j)^\top$ be the vector whose components are the Lagrange multipliers $\alpha_i^j$ corresponding to class j, and likewise let $\alpha^{*j} = (\alpha_1^{*j}, \ldots, \alpha_{i_j}^{*j})^\top$ be the Lagrange multipliers $\alpha_i^{*j}$ corresponding to class j + 1. Let $\mu = (\alpha^1, \ldots, \alpha^{k-1}, \alpha^{*1}, \ldots, \alpha^{*k-1})^\top$ be the vector holding all the $\alpha_i^j$ and $\alpha_i^{*j}$ Lagrange multipliers, and let $\mu_1 = (\mu_1^1, \ldots, \mu_1^{k-1})^\top = (\alpha^1, \ldots, \alpha^{k-1})^\top$ and $\mu_2 = (\mu_2^1, \ldots, \mu_2^{k-1})^\top = (\alpha^{*1}, \ldots, \alpha^{*k-1})^\top$ be the first and second halves of $\mu$; note that $\mu_1^j = \alpha^j$ is a vector, and likewise so is $\mu_2^j = \alpha^{*j}$. Let $\mathbf{1}$ be the vector of 1s, and finally let Q be the matrix holding two copies of the training data,

$$Q = \left[ -X^1, \ldots, -X^{k-1}, X^2, \ldots, X^k \right]_{n \times N}, \qquad (10)$$

where $N = 2l - i_1 - i_k$. For example, (6) becomes $w = Q\mu$ in the new notation. By substituting the expression $w = Q\mu$ back into the Lagrangian and taking into account the constraints (7, 8, 9), one obtains the dual functional, which should be maximized with respect to the Lagrange multipliers $\mu$:

$$\max_{\mu} \;\; \sum_{i=1}^{N} \mu_i - \frac{1}{2}\, \mu^\top (Q^\top Q)\, \mu \qquad (11)$$

subject to

$$0 \le \mu_i \le C, \qquad i = 1, \ldots, N, \qquad (12)$$
$$\mathbf{1} \cdot \mu_1^j = \mathbf{1} \cdot \mu_2^j, \qquad j = 1, \ldots, k-1. \qquad (13)$$
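The dual (11)-(13) can be prototyped the same way. The sketch below (again cvxpy, with our naming) builds $Q^\top Q$ from kernel evaluations; the signs of the two data copies in (10) are carried by the outer product of a sign vector, and a small diagonal jitter keeps the Gram matrix numerically positive semi-definite for the quadratic form.

```python
import numpy as np
import cvxpy as cp

def fixed_margin_dual(X, C=1.0, kernel=lambda A, B: A @ B.T):
    """Fixed-margin dual QLP (11)-(13); returns mu = (alpha; alpha*)."""
    k = len(X)
    stacked = np.vstack(X[:-1] + X[1:])       # two copies of the data, as in Q (10)
    n_half = sum(x.shape[0] for x in X[:-1])
    s = np.concatenate([-np.ones(n_half),                     # columns -X^1 ... -X^{k-1}
                        np.ones(stacked.shape[0] - n_half)])  # columns  X^2 ...  X^k
    G = np.outer(s, s) * kernel(stacked, stacked)             # Q^T Q via the kernel
    N = G.shape[0]
    mu = cp.Variable(N, nonneg=True)
    cons = [mu <= C]                                          # (12)
    o1, o2 = 0, n_half
    for j in range(k - 1):                                    # (13): per-pair balance
        n_j, n_j1 = X[j].shape[0], X[j + 1].shape[0]
        cons.append(cp.sum(mu[o1:o1 + n_j]) == cp.sum(mu[o2:o2 + n_j1]))
        o1, o2 = o1 + n_j, o2 + n_j1
    G = G + 1e-9 * np.eye(N)    # jitter so the quadratic form is accepted as PSD
    cp.Problem(cp.Maximize(cp.sum(mu) - 0.5 * cp.quad_form(mu, G)), cons).solve()
    return mu.value
```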

There are several points worth noting at this stage. First, when k = 2, i.e., when we have only two classes and the ranking learning problem is equivalent to the 2-class classification problem, the dual functional reduces to the dual form of conventional SVM. In that case $(Q^\top Q)_{ij} = y_i y_j\, x_i \cdot x_j$, where $y_i, y_j = \pm 1$ denote class membership. Second, the dual problem is a function of the Lagrange multipliers $\alpha_i^j$ and $\alpha_i^{*j}$ alone; all the remaining Lagrange multipliers have dropped out. Therefore the size of the dual QLP problem (the number of unknown variables) is proportional to twice the number of training examples, precisely $N = 2l - i_1 - i_k$, where l is the number of training examples. This compares favorably with the $O(l^2)$ required by the recent SVM approach to ordinal regression introduced in [7], or the kl required by the general multi-class approach to SVM [4]. In fact, the problem size $N = 2l - i_1 - i_k$ is the smallest possible for the ordinal regression problem: each training example is flanked by a class on each side (except the examples of the first and last classes), therefore the minimal number of constraints for describing an ordinal regression problem using separating hyperplanes is N. Third, the criterion function involves only inner-products of the training examples, thereby making it possible to work with kernel-based inner-products. In other words, the entries of $Q^\top Q$ are inner-products of training examples, which can be represented by a kernel evaluated in the input space rather than by inner-products in the (high-dimensional) feature space. The decision rule in this case, given a new instance vector x, is the rank r corresponding to the smallest threshold $b_r$ for which

$$\sum_{\text{support vectors}} \alpha_i^{*j}\, K(x_i^{j+1}, x) \;-\; \sum_{\text{support vectors}} \alpha_i^{j}\, K(x_i^{j}, x) \;<\; b_r,$$

where $K(x, y) = \phi(x) \cdot \phi(y)$ replaces the inner-products in the higher-dimensional feature space $\phi(x)$. Finally, from the dual form one can solve for the Lagrange multipliers $\mu_i$ and in turn obtain $w = Q\mu$, the direction of the parallel hyperplanes. The scalar $b_q$ (separating the adjacent classes q, q + 1, which are the closest apart) can be obtained from the support vectors, but the remaining scalars $b_j$ cannot. Therefore an additional stage is required, which amounts to a Linear Programming problem on the original primal functional (2), but this time with w already known (thus making this a linear problem instead of a quadratic one).
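A sketch of this kernelized decision rule, assuming the dual has already been solved so that the non-zero multipliers and the thresholds $b_r$ are at hand (function and argument names are ours); the quadratic kernel shown is one common form of the second-order kernel used in the experiments of Section 6.

```python
import numpy as np

def predict_rank(x, sv_plus, alpha_star, sv_minus, alpha, b, kernel):
    """Return the smallest rank r whose threshold b_r exceeds the projection.
    sv_plus/alpha_star: support vectors x_i^{j+1} and their multipliers
    (entering with a plus sign); sv_minus/alpha: support vectors x_i^j
    (entering with a minus sign); b: thresholds b_1 <= ... <= b_{k-1}."""
    proj = sum(a * kernel(s, x) for a, s in zip(alpha_star, sv_plus)) \
         - sum(a * kernel(s, x) for a, s in zip(alpha, sv_minus))
    for r, b_r in enumerate(b, start=1):
        if proj < b_r:
            return r
    return len(b) + 1        # rank k, by the convention b_k = infinity

# One common form of a second-order (quadratic) kernel.
poly2 = lambda u, v: (1.0 + np.dot(u, v)) ** 2
```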

4 Sum-of-Margins Strategy

In the fixed margin policy for ranking learning, the direction w of the k - 1 parallel hyperplanes is determined so as to maximize the margin of the closest adjacent pair of classes. In other words, viewed as an extension of conventional SVM, the criterion function remains essentially a 2-class representation (maximizing the margin between two classes), while the linear constraints represent the admissibility constraints necessary for making sure that all classes are properly separable (modulo margin errors). In this section we propose an alternative large-margin policy which allows for k - 1 margins, where the criterion function maximizes the sum of the k - 1 margins.

The challenge in formulating the appropriate optimization functional is that one cannot adopt the pre-scaling of w which is at the center of the conventional SVM formulation and of the fixed-margin policy described in the previous section. The approach we take is to represent the primal functional using 2(k - 1) parallel hyperplanes instead of k - 1. Each class is sandwiched between two hyperplanes (except the first and last classes). This may appear superfluous, but in fact all the extra variables (having 2(k - 1) thresholds instead of k - 1) drop out in the dual functional, and therefore this approach has no detrimental effect in terms of computational efficiency.

Formally, we seek a ranking rule which employs a vector w and a set of 2(k - 1) thresholds $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_{k-1} \le b_{k-1}$, such that $w \cdot x_i^j \le a_j$ and $w \cdot x_i^{j+1} \ge b_j$ for $j = 1, \ldots, k-1$. In other words, all the examples of class $1 \le j \le k$ are sandwiched between the two parallel hyperplanes $(w, a_j)$ and $(w, b_{j-1})$, where $b_0 = -\infty$ and $a_k = \infty$. The margin between the two hyperplanes separating classes j and j + 1 is

$$\frac{b_j - a_j}{\sqrt{w \cdot w}}.$$

Thus, by setting the magnitude of w to unit length (as a constraint in the optimization problem), the quantity we would like to maximize is $\sum_{j} (b_j - a_j)$, $j = 1, \ldots, k-1$, which we can formulate in the following primal Quadratic Linear Programming (QLP) problem (see also Fig. 2):

$$\min_{w,\, a_j,\, b_j} \;\; \sum_{j=1}^{k-1} (a_j - b_j) + C \sum_{j,i} \left( \xi_i^j + \xi_i^{*\,j+1} \right) \qquad (14)$$

subject to

$$a_j \le b_j, \qquad (15)$$
$$b_j \le a_{j+1}, \qquad j = 1, \ldots, k-2, \qquad (16)$$
$$w \cdot x_i^j \le a_j + \xi_i^j, \qquad (17)$$
$$b_j - \xi_i^{*\,j+1} \le w \cdot x_i^{j+1}, \qquad (18)$$
$$w \cdot w \le 1, \qquad (19)$$
$$\xi_i^j \ge 0, \qquad \xi_i^{*\,j+1} \ge 0, \qquad (20)$$

where $j = 1, \ldots, k-1$ (unless otherwise specified), $i = 1, \ldots, i_j$, and C is some predefined constant (whose role is explained later).

There are several points to note about the primal problem. First, the constraints $a_j \le b_j$ and $b_j \le a_{j+1}$ are necessary and sufficient to enforce the ordering constraint $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_{k-1} \le b_{k-1}$. Second, the (non-convex) constraint $w \cdot w = 1$ is replaced by the convex constraint $w \cdot w \le 1$, since the optimal solution w has unit magnitude in any case. To see why this is so, consider first the case k = 2, where we have a single (hard) margin:

$$\min_{w,\, a,\, b} \;\; a - b \qquad \text{s.t.} \quad w \cdot x_i \le a, \;\; i = 1, \ldots, i_1, \qquad b \le w \cdot x_i, \;\; i = i_1 + 1, \ldots, N, \qquad w \cdot w \le 1.$$

We wish to show that the optimal solution (given that the data are linearly separable) must have a w of unit norm. Suppose the optimal solution $w, a, b$ has $\|w\| = \lambda < 1$, and let $x^+$ and $x^-$ be points (support vectors) on the left and right boundary planes, i.e., $w \cdot x^- = a$ and $w \cdot x^+ = b$. Let $\hat{w} = (1/\lambda)\, w$ (thus $\|\hat{w}\| = 1$). We then have

$$\hat{w} \cdot x^- = \frac{1}{\lambda}\, a, \qquad \hat{w} \cdot x^+ = \frac{1}{\lambda}\, b.$$

Therefore the new solution $\hat{w}, (1/\lambda)a, (1/\lambda)b$ has a lower energy value (a larger margin) of $(1/\lambda)(a - b)$ when $\lambda < 1$. As a result $\lambda = 1$, since the original solution was assumed optimal. This line of reasoning readily extends to multiple margins, as the factor $1/\lambda$ applies to all the margins uniformly, thus the sum $\sum_j (a_j - b_j)$ would decrease (a larger sum of margins) by a factor of $1/\lambda$, hence $\lambda = 1$. The introduction of the soft margin component (the second term in (14)) does not affect this line of reasoning, as long as the constant C is consistent with the existence of a solution with negative energy; otherwise there would be a duality gap between the primal and dual functionals. This consistency is related to the number of margin errors, which we discuss in more detail later in this section and in the following section.

We proceed to derive the dual functional. The Lagrangian takes the following form:

$$L = \sum_j (a_j - b_j) + C \sum_{j,i} \left( \xi_i^j + \xi_i^{*\,j+1} \right) + \sum_{j,i} \alpha_i^j \left( w \cdot x_i^j - a_j - \xi_i^j \right) + \sum_{j,i} \alpha_i^{*j} \left( b_j - \xi_i^{*\,j+1} - w \cdot x_i^{j+1} \right) + \sum_{j=1}^{k-1} \zeta_j (a_j - b_j) + \sum_{j=1}^{k-2} \eta_j (b_j - a_{j+1}) + \delta\, (w \cdot w - 1) - \sum_{j,i} \lambda_i^j \xi_i^j - \sum_{j,i} \lambda_i^{*j} \xi_i^{*\,j+1},$$

where $j = 1, \ldots, k-1$ (unless otherwise specified), $i = 1, \ldots, i_j$, and $\alpha_i^j, \alpha_i^{*j}, \zeta_j, \eta_j, \delta, \lambda_i^j, \lambda_i^{*j}$ are all non-negative Lagrange multipliers. From the minimization of the Lagrangian with respect to w we obtain

$$w = \frac{1}{2\delta}\, Q\mu,$$

where the matrix Q was defined in (10) and the vector $\mu$ holds the Lagrange multipliers $\alpha_i^j$ and $\alpha_i^{*j}$ as defined in the previous section. From the minimization with respect to $b_j$ for $j = 1, \ldots, k-2$ we obtain $\mathbf{1} \cdot \mu_2^j = 1 + \zeta_j - \eta_j$. For $j = k-1$ we obtain $\mathbf{1} \cdot \mu_2^{k-1} = 1 + \zeta_{k-1}$, from which it follows (since $\zeta_{k-1} \ge 0$) that

$$\mathbf{1} \cdot \mu_2^{k-1} \ge 1. \qquad (21)$$

Likewise, the minimization with respect to $a_1$ provides the constraint $\mathbf{1} \cdot \mu_1^1 = 1 + \zeta_1$, from which it follows (since $\zeta_1 \ge 0$) that

$$\mathbf{1} \cdot \mu_1^1 \ge 1, \qquad (22)$$

and with respect to $a_j$, $j = 2, \ldots, k-1$, we get $\mathbf{1} \cdot \mu_1^j = 1 + \zeta_j - \eta_{j-1}$.

Summing up the Lagrange multipliers gives rise to another constraint (beyond (21) and (22)), as follows:

$$\sum_{j=1}^{k-1} \mathbf{1} \cdot \mu_1^j = (k-1) + \sum_{j=1}^{k-1} \zeta_j - \sum_{j=1}^{k-2} \eta_j \qquad \text{and} \qquad \sum_{j=1}^{k-1} \mathbf{1} \cdot \mu_2^j = (k-1) + \sum_{j=1}^{k-1} \zeta_j - \sum_{j=1}^{k-2} \eta_j.$$

As a result we obtain the constraint

$$\mathbf{1} \cdot \mu_1 = \mathbf{1} \cdot \mu_2. \qquad (23)$$

Finally, the minimization with respect to $\xi_i^j$ and $\xi_i^{*\,j+1}$ yields the expressions (8) and (9), from which we obtain the constraints

$$0 \le \alpha_i^j \le C, \qquad (24)$$
$$0 \le \alpha_i^{*j} \le C, \qquad (25)$$

where $\alpha_i^j = C$ and/or $\alpha_i^{*j} = C$ if the corresponding data point $x_i^j$ is a margin error (as mentioned before, a data point can count twice as a margin error: once with respect to the class on its left and once with respect to the class on its right).

After substituting the expression for w back into the Lagrangian and taking into account the constraints borne out of the partial derivatives with respect to $a_j, b_j$, we obtain the dual functional as a function of $\mu, \delta$ alone (all the remaining variables drop out):

$$\max_{\mu,\, \delta} \;\; L'(\mu, \delta) = -\delta - \frac{1}{4\delta}\, \mu^\top (Q^\top Q)\, \mu,$$

subject to the constraints (21, 22, 24, 25) and $\delta \ge 0$. Note that $\delta = 0$ cannot occur if there is an optimal solution with negative energy in the primal functional (otherwise we would have a duality gap, discussed later), since we have shown above that $\|w\| = 1$ at the optimal solution, thus from the Kuhn-Tucker theorem $\delta \ne 0$. We can eliminate $\delta$ as follows:

$$\frac{\partial L'}{\partial \delta} = -1 + \frac{1}{4\delta^2}\, \mu^\top (Q^\top Q)\, \mu = 0.$$

Substituting the resulting expression $\delta = (1/2)\sqrt{\mu^\top (Q^\top Q)\, \mu}$ back into $L'(\mu, \delta)$ provides a new dual functional $L''(\mu) = -\sqrt{\mu^\top (Q^\top Q)\, \mu}$, and maximization of $L''(\mu)$ is equivalent to maximization of the expression $-\mu^\top (Q^\top Q)\, \mu$, since $Q^\top Q$ is positive definite. To conclude, the dual functional takes the following form:

$$\max_{\mu} \;\; -\mu^\top (Q^\top Q)\, \mu \qquad (26)$$

subject to

$$0 \le \mu_i \le C, \qquad i = 1, \ldots, N, \qquad (27)$$
$$\mathbf{1} \cdot \mu_1^1 \ge 1, \qquad (28)$$
$$\mathbf{1} \cdot \mu_2^{k-1} \ge 1, \qquad (29)$$
$$\mathbf{1} \cdot \mu_1 = \mathbf{1} \cdot \mu_2, \qquad (30)$$

where Q and $\mu$ are defined in the previous section. The direction w is represented by the linear combination of the support vectors

$$w = \frac{Q\mu}{\|Q\mu\|},$$

where, following the Kuhn-Tucker theorem, $\mu_i > 0$ for all vectors on the boundaries between adjacent pairs of classes and for margin errors. In other words, the vectors $x_i^j$ associated with non-vanishing $\mu_i$ are those which lie on the hyperplanes, i.e., satisfy $a_j = w \cdot x_i^j$ or $b_j = w \cdot x_i^{j+1}$, or vectors tagged as margin errors ($\xi_i^j > 0$ or $\xi_i^{*\,j+1} > 0$). Therefore all the thresholds $a_j, b_j$ can be recovered from the support vectors, unlike in the fixed-margin scheme, which requires another LP pass.

The dual functional (26) is similar to the dual functional (11), but with some crucial differences: (i) the quadratic criterion functional is homogeneous, and (ii) constraints (28, 29) lead to the constraint $\sum_i \mu_i \ge 2$. From the Kuhn-Tucker theorem, $\zeta_j = 0$ when $a_j < b_j$ and $\eta_j = 0$ when $b_j < a_{j+1}$, thus when the data are linearly separable the optimal solution has $\sum_i \mu_i = 2(k-1)$. Since a margin error implies that the corresponding Lagrange multiplier equals C, the number of margin errors is bounded, because $\sum_i \mu_i$ is bounded. These two differences are also what distinguishes conventional SVM from the ν-SVM proposed recently for 2-class learning by [10]. Indeed, if we set k = 2 in the dual functional (26), we can conclude that the two dual functionals are identical. The primal and dual functionals of ν-SVM and of the sum-of-margins policy for ranking learning with k = 2 classes are summarized below.

ν-SVM primal:

$$\min_{w,\, b,\, \rho,\, \xi} \;\; \frac{1}{2}\, w \cdot w - \nu\rho + \frac{1}{N} \sum_{i=1}^{N} \xi_i \qquad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \rho \ge 0.$$

ν-SVM dual:

$$\max_{\alpha} \;\; -\frac{1}{2}\, \alpha^\top M \alpha \qquad \text{s.t.} \quad 0 \le \alpha_i \le \frac{1}{N}, \quad \sum_i \alpha_i y_i = 0, \quad \sum_i \alpha_i \ge \nu.$$

k = 2 sum-of-margins primal:

$$\min_{w,\, a,\, b,\, \xi} \;\; (a - b) + C \sum_{i=1}^{N} \xi_i \qquad \text{s.t.} \quad w \cdot x_i \le a + \xi_i, \; i = 1, \ldots, i_1, \quad b - \xi_i \le w \cdot x_i, \; i = i_1 + 1, \ldots, N, \quad w \cdot w \le 1, \quad a \le b, \quad \xi_i \ge 0.$$

k = 2 sum-of-margins dual:

$$\max_{\mu} \;\; -\mu^\top M \mu \qquad \text{s.t.} \quad 0 \le \mu_i \le C, \quad \sum_i \mu_i y_i = 0, \quad \sum_i \mu_i \ge 2,$$

where $M = Q^\top Q$ with $M_{ij} = y_i y_j\, x_i \cdot x_j$ and $y_i = \pm 1$ depending on class membership. Although the primal functionals appear different, the dual functionals are similar and can in fact be made equivalent by a change of variables: scale the Lagrange multipliers associated with ν-SVM such that $\alpha_i \to 2\alpha_i/\nu$. Then $C = 2/(\nu N)$, and the equivalence between the two dual forms is established. Appendix A provides a more detailed analysis of the role of C in the case k = 2. In the general case of k > 2 classes (in the context of ranking learning) the constant C carries the same meaning: $C \approx 2(k-1)/\#\mathrm{m.e.}$, where #m.e. stands for the total number of margin errors, thus

$$\frac{2(k-1)}{N} \le C \le 2(k-1).$$

Recall that in the worst case a data point can count twice as a margin error, being a margin error both in the context of its class and the class on its left, and in the context of its class and the class on its right. Therefore the total number of margin errors in the worst case is $N = 2l - i_1 - i_k$, where l is the total number of data points.

Figure 3 (left column: fixed-margin algorithm; right column: sum-of-margins algorithm): Synthetic data experiments for k = 3 classes with 2D data points using second-order kernel inner-products. The solid lines correspond to $a_1, a_2$ and the dashed lines to $b_1, b_2$ (from left to right); support vectors are marked as squares. The left column illustrates the fixed margin policy (dual functional (35)) and the right column the sum-of-margins policy (dual functional (26)). When the value of C is small (top row) the number of margin errors (and support vectors) is large in order to enable large margins, i.e., the $b_j - a_j$ are large. In the case of sum-of-margins (top right display) a small value of C makes $b_1 = a_2$ in order to maximize the margins. When the value of C is large (bottom row) the number of margin errors (and support vectors) is small, and as a result the margins are tight.

A final point of interest: unlike in the fixed margin policy, all the thresholds $a_j, b_j$ are determined from the support vectors, so the second Linear Programming optimization stage is not necessary in this case. In other words, there must be support vectors on each hyperplane $(w, a_j)$ and $(w, b_j)$, otherwise a better solution with larger margins would exist. To conclude, the multiple-margin policy maximizes the sum of the k - 1 margins, allowing the margins to differ in size and thus effectively rewarding larger margins between neighboring classes which are spaced far apart from each other. This is opposite to the fixed margin policy, in which the direction of the hyperplanes is dominated by the closest neighboring classes. We saw that the fixed margin policy reduces to conventional SVM when the number of classes is k = 2, while the multiple-margin policy reduces to ν-SVM. A further difference between the two policies is that the multiple-margin policy requires a single optimization sweep for recovering both the direction w and the thresholds $a_j, b_j$, whereas the fixed margin policy requires two sweeps: a QLP for recovering w and a Linear Programming problem for recovering the k - 1 thresholds $b_j$.
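A matching sketch of the sum-of-margins primal (14)-(20), again in cvxpy under our naming. Note that (19) enters directly as the convex constraint $w \cdot w \le 1$, so no pre-scaling of w is needed, and the thresholds $a_j, b_j$ come out of the single solve, as discussed above.

```python
import cvxpy as cp

def sum_of_margins_primal(X, C=1.0):
    """Sum-of-margins primal QLP (14)-(20).
    X: list of k arrays ordered by rank, X[j] of shape (n_j, d)."""
    k, d = len(X), X[0].shape[1]
    w = cp.Variable(d)
    a = cp.Variable(k - 1)
    b = cp.Variable(k - 1)
    cons = [a <= b,                                   # (15)
            cp.sum_squares(w) <= 1]                   # (19)
    cons += [b[j] <= a[j + 1] for j in range(k - 2)]  # (16)
    slack = 0
    for j in range(k - 1):
        xi = cp.Variable(X[j].shape[0], nonneg=True)       # (20)
        xs = cp.Variable(X[j + 1].shape[0], nonneg=True)
        cons += [X[j] @ w <= a[j] + xi,                    # (17)
                 X[j + 1] @ w >= b[j] - xs]                # (18)
        slack = slack + cp.sum(xi) + cp.sum(xs)
    cp.Problem(cp.Minimize(cp.sum(a - b) + C * slack), cons).solve()
    return w.value, a.value, b.value
```

For k = 2 this is exactly the two-plane program analyzed above, and the recovered w has unit norm at the optimum.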

5 Fixed Margin Policy Revisited: Generalization of ν-SVM

We have seen that the sum-of-margins policy reduces to ν-SVM when the number of classes is k = 2. However, one cannot make the assertion in the other direction, that the dual functional (26) is a generalization of ν-SVM. In fact, the fixed margin policy applied in the style of ν-SVM to ranking learning would have the following form:

$$\min_{w,\, b_j,\, \rho,\, \xi} \;\; \frac{1}{2}\, w \cdot w - \nu\rho + \frac{1}{l} \sum_{j,i} \left( \xi_i^j + \xi_i^{*\,j+1} \right) \qquad (31)$$

subject to

$$w \cdot x_i^j - b_j \le -\rho + \xi_i^j, \qquad w \cdot x_i^{j+1} - b_j \ge \rho - \xi_i^{*\,j+1}, \qquad \rho \ge 0, \quad \xi_i^j \ge 0, \quad \xi_i^{*\,j+1} \ge 0,$$

and the resulting dual functional would have the form:

$$\max_{\mu} \;\; -\frac{1}{2}\, \mu^\top (Q^\top Q)\, \mu \qquad (32)$$

subject to

$$0 \le \mu_i \le \frac{1}{l}, \qquad i = 1, \ldots, N, \qquad (33)$$
$$\sum_i \mu_i \ge \nu,$$
$$\mathbf{1} \cdot \mu_1^j = \mathbf{1} \cdot \mu_2^j, \qquad j = 1, \ldots, k-1, \qquad (34)$$

which is not equivalent to the dual functional (26) of the multiple-margin policy (nor to the dual functional (11) of the fixed-margin policy).

Figure 4: The EachMovie dataset: 1628 movies rated by 72,916 users, a total of 2,811,983 ratings; the rating matrix (user j rating movie i) is sparse, about 5% full. The dataset is used for predicting a person's rating on a new movie given that person's past ratings on similar movies and the ratings of other people on all the movies. See text for details.

We saw that ν-SVM can be re-derived using the principle of two parallel hyperplanes (the primal functional (14) in the case k = 2). We show next that the generalization of ν-SVM to ranking learning (the dual functional (32) above) can be derived using the 2(k - 1)-parallel-hyperplanes approach. The primal functional takes the following form:

$$\min_{w,\, a_j,\, b_j,\, t} \;\; t + C \sum_{j,i} \left( \xi_i^j + \xi_i^{*\,j+1} \right)$$

subject to

$$a_j - b_j = t, \qquad w \cdot x_i^j \le a_j + \xi_i^j, \qquad b_j - \xi_i^{*\,j+1} \le w \cdot x_i^{j+1}, \qquad w \cdot w \le 1, \qquad \xi_i^j \ge 0, \quad \xi_i^{*\,j+1} \ge 0.$$

Note that minimizing t under the constraints $a_j - b_j = t$ forces a common margin for all neighboring pairs of classes, which is what captures the fixed margin policy. The resulting dual functional takes the following form:

$$\max_{\mu} \;\; -\mu^\top (Q^\top Q)\, \mu \qquad (35)$$

subject to

$$0 \le \mu_i \le C, \qquad i = 1, \ldots, N, \qquad (36)$$
$$\sum_i \mu_i = 2, \qquad (37)$$
$$\mathbf{1} \cdot \mu_1^j = \mathbf{1} \cdot \mu_2^j, \qquad j = 1, \ldots, k-1, \qquad (38)$$

which is equivalent (via a change of variables) to the dual functional (32). To conclude, there are two fixed-margin implementations for ranking learning: one is a direct generalization of conventional SVM (dual functional (11)), and the other is a direct generalization of ν-SVM (dual functional (35)).

6 Experiments

We have conducted experiments on synthetic data in order to visualize the behavior of the new ranking algorithms, experiments on collaborative filtering problems, and experiments on ranking visual data of vehicles.

Fig. 3 shows the performance of the two types of algorithms on synthetic 2D data of a three-class (k = 3) ordinal regression problem using second-order kernel inner-products (thus the separating surfaces are conics). The value of the constant C changes the sensitivity to the number of margin errors and the number of support vectors, and as a result the margins themselves (more margin errors allow larger margins). The left column illustrates the fixed margin policy (dual functional (35)) and the right column the sum-of-margins policy (dual functional (26)). When the value of C is small (top row) the number of margin errors (and support vectors) is large in order to enable large margins, i.e., the $b_j - a_j$ are large. In the case of sum-of-margins (top right display) a small value of C makes $b_1 = a_2$ in order to maximize the margins; as a result the center class completely vanishes (the decision rule will never classify in favor of the center class). When the value of C is large (bottom row) the number of margin errors (and support vectors) is small, and as a result the margins are tight.

Fig. 4 shows the data structure of the EachMovie dataset [6], which is used for collaborative filtering tasks. In general, the goal in collaborative filtering is to predict a person's rating on new items such as movies, given the person's past ratings on similar items and the ratings of other people on all the items (including the new item). The ratings are ordered, such as "highly recommended", "good", ..., "very bad", thus collaborative filtering falls naturally under the domain of ordinal regression (rather than general multi-class learning).

The EachMovie dataset contains 1628 movies rated by 72,916 people, arranged as a 2D array whose columns represent the movies and whose rows represent the users; about 5% of the entries of this array are filled in with ratings (on a six-level ordinal scale), totaling 2,811,983 ratings. Given a new user, the ratings of the user on the 1628 movies (not all movies would be rated) form the labels $y_i$, and the i-th column of the array forms the input $x_i$; together these form the training data for that particular user. Given a new movie represented by the vector x of ratings of all the other 72,916 users (not all of whom rated the new movie), the learning task is to predict the rating f(x) of the new user. Since the array contains empty entries, the ratings were shifted by -3.5 so that the possible ratings are {-2.5, -1.5, -0.5, 0.5, 1.5, 2.5}, which allows assigning the value of zero to the empty entries of the array (movies which were not rated).

For the training phase we chose users who had rated about 450 movies, selected a subset of those movies for training, and tested the prediction on the remaining movies. We compared our results (collected over 100 runs), measured as the average distance between the correct rating and the predicted rating, to the best on-line algorithm of [5], called PRank (which makes no use of a large-margin principle). In their work, PRank was compared to other known on-line approaches and was found to be superior, thus we limited our comparison to PRank alone. Attempts to compare our algorithms to other known ranking algorithms which use a large-margin principle ([7], for example) were not successful, since those square the training set size, which made the experiment with the EachMovie dataset computationally intractable. The graph in Fig. 5 shows that the large margin principle (dual functional (35)) makes a significant difference in the results compared to PRank. The results we obtained with PRank are consistent with the results reported in [5] (a best average error of about 1.25), whereas our fixed-margin algorithm provided an average error of about 0.7.

Figure 5 (panels: "Crammer & Singer 2001" and "fixed-margin"): The results of the fixed-margin principle plotted against the results obtained by the on-line algorithm of [5], which does not use a large-margin principle. The average error between the predicted rating and the correct rating is much lower.
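The data preparation just described can be sketched as follows (a minimal illustration; the array layout, the NaN encoding of missing entries and the function name are our assumptions, not the authors' code).

```python
import numpy as np

def build_user_training_set(ratings, target_user):
    """ratings: (n_users, n_movies) float array, np.nan for missing entries.
    Each movie the target user rated contributes one training pair: x is the
    column of everyone else's ratings (shifted by -3.5, with 0 standing for
    'not rated') and y is the user's ordinal rating of that movie."""
    shifted = ratings - 3.5
    shifted[np.isnan(ratings)] = 0.0              # empty entries become 0
    rated = ~np.isnan(ratings[target_user])       # movies the target user rated
    others = np.delete(shifted, target_user, axis=0)
    X = others[:, rated].T                        # one row per rated movie
    y = ratings[target_user, rated]               # labels on the original scale
    return X, y
```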

Figure 6: Classification of vehicle type (Small, Medium, Large). On the left are typical examples of correct classifications; on the right are typical examples of incorrect classifications.

We also applied the ranking learning algorithms to a visual classification problem: given images of vehicles taken from the rear, the task is to classify each picture into one of three classes, Small (passenger cars), Medium (SUVs, minivans) and Large (buses, trucks). There is a natural order Small, Medium, Large, since making a mistake between Small and Large is worse than confusing Small and Medium, for example. The ordering Small, Medium, Large makes it natural to apply ranking learning (rather than general multi-class learning). The problem of classifying vehicle types is relevant for applications in the area of Intelligent Traffic Transportation (ITS), where on-board sensors such as visual and radar are responsible for a wide variety of driving assistance applications, including active safety related to airbag deployment, in which vehicle classification data is one important piece of information.

The training data included 1500 examples from each class, where the input vector was simply the raw pixel values down-sampled to 20x20 pixels per image. The testing phase included 8081 pictures of Small vehicles, 3453 pictures of Medium vehicles and 2395 pictures of Large vehicles. The classification error (counting the number of misclassifications) with the fixed-margin policy using second-order kernel inner-products was 20% of all test data, compared to 25% when performing the classification using three rounds of 2-class conventional SVM (the conventional way of applying the large margin principle to general multi-class problems). We also examined the ranking error by averaging the difference between the true rank {1, 2, 3} and the predicted rank, obtained by thresholding the projection

$$f(x) = \sum_{\text{support vectors}} \alpha_i^{*j}\, K(x_i^{j+1}, x) - \sum_{\text{support vectors}} \alpha_i^{j}\, K(x_i^{j}, x)$$

as described in Section 3, over all test vectors x; the average was compared against the one obtained using PRank. Fig. 6 shows a typical collection of correctly classified and incorrectly classified pictures from the test set.
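The two error measures used in this experiment are simple to state in code (a small sketch with our naming): the misclassification rate behind the 20% versus 25% figures, and the average absolute rank difference used for the comparison against PRank.

```python
import numpy as np

def classification_error(y_true, y_pred):
    """Fraction of misclassified examples (0-1 loss)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def average_rank_error(y_true, y_pred):
    """Average |true rank - predicted rank|, the ordinal error measure."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```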

7 Summary

We have introduced a number of algorithms, of linear size in the number of training examples, for implementing a large margin principle for the task of ordinal regression. The first type of algorithm (dual functionals (11), (32), (35)) introduces the constraint of a single margin, determined by the closest adjacent pair of classes. That particular margin is maximized while preserving (modulo margin errors) the separability constraints. The support vectors lie on the boundaries of the closest adjacent pair of classes only, thus a complete solution requires first a QLP for finding the hyperplane direction w and then an LP for finding the thresholds. This type of algorithm comes in two flavors: the first is a direct extension of conventional SVM (dual functional (11)) and the second is a direct extension of ν-SVM (dual functionals (32), (35)).

The second type of algorithm (dual functional (26)) allows for multiple different margins, where the optimization criterion is the sum of the k - 1 margins. The key observation with this approach is that in order to accommodate different margins, the pre-scaling concept (the "canonical hyperplane") used in conventional SVM (and in the fixed-margin algorithms above) is not appropriate; instead, one must have 2(k - 1) parallel hyperplanes, where the margins are represented explicitly by the intervals $b_j - a_j$ (rather than by $w \cdot w$, as with conventional SVM and the fixed-margin algorithms). A byproduct of the sum-of-margins approach is that the LP phase is no longer necessary, and the role of the constant C has a natural interpretation. In fact, when k = 2 the sum-of-margins algorithm is identical to ν-SVM. The drawback of this approach (a drawback shared with ν-SVM) is that unfortunate choices of the constant C might lead to a duality gap with the QLP, thus rendering the dual functional irrelevant or degenerate.

Experiments performed on visual classification and collaborative filtering show that both approaches outperform an existing ordinal regression algorithm (the on-line approach) applied to ranking, and multi-class SVM (applied to the visual classification problem).

Acknowledgements

Thanks to MobilEye Ltd. for the use of the vehicle data set. This work was done while the authors were at the Computer Science department at Stanford University. A.S. especially thanks his host Leo Guibas for making his visit to Stanford possible.

References

[1] J. Anderson. Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, 46:1-30, 1984.
[2] B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. of the 5th ACM Workshop on Computational Learning Theory. ACM Press, 1992.
[3] W.W. Cohen, R.E. Schapire, and Y. Singer. Learning to order things. Journal of Artificial Intelligence Research (JAIR), 10:243-270, 1999.
[4] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265-292, 2001.
[5] K. Crammer and Y. Singer. Pranking with ranking. In Proceedings of the conference on Neural Information Processing Systems (NIPS), 2001.
[6] The EachMovie dataset.
[7] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, MIT Press, 2000.
[8] Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines. Technical Report 1043, Univ. of Wisconsin, Dept. of Statistics, Sep. 2001.
[9] P. McCullagh and J.A. Nelder. Generalized Linear Models. Chapman and Hall, London, 2nd edition, 1989.
[10] B. Schölkopf, A. Smola, R.C. Williamson, and P.L. Bartlett. New support vector algorithms. Neural Computation, 12:1207-1245, 2000.
[11] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2nd edition, 2000.
[12] J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proc. of the 7th European Symposium on Artificial Neural Networks, April 1999.

A A Closer Look at k = 2: the Role of the Constant C

In ν-SVM, the constant $0 < \nu < 1$ sets the tradeoff between the fraction of allowable margin errors (at most $\nu N$ data points can be margin errors) and the minimal number of support vectors (at least $\nu N$ support vectors). The constant C in the sum-of-margins ranking learning specialized to k = 2 therefore has a similar interpretation: $2/N < C \le 2$ is inversely proportional to the allowable number of margin errors, $\nu N = 2/C$. Thus, when C = 2 only a single margin error is tolerated (otherwise the optimization problem is in a weak duality state, to be discussed later), and when C = 2/N all the points can be margin errors (and in turn all the points are support vectors).

The role of C as a tradeoff between the minimal number of support vectors and the allowable number of margin errors can be observed directly through the primal problem, as follows. Let $w, a, b, \xi$ be a feasible solution for the primal problem. Let $\epsilon_1 > 0$ be the smallest of the non-vanishing $\xi_i$ associated with the negative training examples, i.e., the distance of the nearest margin error among the negative examples, and let $\epsilon_2 > 0$ be the smallest of the non-vanishing $\xi_i$ associated with the positive training examples. Consider translating the two hyperplanes such that $\hat{a} = a + \epsilon_1$ and $\hat{b} = b - \epsilon_2$. The new feasible solution consists of $\hat{a}, \hat{b}, w, \hat{\xi}$, where (for the negative examples)

$$\hat{\xi}_i = \begin{cases} \xi_i - \epsilon_1 & \xi_i > 0, \\ 0 & \text{otherwise}, \end{cases}$$

and $\hat{\xi}_i$ is defined similarly (with $\epsilon_2$) for the positive examples. The value of the criterion function becomes

$$\hat{a} - \hat{b} + C \sum_i \hat{\xi}_i = \left( a - b + C \sum_i \xi_i \right) + \epsilon_1 (1 - C n_1) + \epsilon_2 (1 - C n_2),$$

where $n_1$ is the number of margin errors ($\xi_i > 0$) among the negative training examples and $n_2$ the number of margin errors among the positive examples. For the original solution to be optimal we must have $1 - C n_1 + 1 - C n_2 \ge 0$ (otherwise we could lower the criterion function and obtain a better solution). Therefore,

$$C \le \frac{2}{n_1 + n_2}.$$

We see that C = 2 when only a single margin error is allowed, and C = 2/N when all the training data, positive and negative, are allowed to be margin errors. In other words, the smaller $C \le 2$ is, the more margin errors are allowed in the final solution.

To see the connection between C and the necessary number of support vectors, consider

$$\epsilon_1 = \min_i \{\, a - w \cdot x_i \;:\; a - w \cdot x_i > 0, \; i = 1, \ldots, i_1 \,\},$$

the smallest distance between a negative example which is not a support vector and the left hyperplane. Likewise,

$$\epsilon_2 = \min_i \{\, w \cdot x_i - b \;:\; w \cdot x_i - b > 0, \; i = i_1 + 1, \ldots, N \,\},$$

the smallest distance between a positive example which is not a support vector and the right hyperplane. Starting with a feasible solution $w, a, b, \xi$, we create a new feasible solution $w, \hat{a}, \hat{b}, \hat{\xi}$ as follows. Let $\hat{a} = a - \epsilon_1$, $\hat{b} = b + \epsilon_2$, and

$$\hat{\xi}_i = \begin{cases} \xi_i + \epsilon_1 & i \text{ a support vector}, \; i = 1, \ldots, i_1, \\ 0 & \text{otherwise}, \end{cases} \qquad \hat{\xi}_i = \begin{cases} \xi_i + \epsilon_2 & i \text{ a support vector}, \; i = i_1 + 1, \ldots, N, \\ 0 & \text{otherwise}. \end{cases}$$

Note that the support vectors are associated with points on the hyperplanes and with points labeled as margin errors. Since in the new solution the hyperplanes are shifted, all the old support vectors become margin errors (thus $\hat{\xi}_i > 0$). The value of the criterion function becomes

$$\hat{a} - \hat{b} + C \sum_i \hat{\xi}_i = \left( a - b + C \sum_i \xi_i \right) + \epsilon_1 (C s_1 - 1) + \epsilon_2 (C s_2 - 1),$$

where $s_1$ is the number of negative support vectors and $s_2$ the number of positive support vectors. For the original solution to be optimal we must have $C s_1 - 1 + C s_2 - 1 \ge 0$ (otherwise we could lower the criterion function and obtain a better solution). Therefore,

$$s_1 + s_2 \ge \frac{2}{C}.$$

We see that when C = 2 (a single margin error is allowed) the number of support vectors is at least 1, and when C = 2/N (all instances are allowed to become margin errors) the number of support vectors is N (i.e., all instances are support vectors). Taken together, C forms a tradeoff: the more margin errors are allowed, the more support vectors the optimal solution will have.

Finally, it is worth noting that a wrong selection of the constant C (when there are more margin errors than the value of C allows for) would leave the primal criterion function positive at any feasible point (otherwise the constraints would not be satisfied). Since the dual criterion function is non-positive, a duality gap would emerge. In other words, even in the presence of slack variables (soft margin), there can be an unfortunate situation where the optimization problem is effectively infeasible, and this situation is related to the choice of the constant C.

To conclude, the 2-parallel-hyperplanes formulation, or equivalently the ν-SVM formulation, carries a tradeoff. On the one hand, the role of the constant C is clear and intuitively simple: there is a direct relationship between the value of C and the fraction of data points allowed to be marked as margin errors. On the other hand, unlike conventional SVM, which exhibits strong duality under all choices of the regularization constant C, the 2-plane formulation exhibits strong duality only for values of C which are consistent with the worst-case scenario of margin errors.
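The bookkeeping above suggests a simple rule of thumb for picking C. The helper below is ours, not the paper's: it applies the heuristic $C \approx 2(k-1)/\#\mathrm{m.e.}$ from Section 4, clipped to the admissible range $2(k-1)/N \le C \le 2(k-1)$ (which for k = 2 is the $2/N \le C \le 2$ range analyzed in this appendix).

```python
def heuristic_C(k, n_margin_errors, N):
    """C ~ 2(k-1)/#margin-errors, clipped to [2(k-1)/N, 2(k-1)]."""
    target = 2.0 * (k - 1) / max(1, n_margin_errors)
    return min(max(target, 2.0 * (k - 1) / N), 2.0 * (k - 1))
```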


Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming CEE 60 Davd Rosenberg p. LECTURE NOTES Dualty Theory, Senstvty Analyss, and Parametrc Programmng Learnng Objectves. Revew the prmal LP model formulaton 2. Formulate the Dual Problem of an LP problem (TUES)

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

General Vector Machine. Hong Zhao Department of Physics, Xiamen University General Vector Machne Hong Zhao (zhaoh@xmu.edu.cn) Department of Physcs, Xamen Unversty The support vector machne (SVM) s an mportant class of learnng machnes for functon approach, pattern recognton, and

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Abstract Ths paper ponts out an mportant source of necency n Smola and Scholkopf's Sequental Mnmal Optmzaton (SMO) algorthm for SVM regresson that s c

Abstract Ths paper ponts out an mportant source of necency n Smola and Scholkopf's Sequental Mnmal Optmzaton (SMO) algorthm for SVM regresson that s c Improvements to SMO Algorthm for SVM Regresson 1 S.K. Shevade S.S. Keerth C. Bhattacharyya & K.R.K. Murthy shrsh@csa.sc.ernet.n mpessk@guppy.mpe.nus.edu.sg cbchru@csa.sc.ernet.n murthy@csa.sc.ernet.n 1

More information

SUMMARY... I TABLE OF CONTENTS...II INTRODUCTION...

SUMMARY... I TABLE OF CONTENTS...II INTRODUCTION... Summary A follow-the-leader robot system s mplemented usng Dscrete-Event Supervsory Control methods. The system conssts of three robots, a leader and two followers. The dea s to get the two followers to

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Relevance Feedback Document Retrieval using Non-Relevant Documents

Relevance Feedback Document Retrieval using Non-Relevant Documents Relevance Feedback Document Retreval usng Non-Relevant Documents TAKASHI ONODA, HIROSHI MURATA and SEIJI YAMADA Ths paper reports a new document retreval method usng non-relevant documents. From a large

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Support Vector Machines for Business Applications

Support Vector Machines for Business Applications Support Vector Machnes for Busness Applcatons Bran C. Lovell and Chrstan J Walder The Unversty of Queensland and Max Planck Insttute, Tübngen {lovell, walder}@tee.uq.edu.au Introducton Recent years have

More information

Lecture 5: Probability Distributions. Random Variables

Lecture 5: Probability Distributions. Random Variables Lecture 5: Probablty Dstrbutons Random Varables Probablty Dstrbutons Dscrete Random Varables Contnuous Random Varables and ther Dstrbutons Dscrete Jont Dstrbutons Contnuous Jont Dstrbutons Independent

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A Robust LS-SVM Regression

A Robust LS-SVM Regression PROCEEDIGS OF WORLD ACADEMY OF SCIECE, EGIEERIG AD ECHOLOGY VOLUME 7 AUGUS 5 ISS 37- A Robust LS-SVM Regresson József Valyon, and Gábor Horváth Abstract In comparson to the orgnal SVM, whch nvolves a quadratc

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES UbCC 2011, Volume 6, 5002981-x manuscrpts OPEN ACCES UbCC Journal ISSN 1992-8424 www.ubcc.org VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

More information

Towards Semantic Knowledge Propagation from Text to Web Images

Towards Semantic Knowledge Propagation from Text to Web Images Guoun Q (Unversty of Illnos at Urbana-Champagn) Charu C. Aggarwal (IBM T. J. Watson Research Center) Thomas Huang (Unversty of Illnos at Urbana-Champagn) Towards Semantc Knowledge Propagaton from Text

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Multicriteria Decision Making

Multicriteria Decision Making Multcrtera Decson Makng Andrés Ramos (Andres.Ramos@comllas.edu) Pedro Sánchez (Pedro.Sanchez@comllas.edu) Sonja Wogrn (Sonja.Wogrn@comllas.edu) Contents 1. Basc concepts 2. Contnuous methods 3. Dscrete

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Polyhedral Compilation Foundations

Polyhedral Compilation Foundations Polyhedral Complaton Foundatons Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty Feb 8, 200 888., Class # Introducton: Polyhedral Complaton Foundatons

More information