Nested Support Vector Machines


Gyemin Lee, Student Member, IEEE, and Clayton Scott, Member, IEEE

Abstract: One-class and cost-sensitive support vector machines (SVMs) are state-of-the-art machine learning methods for estimating density level sets and solving weighted classification problems, respectively. However, the solutions of these SVMs do not necessarily produce set estimates that are nested as the parameters controlling the density level or cost asymmetry are continuously varied. Such nesting not only reflects the true sets being estimated, but is also desirable for applications requiring the simultaneous estimation of multiple sets, including clustering, anomaly detection, and ranking. We propose new quadratic programs whose solutions give rise to nested versions of one-class and cost-sensitive SVMs. Furthermore, like conventional SVMs, the solution paths in our construction are piecewise linear in the control parameters, although here the number of breakpoints is directly controlled by the user. We also describe decomposition algorithms to solve the quadratic programs. These methods are compared to conventional (non-nested) SVMs on synthetic and benchmark data sets, and are shown to exhibit more stable rankings and decreased sensitivity to parameter settings.

Index Terms: machine learning, pattern classification, one-class support vector machine, cost-sensitive support vector machine, nested set estimation, solution paths.

I. INTRODUCTION

Many statistical learning problems may be characterized as problems of set estimation. In these problems, the input takes the form of a random sample of points in a feature space, while the desired output is a subset G of the feature space. For example, in density level set estimation, a random sample from a density is given and G is an estimate of a density level set. In binary classification, labeled training data are available, and G is the set of all feature vectors predicted to belong to one of the classes.

G. Lee and C. Scott are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA. E-mail: {gyemin, cscott}@eecs.umich.edu. This work was supported in part by NSF Award No.

(a) one-class SVM (b) cost-sensitive SVM

Fig. 1. Two decision boundaries from a one-class SVM (a) and a cost-sensitive SVM (b) at two density levels and cost asymmetries. The shaded regions indicate the density level set estimate at the higher density level and the positive decision set estimate at the lower cost asymmetry, respectively. These regions are not completely contained inside the solid contours corresponding to the smaller density level or the larger cost asymmetry; hence the two decision sets are not properly nested.

In other statistical learning problems, the desired output is a family of sets G_θ with the index θ taking values in a continuum. For example, estimating density level sets at multiple levels is an important task for many problems including clustering [1], outlier ranking [2], minimum volume set estimation [3], and anomaly detection [4]. Estimating cost-sensitive classifiers at a range of different cost asymmetries is important for ranking [5], Neyman-Pearson classification [6], semi-supervised novelty detection [7], and ROC studies [8].

Support vector machines (SVMs) are powerful nonparametric approaches to set estimation [9]. However, both the one-class SVM (OC-SVM) for level set estimation and the standard two-class SVM for classification do not produce set estimates that are nested as the parameter controlling the density level or, respectively, the misclassification cost is varied. As displayed in Fig. 1, set estimates from the original SVMs are not properly nested. On the other hand, Fig. 2 shows nested counterparts obtained from our proposed methods (see Sections III and IV). Since the true sets being estimated are in fact nested, estimators that enforce the nesting constraint will not only avoid nonsensical solutions, but should also be more accurate and less sensitive to parameter settings and perturbations of the training data.

One way to generate nested SVM classifiers is to train a cost-insensitive SVM and simply vary the offset. However, this often leads to inferior performance, as demonstrated in [8]. In this paper, we develop nested variants of one-class and two-class SVMs by incorporating nesting constraints into the dual quadratic programs associated with these methods. Decomposition algorithms for solving these modified duals are also presented. Like the solution paths for conventional SVMs [10], [8],

(a) nested OC-SVM (b) nested CS-SVM

Fig. 2. Five decision boundaries from our nested OC-SVM (a) and nested CS-SVM (b) at five different density levels and cost asymmetries, respectively. These decision boundaries from nested SVMs do not cross each other, unlike the decision boundaries from the original SVMs (OC-SVM and CS-SVM). Therefore, the corresponding set estimates are properly nested.

[11], nested SVM solution paths are also piecewise linear in the control parameters, but require far fewer breakpoints. We compare our nested paths to the unnested paths on synthetic and benchmark data sets. We also quantify the degree to which standard SVMs are unnested, which is often quite high. The Matlab implementation of our algorithms is available at www.eecs.umich.edu/~cscott/code/nestedsvm.zip. A preliminary version of this work appeared in [12].

A. Motivating Applications

With the multiple set estimates from nested SVMs over density levels or cost asymmetries, the following applications are envisioned.

Ranking: In the bipartite ranking problem [13], we are given labeled examples from two classes, and the goal is to construct a score function that rates new examples according to their likelihood of belonging to the positive class. If the decision sets are not nested as cost asymmetries or density levels vary, then the resulting score function leads to an ambiguous ranking. Nested SVMs make the ranking unambiguous and less sensitive to perturbations of the data. See Section VI-C for further discussion.

Clustering: Clusters may be defined as the connected components of a density level set. The level at which the density is thresholded determines a tradeoff between cluster number and cluster coverage. Varying the level from 0 to ∞ yields a cluster tree [14] that depicts the bifurcation of clusters into disjoint components and gives a hierarchical representation of cluster structure.

Anomaly Detection: Anomaly detection aims to identify deviations from nominal data when combined observations of nominal and anomalous data are given. Scott and Kolaczyk [4] and Scott and Blanchard

[7] present approaches to classifying the contaminated, unlabeled data by solving multiple level set estimation and multiple cost-sensitive classification problems, respectively.

II. BACKGROUND ON CS-SVM AND OC-SVM

In this section, we overview two SVM variants and show how they can be used to learn set estimates. To establish notation and basic concepts, we briefly review SVMs. Suppose that we have a random sample {(x_i, y_i)}_{i=1}^N, where x_i ∈ R^d is a feature vector and y_i ∈ {−1, +1} is its class. An SVM finds a separating hyperplane with a normal vector w in a high-dimensional space H by solving

$$\min_{w,\xi}\ \frac{\lambda}{2}\|w\|^2 + \sum_i \xi_i \qquad \text{s.t. } y_i\langle w, \Phi(x_i)\rangle \ge 1-\xi_i,\ \xi_i \ge 0,\ \forall i,$$

where λ is a regularization parameter and Φ is a nonlinear function that maps each data point into H, generated by a positive semi-definite kernel k : R^d × R^d → R. This kernel corresponds to an inner product in H through k(x, x') = ⟨Φ(x), Φ(x')⟩. Then the two half-spaces of the hyperplane {Φ(x) : f(x) = ⟨w, Φ(x)⟩ = 0} form the positive and negative decision sets. Since the offset of the hyperplane is often omitted when Gaussian or inhomogeneous polynomial kernels are chosen [15], it is not considered in this formulation. A more detailed discussion of SVMs can be found in [9].

A. Cost-Sensitive SVM

The SVM above, which we call a cost-insensitive SVM (CI-SVM), penalizes errors in both classes equally. However, there are many applications where the numbers of data samples from each class are not balanced, or false positives and false negatives incur different costs. The cost-sensitive SVM (CS-SVM) handles this issue by controlling the cost asymmetry between false positives and false negatives [16]. Let I_+ = {i : y_i = +1} and I_- = {i : y_i = −1} denote the two index sets, and let γ denote the cost asymmetry. Then a CS-SVM solves

$$\min_{w,\xi}\ \frac{\lambda}{2}\|w\|^2 + \gamma\sum_{i\in I_+}\xi_i + (1-\gamma)\sum_{i\in I_-}\xi_i \quad (1)$$
$$\text{s.t. } y_i\langle w, \Phi(x_i)\rangle \ge 1-\xi_i,\ \xi_i \ge 0,\ \forall i,$$

where w is the normal vector of the hyperplane. When γ = 1/2, the CS-SVM reduces to the CI-SVM.

In practice this optimization problem is solved via its dual, which depends only on a set of Lagrange multipliers (one for each x_i):

$$\min_{\alpha}\ \frac{1}{2\lambda}\sum_{i,j}\alpha_i\alpha_j y_iy_jK_{i,j} - \sum_i\alpha_i \quad (2)$$
$$\text{s.t. } 0 \le \alpha_i \le 1_{\{y_i<0\}} + y_i\gamma,\ \forall i,$$

where K_{i,j} = k(x_i, x_j) and α = (α_1, α_2, ..., α_N). The indicator function 1_{A} returns 1 if the condition A is true and 0 otherwise. Once an optimal solution α(γ) = (α_1(γ), ..., α_N(γ)) is found, the sign of the decision function

$$f_\gamma(x) = \frac{1}{\lambda}\sum_i \alpha_i(\gamma)y_ik(x_i, x) \quad (3)$$

determines the class of x. If k(·,·) ≥ 0, then this decision function takes only non-positive values when γ = 0, and corresponds to (0, 0) in the ROC. On the other hand, γ = 1 penalizes only the violations of positive examples, and corresponds to (1, 1) in the ROC.

Bach et al. [8] extended the method of Hastie et al. [10] to the CS-SVM. They showed that the α_i(γ) are piecewise linear in γ, and derived an efficient algorithm for computing the entire path of solutions to (2). Thus, a family of classifiers at a range of cost asymmetries can be found with a computational cost comparable to solving (2) for a single γ.

B. One-Class SVM

The OC-SVM was proposed in [17], [18] to estimate a level set of an underlying probability density given a data sample from the density. In one-class problems, all the instances are assumed to be from the same class, typically the negative class: y_i = −1, ∀i. The primal quadratic program of the OC-SVM is

$$\min_{w,\xi}\ \frac{\lambda}{2}\|w\|^2 + \frac{1}{N}\sum_{i=1}^N\xi_i \quad (4)$$
$$\text{s.t. } \langle w, \Phi(x_i)\rangle \ge 1-\xi_i,\ \xi_i \ge 0,\ \forall i.$$

This problem is again solved via its dual in practice:

$$\min_{\alpha}\ \frac{1}{2\lambda}\sum_{i,j}\alpha_i\alpha_jK_{i,j} - \sum_i\alpha_i \quad (5)$$
$$\text{s.t. } 0 \le \alpha_i \le \frac{1}{N},\ \forall i.$$

Then a solution α(λ) = (α_1(λ), ..., α_N(λ)) defines a decision function that determines whether a point is an outlier or not. Here the α_i(λ) are also piecewise linear in λ [11]. From this property, we can develop a path-following algorithm and generate a family of level set estimates with a small computational cost.
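As a concrete illustration, the following minimal sketch (with illustrative names; this is not the authors' released Matlab code) evaluates the decision function (3) from a dual solution α with a Gaussian kernel.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    # pairwise squared distances between rows of X1 and rows of X2
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def decision_function(X_train, y, alpha, X_test, lam, sigma):
    # f_gamma(x) = (1/lambda) * sum_i alpha_i y_i k(x_i, x), as in (3)
    K = gaussian_kernel(X_test, X_train, sigma)   # shape (n_test, N)
    return K @ (alpha * y) / lam
```

The sign of the returned values gives the predicted class, and thresholding at zero yields the positive decision set.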

The set estimate conventionally associated with the OC-SVM is given by

$$\hat{G}_\lambda = \{x : \sum_i \alpha_i(\lambda)k(x_i, x) > \lambda\}. \quad (6)$$

Vert and Vert [19] showed that by modifying this estimate slightly, substituting α(ηλ) for α(λ) where η > 1, (6) leads to a consistent estimate of the true level set when a Gaussian kernel with a well-calibrated bandwidth is used. Regardless of whether η = 1 or η > 1, however, the obtained estimates are not guaranteed to be nested, as we will see in Section VI. Note also that when α_i(λ) = 1/N ∀i, (6) is equivalent to set estimation based on kernel density estimation.

III. NESTED CS-SVM

In this section, we develop the nested cost-sensitive SVM (NCS-SVM), which aims to produce nested positive decision sets G_γ = {x : f_γ(x) > 0} as the cost asymmetry γ varies. Our construction is a two-stage process. We first select a finite number of cost asymmetries 0 = γ_1 < γ_2 < ... < γ_M = 1 a priori and generate a family of nested decision sets at the preselected cost asymmetries. We achieve this goal by incorporating nesting constraints into the dual quadratic program of the CS-SVM. Second, we linearly interpolate the solution coefficients of the finite nested collection to a continuous nested family defined for all γ. As an efficient method to solve the formulated problem, we present a decomposition algorithm.

A. Finite Family of Nested Sets

Our NCS-SVM finds decision functions at the cost asymmetries γ_1, γ_2, ..., γ_M simultaneously by minimizing the sum of the duals (2) at each γ_m and by imposing additional constraints that induce nested sets. For a fixed λ and preselected cost asymmetries 0 = γ_1 < γ_2 < ... < γ_M = 1, an NCS-SVM solves

$$\min_{\alpha_1,\ldots,\alpha_M}\ \sum_{m=1}^M\Big[\frac{1}{2\lambda}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}y_iy_jK_{i,j} - \sum_i\alpha_{i,m}\Big] \quad (7)$$
$$\text{s.t. } 0 \le \alpha_{i,m} \le 1_{\{y_i<0\}} + y_i\gamma_m,\ \forall i, m \quad (8)$$
$$y_i\alpha_{i,1} \le y_i\alpha_{i,2} \le \cdots \le y_i\alpha_{i,M},\ \forall i, \quad (9)$$

where α_m = (α_{1,m}, ..., α_{N,m}) and α_{i,m} is the coefficient for data point x_i and cost asymmetry γ_m. Then its optimal solution α*_m = (α*_{1,m}, ..., α*_{N,m}) defines the decision function f_{γ_m}(x) = (1/λ) Σ_i α*_{i,m} y_i k(x_i, x) and its corresponding decision set Ĝ_{γ_m} = {x : f_{γ_m}(x) > 0} for each m. In Section VII, the proposed quadratic program for the NCS-SVM is interpreted as the dual of a corresponding primal quadratic program.
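For small problems, the quadratic program (7)-(9) can also be handed directly to a generic convex solver. The following is a minimal sketch of that route, assuming the cvxpy package is available (all names are illustrative, and the eigenvalue factorization is only for solver compatibility); the decomposition algorithm of Section III-C is the approach actually developed in this paper.

```python
import numpy as np
import cvxpy as cp

def ncs_svm_dual(K, y, gammas, lam):
    """Jointly solve the NCS-SVM dual (7)-(9) at all cost asymmetries."""
    N, M = len(y), len(gammas)
    Q = np.outer(y, y) * K / lam                 # quadratic term of (7)
    w, V = np.linalg.eigh(Q)                     # factor Q = L L^T for a DCP-safe objective
    L = V * np.sqrt(np.clip(w, 0.0, None))
    A = cp.Variable((N, M))                      # A[i, m] = alpha_{i,m}
    obj = sum(0.5 * cp.sum_squares(L.T @ A[:, m]) - cp.sum(A[:, m]) for m in range(M))
    cons = []
    for m, g in enumerate(gammas):
        ub = np.where(y < 0, 1.0 - g, g)         # upper bound (8): 1_{y<0} + y*gamma_m
        cons += [A[:, m] >= 0, A[:, m] <= ub]
    for m in range(M - 1):                       # nesting constraints (9)
        cons.append(cp.multiply(y, A[:, m]) <= cp.multiply(y, A[:, m + 1]))
    cp.Problem(cp.Minimize(obj), cons).solve()
    return A.value
```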

B. Interpolation

For an intermediate cost asymmetry γ between two cost asymmetries, say γ_1 and γ_2 without loss of generality, we can write γ = εγ_1 + (1 − ε)γ_2 for some ε ∈ [0, 1]. Then we define new coefficients α_i(γ) through linear interpolation:

$$\alpha_i(\gamma) = \epsilon\alpha_{i,1} + (1-\epsilon)\alpha_{i,2}. \quad (10)$$

Then the positive decision set at cost asymmetry γ is

$$\hat{G}_\gamma = \Big\{x : f_\gamma(x) = \frac{1}{\lambda}\sum_i\alpha_i(\gamma)y_ik(x_i, x) > 0\Big\}. \quad (11)$$

This is motivated by the piecewise linearity of the Lagrange multipliers of the CS-SVM, and is further justified by the following result.

Proposition 1. The nested CS-SVM equipped with a kernel such that k(·,·) ≥ 0 (e.g., Gaussian kernels or polynomial kernels of even orders) generates nested decision sets. In other words, if 0 ≤ γ_ε < γ_δ ≤ 1, then Ĝ_{γ_ε} ⊆ Ĝ_{γ_δ}.

Proof: We prove the proposition in three steps. First, we show that sets from (7) satisfy Ĝ_{γ_1} ⊆ Ĝ_{γ_2} ⊆ ... ⊆ Ĝ_{γ_M}. Second, we show that if γ_m < γ < γ_{m+1}, then Ĝ_{γ_m} ⊆ Ĝ_γ ⊆ Ĝ_{γ_{m+1}}. Finally, we prove that any two sets from the NCS-SVM are nested.

Without loss of generality, we show Ĝ_{γ_1} ⊆ Ĝ_{γ_2}. Let α_1 and α_2 denote the optimal solutions for γ_1 and γ_2. Then from k(·,·) ≥ 0 and (9), we have Σ_i α_{i,1}y_ik(x_i, x) ≤ Σ_i α_{i,2}y_ik(x_i, x). Therefore, Ĝ_{γ_1} = {x : f_{γ_1}(x) > 0} ⊆ Ĝ_{γ_2} = {x : f_{γ_2}(x) > 0}.

Next, without loss of generality, we show Ĝ_{γ_1} ⊆ Ĝ_γ ⊆ Ĝ_{γ_2} when γ_1 ≤ γ ≤ γ_2. The linear interpolation (10) and the nesting constraints (9) imply y_iα_{i,1} ≤ y_iα_i(γ) ≤ y_iα_{i,2}, which, in turn, leads to Σ_i α_{i,1}y_ik(x_i, x) ≤ Σ_i α_i(γ)y_ik(x_i, x) ≤ Σ_i α_{i,2}y_ik(x_i, x).

Now consider arbitrary 0 ≤ γ_ε < γ_δ ≤ 1. If γ_ε ≤ γ_m ≤ γ_δ for some m, then Ĝ_{γ_ε} ⊆ Ĝ_{γ_δ} by the above results. Thus, suppose this is not the case and assume γ_1 < γ_ε < γ_δ < γ_2 without loss of generality. Then there exist ε > δ such that γ_ε = εγ_1 + (1 − ε)γ_2 and γ_δ = δγ_1 + (1 − δ)γ_2. Suppose x ∈ Ĝ_{γ_ε}. Then x ∈ Ĝ_{γ_2}; hence f_{γ_ε}(x) = (1/λ) Σ_i (εα_{i,1} + (1 − ε)α_{i,2})y_ik(x_i, x) > 0 and f_{γ_2}(x) = (1/λ) Σ_i α_{i,2}y_ik(x_i, x) > 0. By taking (δ/ε)f_{γ_ε}(x) + (1 − δ/ε)f_{γ_2}(x), we have f_{γ_δ}(x) = (1/λ) Σ_i (δα_{i,1} + (1 − δ)α_{i,2})y_ik(x_i, x) > 0. Thus, Ĝ_{γ_ε} ⊆ Ĝ_{γ_δ}. ∎

The assumption that the kernel is positive can in some cases be attained through pre-processing of the data. For example, a cubic polynomial kernel can be applied if the data support is shifted to lie in the positive orthant, so that the kernel function is in fact always positive.
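A minimal sketch of the interpolation rule (10), assuming the breakpoint solutions are stored columnwise (the helper name is ours, not the paper's):

```python
import numpy as np

def interpolate_alpha(gamma, gammas, A):
    """gammas: increasing breakpoints, shape (M,); A: coefficients, shape (N, M)."""
    m = np.searchsorted(gammas, gamma)       # gammas[m-1] <= gamma <= gammas[m]
    if m == 0:
        return A[:, 0]
    if m == len(gammas):
        return A[:, -1]
    eps = (gammas[m] - gamma) / (gammas[m] - gammas[m - 1])
    return eps * A[:, m - 1] + (1 - eps) * A[:, m]   # eq. (10)
```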

C. Decomposition Algorithm

The objective function (7) requires optimization over N × M variables. Due to its large size, standard quadratic programming algorithms are inadequate. Thus, we develop a decomposition algorithm that iteratively divides the large optimization problem into subproblems and optimizes the smaller problems. A similar approach also appears in a multi-class classification algorithm [20], although the algorithm developed there is substantively different from ours. The decomposition algorithm proceeds as follows:

1) Choose an example x_i from the data set.
2) Optimize the coefficients {α_{i,m}}_{m=1}^M corresponding to x_i while leaving the other variables fixed.
3) Repeat 1 and 2 until the optimality condition error falls below a predetermined tolerance.

The pseudo-code given in Fig. 3 initializes with the feasible solution α_{i,m} = 1_{\{y_i<0\}} + y_iγ_m, ∀i, m. A simple way of selection and termination is cycling through all the x_i, or picking x_i randomly and stopping after a fixed number of iterations. However, by checking the Karush-Kuhn-Tucker (KKT) optimality conditions and choosing the x_i most violating these conditions [21], the algorithm converges in far fewer iterations. In the Appendix, we provide a detailed discussion of the data point selection scheme and the termination criterion based on the KKT optimality conditions.

In step 2, the algorithm optimizes the set of variables associated with the chosen data point. Without loss of generality, let us assume that the data point x_1 is chosen, and {α_{1,m}}_{m=1}^M will be optimized while fixing the other α_{i,m}. We rewrite the objective function (7) in terms of the α_{1,m}:

$$\frac{1}{2\lambda}\sum_m\sum_{i,j}\alpha_{i,m}\alpha_{j,m}y_iy_jK_{i,j} - \sum_m\sum_i\alpha_{i,m}$$
$$= \frac{1}{\lambda}\sum_m\Big[\frac{1}{2}\alpha_{1,m}^2K_{1,1} + \alpha_{1,m}\sum_{j\neq 1}\alpha_{j,m}y_1y_jK_{1,j}\Big] - \sum_m\alpha_{1,m} + C$$
$$= \frac{1}{\lambda}\sum_m\Big[\frac{1}{2}\alpha_{1,m}^2K_{1,1} + \alpha_{1,m}\big(\lambda y_1f_{1,m} - \alpha^{old}_{1,m}K_{1,1} - \lambda\big)\Big] + C$$
$$= \frac{K_{1,1}}{\lambda}\sum_m\Big[\frac{1}{2}\alpha_{1,m}^2 - \alpha_{1,m}\Big(\alpha^{old}_{1,m} + \frac{\lambda(1 - y_1f_{1,m})}{K_{1,1}}\Big)\Big] + C,$$

where f_{1,m} = (1/λ)(Σ_{j≠1} α_{j,m}y_jK_{1,j} + α^{old}_{1,m}y_1K_{1,1}) and α^{old}_{1,m} denote the output and the variable preceding the update, respectively. These values can be easily computed from the result of the previous iteration. C is a collection of terms that do not depend on the α_{1,m}.

Input: {(x_i, y_i)}_{i=1}^N, {γ_m}_{m=1}^M
Initialize: α_{i,m} ← 1_{\{y_i<0\}} + y_iγ_m, ∀i, m
repeat
    Choose a data point x_i.
    Compute: f_{i,m} ← (1/λ) Σ_j α_{j,m}y_jK_{i,j}, ∀m
             α^{new}_{i,m} ← α_{i,m} + λ(1 − y_if_{i,m})/K_{i,i}, ∀m
    Update {α_{i,m}}_{m=1}^M with the solution of the subproblem:
        min over α_{i,1}, ..., α_{i,M} of Σ_m [ (1/2)α_{i,m}² − α_{i,m}α^{new}_{i,m} ]
        s.t. 0 ≤ α_{i,m} ≤ 1_{\{y_i<0\}} + y_iγ_m, ∀m
             y_iα_{i,1} ≤ y_iα_{i,2} ≤ ... ≤ y_iα_{i,M}
until the accuracy conditions are satisfied
Output: Ĝ_{γ_m} = {x : Σ_i α_{i,m}y_ik(x_i, x) > 0}, ∀m

Fig. 3. Decomposition algorithm for the nested cost-sensitive SVM. Specific strategies for data point selection and termination, based on the KKT conditions, are given in the Appendix.

Then the algorithm solves the new subproblem with M variables,

$$\min_{\alpha_{1,1},\ldots,\alpha_{1,M}}\ \sum_m\Big[\frac{1}{2}\alpha_{1,m}^2 - \alpha_{1,m}\alpha^{new}_{1,m}\Big]$$
$$\text{s.t. } 0 \le \alpha_{1,m} \le 1_{\{y_1<0\}} + y_1\gamma_m,\ \forall m$$
$$y_1\alpha_{1,1} \le y_1\alpha_{1,2} \le \cdots \le y_1\alpha_{1,M},$$

where α^{new}_{1,m} = α^{old}_{1,m} + λ(1 − y_1f_{1,m})/K_{1,1} is the solution if it is feasible. This subproblem is much smaller than (7) and can be solved efficiently via standard quadratic program solvers.
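Since this subproblem amounts to projecting α^{new} onto the box and chain constraints, a generic QP solver suffices. A minimal sketch, again assuming cvxpy and using illustrative names:

```python
import numpy as np
import cvxpy as cp

def ncs_subproblem(alpha_new, y_i, gammas):
    """M-variable subproblem of Fig. 3 for a single data point x_i."""
    M = len(gammas)
    a = cp.Variable(M)
    ub = np.where(y_i < 0, 1.0 - np.asarray(gammas), np.asarray(gammas))
    cons = [a >= 0, a <= ub]                                  # box constraints
    cons += [y_i * a[m] <= y_i * a[m + 1] for m in range(M - 1)]  # chain constraints
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(a) - alpha_new @ a), cons).solve()
    return a.value
```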

IV. NESTED OC-SVM

In this section, we present a nested extension of the OC-SVM. The nested OC-SVM (NOC-SVM) estimates a family of nested level sets over a continuum of levels λ. Our approach here parallels the approach developed for the NCS-SVM. First, we introduce an objective function for nested set estimation, and then develop analogous interpolation and decomposition algorithms for the NOC-SVM.

A. Finite Family of Nested Sets

For M different density levels of interest λ_1 > λ_2 > ... > λ_M > 0, an NOC-SVM solves the following optimization problem:

$$\min_{\alpha_1,\ldots,\alpha_M}\ \sum_{m=1}^M\Big[\frac{1}{2\lambda_m}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}K_{i,j} - \sum_i\alpha_{i,m}\Big] \quad (12)$$
$$\text{s.t. } 0 \le \alpha_{i,m} \le \frac{1}{N},\ \forall i, m \quad (13)$$
$$\frac{\alpha_{i,1}}{\lambda_1} \le \frac{\alpha_{i,2}}{\lambda_2} \le \cdots \le \frac{\alpha_{i,M}}{\lambda_M},\ \forall i, \quad (14)$$

where α_m = (α_{1,m}, ..., α_{N,m}) and α_{i,m} corresponds to data point x_i at level λ_m. Its optimal solution α*_m = (α*_{1,m}, ..., α*_{N,m}) determines a level set estimate Ĝ_{λ_m} = {x : f_{λ_m}(x) > 1}, where f_{λ_m}(x) = (1/λ_m) Σ_i α*_{i,m} k(x_i, x). In practice, we can choose λ_1 and λ_M to cover the entire range of interesting values of the density level (see Section VI-B, Appendix C). In Section VII, this quadratic program for the NOC-SVM is interpreted as the dual of a corresponding primal quadratic program.

B. Interpolation and Extrapolation

We construct a density level set estimate at an intermediate level λ between two preselected levels, say λ_1 and λ_2. At λ = ελ_1 + (1 − ε)λ_2 for some ε ∈ [0, 1], we set α_i(λ) = εα_{i,1} + (1 − ε)α_{i,2}. For λ > λ_1, we extrapolate the solution by setting α_i(λ) = α_{i,1} for ∀i. These choices are motivated by the facts that the OC-SVM solution is piecewise linear in λ and remains constant for λ > λ_1, as presented in Appendix C. Then the level set estimate becomes

$$\hat{G}_\lambda = \{x : \sum_i\alpha_i(\lambda)k(x_i, x) > \lambda\}. \quad (15)$$

The level set estimates generated by the above process are shown to be nested in the next proposition.

Proposition 2. The nested OC-SVM equipped with a kernel such that k(·,·) ≥ 0 (in particular, a Gaussian kernel) generates nested density level set estimates. That is, if 0 < λ_ε < λ_δ < ∞, then Ĝ_{λ_δ} ⊆ Ĝ_{λ_ε}.

Proof: We prove the proposition in three steps. First, we show that sets from (12) satisfy Ĝ_{λ_1} ⊆ Ĝ_{λ_2} ⊆ ... ⊆ Ĝ_{λ_M}. Second, the interpolated set (15) is shown to satisfy Ĝ_{λ_m} ⊆ Ĝ_λ ⊆ Ĝ_{λ_{m+1}} when λ_m > λ > λ_{m+1}. Finally, we prove the claim for any two sets from the NOC-SVM.

Without loss of generality, we first show Ĝ_{λ_1} ⊆ Ĝ_{λ_2}. Let λ_1 > λ_2 denote two density levels chosen a priori, and let α_1 and α_2 denote their corresponding optimal solutions. From (14), we have Σ_i (α_{i,1}/λ_1)k(x_i, x) ≤ Σ_i (α_{i,2}/λ_2)k(x_i, x), so the two estimated level sets are nested: Ĝ_{λ_1} ⊆ Ĝ_{λ_2}.

Next, without loss of generality, we prove Ĝ_{λ_1} ⊆ Ĝ_λ ⊆ Ĝ_{λ_2} for λ_1 > λ > λ_2. From (14), we have α_{i,1}/λ_1 ≤ α_{i,2}/λ_2, and

$$\frac{\alpha_{i,1}}{\lambda_1} = \frac{\epsilon\alpha_{i,1} + (1-\epsilon)\frac{\lambda_2}{\lambda_1}\alpha_{i,1}}{\lambda} \le \frac{\epsilon\alpha_{i,1} + (1-\epsilon)\alpha_{i,2}}{\lambda} = \frac{\alpha_i(\lambda)}{\lambda} \le \frac{\epsilon\frac{\lambda_1}{\lambda_2}\alpha_{i,2} + (1-\epsilon)\alpha_{i,2}}{\lambda} = \frac{\alpha_{i,2}}{\lambda_2}.$$

Hence, f_{λ_1}(x) ≤ f_λ(x) ≤ f_{λ_2}(x).

Now consider arbitrary λ_δ > λ_ε > 0. By construction, we can easily see that Ĝ_{λ_δ} ⊆ Ĝ_{λ_ε} ⊆ Ĝ_{λ_1} for λ_δ > λ_ε > λ_1, and Ĝ_{λ_M} ⊆ Ĝ_{λ_δ} ⊆ Ĝ_{λ_ε} for λ_M > λ_δ > λ_ε. Thus we only need to consider the case λ_1 ≥ λ_δ > λ_ε ≥ λ_M. Since the above results imply Ĝ_{λ_δ} ⊆ Ĝ_{λ_ε} if λ_δ ≥ λ_m ≥ λ_ε for some m, we can safely assume λ_1 > λ_δ > λ_ε > λ_2 without loss of generality. Then there exist δ > ε such that λ_δ = δλ_1 + (1 − δ)λ_2 and λ_ε = ελ_1 + (1 − ε)λ_2. Suppose x ∈ Ĝ_{λ_δ}. Then x ∈ Ĝ_{λ_2} and

$$\sum_i(\delta\alpha_{i,1} + (1-\delta)\alpha_{i,2})k(x_i, x) > \lambda_\delta \quad (16)$$
$$\sum_i\alpha_{i,2}k(x_i, x) > \lambda_2. \quad (17)$$

By taking (ε/δ)·(16) + (1 − ε/δ)·(17), we have Σ_i (εα_{i,1} + (1 − ε)α_{i,2})k(x_i, x) > λ_ε. Thus, Ĝ_{λ_δ} ⊆ Ĝ_{λ_ε}. ∎

The statement of this result focuses on the Gaussian kernel because this is the primary kernel for which the OC-SVM has been successfully applied.

C. Decomposition Algorithm

We also use a decomposition algorithm to solve (12). The general steps are the same as explained in Section III-C for the NCS-SVM. Fig. 4 shows the outline of the algorithm.

In the algorithm, the feasible solution α_{i,m} = 1/N for ∀i, m is used as the initial solution. Here we present how we can divide the large optimization problem into a collection of smaller problems. Suppose that the data point x_1 is selected and its corresponding coefficients {α_{1,m}}_{m=1}^M will be updated. Writing the objective function only in terms of the α_{1,m}, we have

$$\sum_m\frac{1}{2\lambda_m}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}K_{i,j} - \sum_m\sum_i\alpha_{i,m}$$
$$= \sum_m\frac{1}{2\lambda_m}\alpha_{1,m}^2K_{1,1} + \sum_m\alpha_{1,m}\Big[\frac{1}{\lambda_m}\sum_{j\neq 1}\alpha_{j,m}K_{1,j} - 1\Big] + C$$
$$= \sum_m\Big[\frac{1}{2\lambda_m}\alpha_{1,m}^2K_{1,1} + \alpha_{1,m}\Big(f_{1,m} - \frac{\alpha^{old}_{1,m}K_{1,1}}{\lambda_m} - 1\Big)\Big] + C$$
$$= K_{1,1}\sum_m\frac{1}{2\lambda_m}\Big[\alpha_{1,m}^2 - 2\alpha_{1,m}\Big(\alpha^{old}_{1,m} + \frac{\lambda_m(1 - f_{1,m})}{K_{1,1}}\Big)\Big] + C,$$

where α^{old}_{1,m} and f_{1,m} = (1/λ_m)(Σ_{j≠1} α_{j,m}K_{1,j} + α^{old}_{1,m}K_{1,1}) denote the variable from the previous iteration step and the corresponding output, respectively. C is a constant that does not affect the solution. Then we obtain the reduced optimization problem of M variables,

$$\min_{\alpha_{1,1},\ldots,\alpha_{1,M}}\ \sum_m\Big[\frac{1}{2\lambda_m}\alpha_{1,m}^2 - \frac{1}{\lambda_m}\alpha_{1,m}\alpha^{new}_{1,m}\Big] \quad (18)$$
$$\text{s.t. } 0 \le \alpha_{1,m} \le \frac{1}{N},\ \forall m \quad (19)$$
$$\frac{\alpha_{1,1}}{\lambda_1} \le \frac{\alpha_{1,2}}{\lambda_2} \le \cdots \le \frac{\alpha_{1,M}}{\lambda_M}, \quad (20)$$

where α^{new}_{1,m} = α^{old}_{1,m} + λ_m(1 − f_{1,m})/K_{1,1}. Notice that α^{new}_{1,m} becomes the solution if it is feasible. This reduced optimization problem can be solved through standard quadratic program solvers.

V. COMPUTATIONAL CONSIDERATIONS

Here we provide guidelines for breakpoint selection and discuss the effects of interpolation.

A. Breakpoint Selection

The construction of an NCS-SVM begins with the selection of a finite number of cost asymmetries. Since the cost asymmetries take values within the range [0, 1], the two breakpoints γ_1 and γ_M should be at the two extremes, so that γ_1 = 0 and γ_M = 1. Then the rest of the breakpoints γ_2, ..., γ_{M−1} can be set evenly spaced between γ_1 and γ_M. On the other hand, the density levels for NOC-SVMs should be strictly positive. Without covering all positive reals, however, λ_1 and λ_M can be chosen to cover practically all the density levels of interest.

Input: {x_i}_{i=1}^N, {λ_m}_{m=1}^M
Initialize: α_{i,m} ← 1/N, ∀i, m
repeat
    Choose a data point x_i.
    Compute: f_{i,m} ← (1/λ_m) Σ_j α_{j,m}K_{i,j}, ∀m
             α^{new}_{i,m} ← α_{i,m} + λ_m(1 − f_{i,m})/K_{i,i}, ∀m
    Update {α_{i,m}}_{m=1}^M with the solution of the subproblem:
        min over α_{i,1}, ..., α_{i,M} of Σ_m [ (1/(2λ_m))α_{i,m}² − (1/λ_m)α_{i,m}α^{new}_{i,m} ]
        s.t. 0 ≤ α_{i,m} ≤ 1/N, ∀m
             α_{i,1}/λ_1 ≤ α_{i,2}/λ_2 ≤ ... ≤ α_{i,M}/λ_M
until the accuracy conditions are satisfied
Output: Ĝ_{λ_m} = {x : Σ_i α_{i,m}k(x_i, x) > λ_m}, ∀m

Fig. 4. Decomposition algorithm for the nested one-class SVM. Specific strategies for data point selection and termination, based on the KKT conditions, are given in the Appendix.

The largest level λ_1 for the NOC-SVM is set as described in Appendix C, where we show that for λ > λ_1 the CS-SVM and OC-SVM remain unchanged. A very small number greater than 0 is set for λ_M. Then the NOC-SVM is trained on evenly spaced breakpoints between λ_1 and λ_M.

In our experiments, we set the number of breakpoints to M = 5 for NCS-SVMs and M = 11 for NOC-SVMs. These values were chosen because increasing the number of breakpoints M had diminishing AUC gains while increasing the training time in our experiments. Thus, the cost asymmetries for the NCS-SVM are (0, 0.25, 0.5, 0.75, 1), and the density levels for the NOC-SVM are 11 linearly spaced points from λ_1 = (1/N) max_i Σ_j K_{i,j} to λ_11 = 10^{-6}.
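A minimal sketch of these breakpoint choices (the helper names are ours):

```python
import numpy as np

def ncs_breakpoints(M=5):
    # evenly spaced cost asymmetries with gamma_1 = 0 and gamma_M = 1
    return np.linspace(0.0, 1.0, M)

def noc_breakpoints(K, M=11, lam_min=1e-6):
    # lambda_1 = (1/N) max_i sum_j K_ij down to a tiny positive floor
    lam_max = K.sum(axis=1).max() / K.shape[0]
    return np.linspace(lam_max, lam_min, M)   # decreasing: lambda_1 > ... > lambda_M
```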

B. Effects of Interpolation

Nested SVMs are trained on a finite number of cost asymmetries or density levels, and then the solution coefficients are linearly interpolated over a continuous range of parameters. Here we illustrate the effectiveness of the linear interpolation scheme of nested SVMs using the two-dimensional banana data set.

Consider two sets of cost asymmetries, γ = (0 : 0.25 : 1) and $\tilde{\gamma}$ = (0 : 0.1 : 1), with different numbers of breakpoints for the NCS-SVM. Let $\tilde{\alpha}^*_i(\gamma_m)$ denote the linearly interpolated solution at γ_m from the solution of the NCS-SVM trained with γ, and let $\alpha^*_i(\gamma_m)$ denote the solution from the NCS-SVM trained with $\tilde{\gamma}$. Fig. 5 compares these two sets of solution coefficients. The box plots in Fig. 5(a) show that the values of $\tilde{\alpha}^*_i(\gamma_m) - \alpha^*_i(\gamma_m)$ tend to be very small. Indeed, for most γ_m, the interquartile range on these box plots is not even visible. Regardless of these minor discrepancies, what is most important is that the resulting decision sets are almost indistinguishable, as illustrated in Fig. 5(c) and (e). Similar results can be observed for the NOC-SVM in Fig. 5(b), (d) and (f), where we consider two sets of density levels λ with 11 breakpoints and $\tilde{\lambda}$ with 16 breakpoints between λ_1 = (1/N) max_i Σ_j K_{i,j} and λ_M = 10^{-6}.

C. Computational complexity

According to Hastie et al. [10], the (non-nested) path following algorithm has O(N) breakpoints and complexity O(m²N + N²m), where m is the maximum number of points on the margin along the path. On the other hand, our nested SVMs have a controllable number of breakpoints M. To assess the complexity of the nested SVMs, we make a couple of assumptions based on experimental evidence. First, our experience has shown that the number of iterations of the decomposition algorithm is proportional to the number of data points N. Second, we assume that the subproblem, which has M variables, can be solved in O(M²) operations. Furthermore, each iteration of the decomposition algorithm also involves a variable selection step. This involves checking all variables for KKT condition violations (as detailed in the Appendices), and thus entails O(MN) operations. Thus, the computation time of nested SVMs is O(M²N + MN²). In Section VI-E, we experimentally compare the run times of the path following algorithms to our methods.

Fig. 5. Simulation results on the banana data set depicting the impact of interpolation on the coefficients and the final set estimates: (a), (b) coefficient differences over the cost asymmetries γ and the density levels λ; (c)-(f) the corresponding decision set estimates. See Section V-B for details.

VI. EXPERIMENTS AND RESULTS

In order to compare the algorithms described above, we experimented on 13 benchmark data sets available online [22]. A brief summary is provided in Fig. 6. Each feature is standardized with zero mean and unit variance. The first eleven data sets are randomly permuted 100 times (the last two are permuted 20 times) and divided into training and test sets. In all of our experiments, we used the Gaussian kernel

$$k(x, x') = \exp\Big(-\frac{\|x-x'\|^2}{2\sigma^2}\Big)$$

and searched for the bandwidth σ over 20 logarithmically spaced points from d_avg/15 to 10·d_avg, where d_avg is the average distance between training data points. This control parameter is selected via 5-fold cross-validation on the first 10 permutations; then the average of these values is used to train the remaining permutations.
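A minimal sketch of this bandwidth grid, assuming scipy is available (the helper name is ours):

```python
import numpy as np
from scipy.spatial.distance import pdist

def bandwidth_grid(X, n=20):
    # 20 log-spaced bandwidths from d_avg/15 to 10*d_avg,
    # where d_avg is the mean pairwise distance between training points
    d_avg = pdist(X).mean()
    return np.logspace(np.log10(d_avg / 15), np.log10(10 * d_avg), n)
```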

Fig. 6. Description of the data sets: banana, breast-cancer, diabetes, flare-solar, german, heart, ringnorm, thyroid, titanic, twonorm, waveform, image, and splice. dim is the number of features, and N_train and N_test are the numbers of training and test examples.

Each algorithm generates a family of decision functions and set estimates. From these sets, we construct an ROC and compute its area under the curve (AUC). We use the AUC averaged across permutations to compare the performance of the algorithms. As shown in Fig. 1, however, the set estimates from CS-SVMs or OC-SVMs are not properly nested, which causes ambiguity, particularly in ranking. In Section VI-C, we measure this violation of nesting by defining the ranking disagreement of two rank scoring functions. Then in Section VI-D, we combine this ranking disagreement with the AUC, and compare the algorithms over multiple data sets using the Wilcoxon signed ranks test, as suggested in [23].

A. Two-class Problems

CS-SVMs and NCS-SVMs are compared on two-class problems. For NCS-SVMs, we set M = 5 and solved (7) at the uniformly spaced cost asymmetries γ = (0, 0.25, 0.50, 0.75, 1). In two-class problems, we also searched for the regularization parameter λ over 10 logarithmically spaced points from 0.1 to λ_max, where

$$\lambda_{\max} = \max\Big(\max_i\sum_{j\in I_+}y_iy_jK_{i,j},\ \max_i\sum_{j\in I_-}y_iy_jK_{i,j}\Big).$$

Values of λ > λ_max do not produce different solutions in the CS-SVM (see Appendix C). We compared the described algorithms by constructing ROCs and computing their AUCs. The results are collected in Fig. 7.

Fig. 7. AUC values for the CS-SVM (CS) and NCS-SVM (NCS) in two-class problems, and the OC-SVM (OC) and NOC-SVM (NOC) in one-class problems, on each of the 13 benchmark data sets. In one-class problems, "Positive" indicates that the alternative hypotheses are from the positive class examples in the data sets, and "Uniform" indicates that the alternative hypotheses are from a uniform distribution.

B. One-class Problems

For the NOC-SVM, we selected 11 density levels spaced evenly from λ_1 = (1/N) max_i Σ_j K_{i,j} (see Appendix C) to λ_11 = 10^{-6}. Among the two classes available in each data set, we chose the negative class for training. Because the bandwidth selection step requires computing AUCs, we simulated an artificial second class from a uniform distribution. For evaluation of the trained decision functions, both the positive examples in the test sets and a new uniform sample were used as the alternative class. Fig. 7 reports the results for both cases (denoted by "Positive" and "Uniform", respectively).

Fig. 8 shows the AUC of the two algorithms over a range of σ. Throughout the experiments on one-class problems, we observed that the NOC-SVM is more robust to the kernel bandwidth selection than the OC-SVM. However, we did not observe similar results on two-class problems.

C. Ranking disagreement

The decision sets from the OC-SVM and the CS-SVM are not properly nested, as illustrated in Fig. 1. Since larger λ means a higher density level, the density level set estimate of the OC-SVM at larger λ is expected to be contained within the density level set estimate at smaller λ.

Fig. 8. The effect of the kernel bandwidth σ on the performance (AUC) for the breast-cancer data set. The AUC is evaluated when the alternative class is from the positive class in the data sets (a) and from a uniform distribution (b). The NOC-SVM is less sensitive to σ than the OC-SVM.

Likewise, larger γ in the CS-SVM penalizes misclassification of positive examples more; thus, its corresponding positive decision set should contain the decision set at smaller γ, and the two decision boundaries should not cross. This undesired behavior of the algorithms leads to non-unique ranking score functions. In the case of the CS-SVM, we can consider the following two ranking functions:

$$s_+(x) = 1 - \min_{\{\gamma\,:\,f_\gamma(x)\ge 0\}}\gamma, \qquad s_-(x) = 1 - \max_{\{\gamma\,:\,f_\gamma(x)\le 0\}}\gamma. \quad (21)$$

For the OC-SVM, we consider the next pair of ranking functions:

$$s_+(x) = \max_{\{\lambda\,:\,x\in\hat{G}_\lambda\}}\lambda, \qquad s_-(x) = \min_{\{\lambda\,:\,x\notin\hat{G}_\lambda\}}\lambda. \quad (22)$$

In words, s_+ ranks according to the first set containing a point x, and s_- ranks according to the last set not containing the point. In either case, it is easy to see s_+(x) ≥ s_-(x).

In order to quantify the disagreement of the two ranking functions, we define the following measure of ranking disagreement:

$$d(s_+, s_-) = \frac{1}{N}\sum_i\max_j 1_{\{(s_+(x_i)-s_+(x_j))(s_-(x_i)-s_-(x_j)) < 0\}},$$

which is the proportion of data points ambiguously ranked, i.e., ranked differently with respect to at least one other point. Then d(s_+, s_-) = 0 if and only if s_+ and s_- induce the same ranking.

With these ranking functions, Fig. 9 reports the ranking disagreements of the CS-SVM and OC-SVM. In the table, d_2 refers to the ranking disagreement of the CS-SVM, and d_p and d_u respectively refer to the ranking disagreement of the OC-SVM when the second class is from the positive samples and from an artificial uniform distribution. As can be seen in the table, for some data sets the violation of nesting causes severe differences between the above ranking functions.
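A minimal sketch of this disagreement measure, assuming the score vectors s_+ and s_- have already been computed for all N points (the helper name is ours):

```python
import numpy as np

def ranking_disagreement(s_plus, s_minus):
    dp = s_plus[:, None] - s_plus[None, :]    # pairwise differences under s+
    dm = s_minus[:, None] - s_minus[None, :]  # pairwise differences under s-
    ambiguous = (dp * dm < 0).any(axis=1)     # ranked differently w.r.t. some point
    return ambiguous.mean()                   # d(s+, s-)
```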

Fig. 9. The measure of disagreement, d_2(s_+, s_-), d_p(s_+, s_-), and d_u(s_+, s_-), of the two ranking functions from the CS-SVM and OC-SVM on each benchmark data set. The meaning of each subscript is explained in the text. s_+ and s_- are defined in (21) and (22).

D. Statistical comparison

We employ the statistical methodology of Demšar [23] to compare the algorithms across all data sets. Using the Wilcoxon signed ranks test, we compare the CS-SVM and the NCS-SVM for two-class problems, and the OC-SVM and the NOC-SVM for one-class problems. The Wilcoxon signed ranks test is a non-parametric method for testing the significance of differences between paired observations, and can be used to compare the performances of two algorithms over multiple data sets. The differences between the AUCs of the two algorithms are ranked ignoring the signs, and then the ranks of the positive and negative differences are summed.

Fig. 10 and Fig. 11 respectively report the comparison results of the algorithms for two-class problems and one-class problems. Here the numbers under NCS or NOC denote the sums of ranks of the data sets on which the nested SVMs performed better than the original SVMs; the values under CS or OC are for the opposite. T is the smaller of the two sums. For a confidence level of α = 0.01 and 13 data sets, the difference between algorithms is significant if T is less than or equal to 9 [24]. Therefore, no significant performance difference between the CS-SVM and the NCS-SVM was detected in the test. Likewise, no difference between the OC-SVM and the NOC-SVM was detected.

However, the AUC alone does not highlight the ranking disagreement of the algorithms. Therefore, we merge the AUC and the disorder measurement, and consider AUC − d(s_+, s_-) for algorithm comparison.
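A minimal sketch of this comparison, assuming scipy's implementation of the test and per-data-set scores for the two algorithms (names are illustrative):

```python
from scipy.stats import wilcoxon

def compare_algorithms(scores_a, scores_b, alpha=0.01):
    """scores_a, scores_b: per-data-set performance (e.g., AUC - d(s+, s-))."""
    stat, p = wilcoxon(scores_a, scores_b)   # stat is the smaller rank sum T
    return stat, p, p < alpha                # significant if p < alpha
```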

Fig. 10. Comparison of the AUCs of the two-class problem algorithms, CS-SVM (CS) and NCS-SVM (NCS), using the Wilcoxon signed ranks test (see the text for details). The test statistic T is greater than the critical value 9; hence no significant difference is detected in the test.

Fig. 11. Comparison of the OC-SVM (OC) and NOC-SVM (NOC). In the one-class problems, both cases of the alternative hypothesis are considered. Here no significant difference is detected.

Fig. 12 shows the results of the Wilcoxon signed ranks test using this combined performance measure. From the results, we can clearly observe the performance differences between the algorithms. Since the test statistic T is smaller than the critical value 9, the NCS-SVM outperforms the CS-SVM. Likewise, the performance difference between the OC-SVM and the NOC-SVM is also detected by the Wilcoxon test for both cases of the second class. Therefore, we can conclude that the nested algorithms perform better than their unnested counterparts.

E. Run time comparison

Fig. 13 shows the training times for each algorithm. The results for the CS-SVM and OC-SVM are based on our Matlab implementation of the solution path algorithms [8], [11], available at www.eecs.umich.edu/~cscott/code/svmpath.zip. We emphasize here that our decomposition algorithm relies on Matlab's quadprog function as the basic subproblem solver, and that this function is in no way optimized for our particular subproblem. A discussion of computational complexity was given in Section V-C.

Fig. 12. Comparison of the algorithms based on the AUC along with the ranking disagreement. Left: CS-SVM and NCS-SVM. Right: OC-SVM and NOC-SVM. T is less than the critical value 9; hence the nested SVMs outperform the original SVMs.

Fig. 13. Average training times (sec) for the CS-SVM, NCS-SVM, OC-SVM, and NOC-SVM on the benchmark data sets. This result is based on our implementation of the solution path algorithms for the CS-SVM and OC-SVM.

VII. PRIMAL OF NESTED SVMS

Although not essential for our approach, we can find a primal optimization problem of the NCS-SVM if we think of (7) as a dual problem:

$$\min_{\{w_m\},\{\xi_{i,m}\}}\ \sum_{m=1}^M\Big[\frac{\lambda}{2}\|w_m\|^2 + \gamma_m\sum_{i\in I_+}\xi_{i,m} + (1-\gamma_m)\sum_{i\in I_-}\xi_{i,m}\Big]$$
$$\text{s.t. } \Big\langle\sum_{k=m}^M w_k, \Phi(x_i)\Big\rangle \ge \sum_{k=m}^M(1-\xi_{i,k}),\quad i\in I_+,\ \forall m$$
$$\Big\langle\sum_{k=1}^m w_k, \Phi(x_i)\Big\rangle \le -\sum_{k=1}^m(1-\xi_{i,k}),\quad i\in I_-,\ \forall m$$
$$\xi_{i,m} \ge 0,\ \forall i, m.$$

The derivation of (7) from this primal can be found in [25]. Note that the above primal of the NCS-SVM reduces to the primal of the CS-SVM (1) when M = 1.

Likewise, the primal corresponding to the NOC-SVM is

$$\min_{\{w_m\},\{\xi_{i,m}\}}\ \sum_{m=1}^M\Big[\frac{\lambda_m}{2}\|w_m\|^2 + \frac{1}{N}\sum_i\xi_{i,m}\Big] \quad (23)$$
$$\text{s.t. } \Big\langle\sum_{k=m}^M\lambda_kw_k, \Phi(x_i)\Big\rangle \ge \sum_{k=m}^M\lambda_k(1-\xi_{i,k}),\ \forall i, m$$
$$\xi_{i,m} \ge 0,\ \forall i, m,$$

which also boils down to the primal of the OC-SVM (4) when M = 1.

With these formulations, we can see the geometric meaning of the w_m and ξ_{i,m}. For simplicity, consider (23) when M = 2:

$$\min_{\{w_m\},\{\xi_{i,m}\}}\ \frac{\lambda_2}{2}\|w_2\|^2 + \frac{1}{N}\sum_i\xi_{i,2} + \frac{\lambda_1}{2}\|w_1\|^2 + \frac{1}{N}\sum_i\xi_{i,1}$$
$$\text{s.t. } \langle\lambda_2w_2, \Phi(x_i)\rangle \ge \lambda_2(1-\xi_{i,2}),\ \forall i$$
$$\langle\lambda_2w_2 + \lambda_1w_1, \Phi(x_i)\rangle \ge \lambda_2(1-\xi_{i,2}) + \lambda_1(1-\xi_{i,1}),\ \forall i$$
$$\xi_{i,m} \ge 0,\ \forall i, m.$$

Here ξ_{i,1} > 0 when x_i lies between the hyperplane P_{(λ_2w_2+λ_1w_1)/(λ_2+λ_1)} and the origin, and ξ_{i,2} > 0 when the point lies between P_{w_2} and the origin, where we use P_w to denote {Φ(x) : ⟨w, Φ(x)⟩ = 1}, a hyperplane in H. Note that from the nesting structure, the hyperplane P_{(λ_2w_2+λ_1w_1)/(λ_2+λ_1)} is located between P_{w_1} and P_{w_2}. Then we can show that (λ_1ξ_{i,1} + λ_2ξ_{i,2})/‖λ_1w_1 + λ_2w_2‖ is the distance between the point Φ(x_i) and the hyperplane P_{(λ_2w_2+λ_1w_1)/(λ_2+λ_1)}.

VIII. CONCLUSION

In this paper, we introduced a novel framework for building a family of nested support vector machines for the tasks of cost-sensitive classification and density level set estimation. Our approach involves forming new quadratic programs inspired by the cost-sensitive and one-class SVMs, with additional constraints that enforce the nesting structure. Our construction generates a finite number of nested set estimates at a pre-selected set of parameter values, and linearly interpolates these sets to a continuous nested family. We also developed efficient algorithms to solve the proposed quadratic problems. Thus, the NCS-SVM yields a family of nested classifiers indexed by the cost asymmetry γ, and the NOC-SVM yields a family of nested density level set estimates indexed by the density level λ. Unlike the original SVMs, which are not nested, our methods can be readily applied to problems requiring multiple set estimation, including clustering, ranking, and anomaly detection.

In experimental evaluations, we found that non-nested SVMs can yield highly ambiguous rankings for many data sets, and that nested SVMs offer considerable improvements in this regard. Nested SVMs also exhibit greater stability with respect to model selection criteria such as cross-validation. In terms of area under the ROC (AUC), we found that enforcement of nesting appears to have a bigger impact on one-class problems. However, neither cost-sensitive nor one-class classification problems displayed significantly different AUC values between nested and non-nested methods.

Recently Clémençon and Vayatis [26] developed a method for bipartite ranking that also involves computing nested estimates of cost-sensitive classifiers at a finite grid of costs. Their set estimates are computed individually, and nesting is imposed subsequently through an explicit process of successive unions. These sets are then extended to a complete scoring function through piecewise constant interpolation. Their interest is primarily theoretical, as their estimates entail empirical risk minimization, and their results assume the underlying Bayes classifiers lie in a Vapnik-Chervonenkis class.

The statistical consistency of our nested SVMs is an interesting open question. Such a result would likely depend on the consistency of the original CS-SVM or OC-SVM at fixed values of γ or λ, respectively. We are unaware of consistency results for the CS-SVM at fixed γ [27]. However, consistency of the OC-SVM for fixed λ has been established [19]. Thus, suppose Ĝ_{λ_1}, ..., Ĝ_{λ_M} are (non-nested) OC-SVMs at a grid of points. Since these estimators are each consistent, and the true level sets they approximate are nested, it seems plausible that for a sufficiently large sample size, these OC-SVMs are also nested. In this case, they would be feasible for the NOC-SVM, which suggests that the NOC-SVM estimates the true level sets at least as well, asymptotically, as these estimates. Taking the grid of levels {λ_i} to be increasingly dense, the error of the interpolation scheme should also vanish. We leave it as future work to determine whether this intuition can be formalized.

APPENDIX A
DATA POINT SELECTION AND TERMINATION CONDITION OF THE NCS-SVM

On each round, the algorithm in Fig. 3 selects an example x_i, updates its corresponding variables {α_{i,m}}_{m=1}^M, and checks the termination condition. In this appendix, we employ the KKT conditions to derive an efficient variable selection strategy and a termination condition for the NCS-SVM.

We use the KKT conditions to find the necessary conditions for the optimal solution of (7). Before we proceed, we define α_{i,0} = 0 for i ∈ I_+ and α_{i,M+1} = 0 for i ∈ I_- for notational convenience.

The Lagrangian of the quadratic program is then

$$L(\alpha, u, v) = \sum_m\Big[\frac{1}{2\lambda}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}y_iy_jK_{i,j} - \sum_i\alpha_{i,m}\Big] + \sum_m\sum_i u_{i,m}\big(\alpha_{i,m} - 1_{\{y_i<0\}} - y_i\gamma_m\big) + \sum_m\Big[\sum_{i\in I_+}v_{i,m}(\alpha_{i,m-1} - \alpha_{i,m}) + \sum_{i\in I_-}v_{i,m}(\alpha_{i,m+1} - \alpha_{i,m})\Big],$$

where u_{i,m} ≥ 0 and v_{i,m} ≥ 0 for ∀i, m. At the global minimum, the derivative of the Lagrangian with respect to α_{i,m} vanishes:

$$\frac{\partial L}{\partial\alpha_{i,m}} = y_if_{i,m} - 1 + u_{i,m} + \begin{cases} -v_{i,m} + v_{i,m+1}, & i\in I_+ \\ v_{i,m-1} - v_{i,m}, & i\in I_- \end{cases} = 0, \quad (24)$$

where, recall, f_{i,m} = (1/λ) Σ_j α_{j,m}y_jK_{i,j}, and we introduce the auxiliary variables v_{i,M+1} = 0 for i ∈ I_+ and v_{i,0} = 0 for i ∈ I_-. Then we obtain the following set of constraints from the KKT conditions:

$$y_if_{i,m} - 1 + u_{i,m} = \begin{cases} v_{i,m} - v_{i,m+1}, & i\in I_+ \\ v_{i,m} - v_{i,m-1}, & i\in I_- \end{cases} \quad (25)$$
$$0 \le \alpha_{i,m} \le 1_{\{y_i<0\}} + y_i\gamma_m,\ \forall i, m \quad (26)$$
$$y_i\alpha_{i,1} \le y_i\alpha_{i,2} \le \cdots \le y_i\alpha_{i,M},\ \forall i \quad (27)$$
$$u_{i,m}\big(\alpha_{i,m} - 1_{\{y_i<0\}} - y_i\gamma_m\big) = 0,\ \forall i, m \quad (28)$$
$$v_{i,m}(\alpha_{i,m-1} - \alpha_{i,m}) = 0,\ i\in I_+,\ \forall m \quad (29)$$
$$v_{i,m}(\alpha_{i,m+1} - \alpha_{i,m}) = 0,\ i\in I_-,\ \forall m \quad (30)$$
$$u_{i,m} \ge 0,\ v_{i,m} \ge 0,\ \forall i, m. \quad (31)$$

Since (7) is a convex program, the KKT conditions are also sufficient [21]. That is, any α_{i,m}, u_{i,m}, and v_{i,m} satisfying (25)-(31) are indeed optimal. Therefore, at the end of each iteration, we assess the current solution with these conditions and decide whether to stop or to continue. We evaluate the amount of error for x_i by defining

$$e_i = \sum_m\Big|\frac{\partial L}{\partial\alpha_{i,m}}\Big|,\ \forall i.$$

Fig. 14 (upper table, m = 1, 2, ..., M−1):

|                                    | α_{i,m−1} < α_{i,m}                                    | α_{i,m−1} = α_{i,m}
| α_{i,m} < min(γ_m, α_{i,m+1})      | u_{i,m} = 0, v_{i,m} = 0                                | u_{i,m} = 0, v_{i,m} = max(f_{i,m} − 1, 0)
| α_{i,m} = γ_m < α_{i,m+1}          | u_{i,m} = max(1 − f_{i,m}, 0), v_{i,m} = 0              | (cannot occur)
| α_{i,m} = α_{i,m+1} < γ_m          | u_{i,m} = 0, v_{i,m} = 0                                | u_{i,m} = 0, v_{i,m} = max(f_{i,m} − 1 + v_{i,m+1}, 0)
| α_{i,m} = α_{i,m+1} = γ_m          | u_{i,m} = max(1 − f_{i,m} − v_{i,m+1}, 0), v_{i,m} = 0  | (cannot occur)

Fig. 14 (lower table, m = M):

|                  | α_{i,M−1} < α_{i,M}                         | α_{i,M−1} = α_{i,M}
| α_{i,M} < γ_M    | u_{i,M} = 0, v_{i,M} = 0                     | u_{i,M} = 0, v_{i,M} = max(f_{i,M} − 1, 0)
| α_{i,M} = γ_M    | u_{i,M} = max(1 − f_{i,M}, 0), v_{i,M} = 0   | (cannot occur)

Fig. 14. The optimality conditions of the NCS-SVM when i ∈ I_+. (Upper: m = 1, 2, ..., M−1; lower: m = M.) Assuming the α_{i,m} are optimal, u_{i,m} and v_{i,m} are solved as above from the KKT conditions. Empty entries indicate cases that cannot occur.

Fig. 15 (upper table, m = 2, ..., M):

|                                      | α_{i,m+1} < α_{i,m}                                    | α_{i,m+1} = α_{i,m}
| α_{i,m} < min(1 − γ_m, α_{i,m−1})    | u_{i,m} = 0, v_{i,m} = 0                                | u_{i,m} = 0, v_{i,m} = max(−f_{i,m} − 1, 0)
| α_{i,m} = 1 − γ_m < α_{i,m−1}        | u_{i,m} = max(1 + f_{i,m}, 0), v_{i,m} = 0              | (cannot occur)
| α_{i,m} = α_{i,m−1} < 1 − γ_m        | u_{i,m} = 0, v_{i,m} = 0                                | u_{i,m} = 0, v_{i,m} = max(−f_{i,m} − 1 + v_{i,m−1}, 0)
| α_{i,m} = α_{i,m−1} = 1 − γ_m        | u_{i,m} = max(1 + f_{i,m} − v_{i,m−1}, 0), v_{i,m} = 0  | (cannot occur)

Fig. 15 (lower table, m = 1):

|                     | α_{i,2} < α_{i,1}                          | α_{i,2} = α_{i,1}
| α_{i,1} < 1 − γ_1   | u_{i,1} = 0, v_{i,1} = 0                    | u_{i,1} = 0, v_{i,1} = max(−f_{i,1} − 1, 0)
| α_{i,1} = 1 − γ_1   | u_{i,1} = max(1 + f_{i,1}, 0), v_{i,1} = 0  | (cannot occur)

Fig. 15. The optimality conditions of the NCS-SVM when i ∈ I_-. (Upper: m = 2, ..., M; lower: m = 1.)

An optimal solution makes these quantities zero. In practice, when their sum Σ_i e_i decreases below a predetermined tolerance, the algorithm stops and returns the current solution. If not, the algorithm chooses the example with the largest e_i and continues the loop.

Computing e_i involves the unknown variables u_{i,m} and v_{i,m} (see (24)), whereas f_{i,m} can be easily computed from the known variables α_{i,m}. Fig. 14 and Fig. 15 are for determining these u_{i,m} and v_{i,m}. These tables are obtained by first assuming the current solution α_{i,m} is optimal and then solving for u_{i,m} and v_{i,m} such that they satisfy the KKT conditions. Thus, depending on the value of α_{i,m} between its upper and lower bounds, u_{i,m} and v_{i,m} can be simply set as directed in the tables. For example, if i ∈ I_+, then we find u_{i,m} and v_{i,m} by referring to Fig. 14 iteratively from m = M down to m = 1. If i ∈ I_-, we use Fig. 15 and iterate from m = 1 up to m = M. Then the obtained e_i takes a non-zero value only when the assumption is false and the current solution is sub-optimal.

APPENDIX B
DATA POINT SELECTION AND TERMINATION CONDITION OF THE NOC-SVM

As in the NCS-SVM, we investigate the optimality conditions of the NOC-SVM (12) and find a data point selection method and a termination condition. With a slight modification, we rewrite (12) as

$$\min_{\alpha_1,\ldots,\alpha_M}\ \sum_{m=1}^M\Big[\frac{1}{2\lambda_m}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}K_{i,j} - \sum_i\alpha_{i,m}\Big] \quad (32)$$
$$\text{s.t. } \alpha_{i,m} \le \frac{1}{N},\ \forall i, m$$
$$0 \le \frac{\alpha_{i,1}}{\lambda_1} \le \frac{\alpha_{i,2}}{\lambda_2} \le \cdots \le \frac{\alpha_{i,M}}{\lambda_M},\ \forall i.$$

We then use the KKT conditions to find the necessary conditions for the optimal solution of (32). The Lagrangian is

$$L(\alpha, u, v) = \sum_{m=1}^M\Big[\frac{1}{2\lambda_m}\sum_{i,j}\alpha_{i,m}\alpha_{j,m}K_{i,j} - \sum_i\alpha_{i,m}\Big] + \sum_{m=1}^M\sum_i u_{i,m}\Big(\alpha_{i,m} - \frac{1}{N}\Big) - \sum_i v_{i,1}\frac{\alpha_{i,1}}{\lambda_1} + \sum_{m=2}^M\sum_i v_{i,m}\Big(\frac{\alpha_{i,m-1}}{\lambda_{m-1}} - \frac{\alpha_{i,m}}{\lambda_m}\Big),$$

where u_{i,m} ≥ 0 and v_{i,m} ≥ 0 for ∀i, m. At the global minimum, the derivative of the Lagrangian with respect to α_{i,m} vanishes:

$$\frac{\partial L}{\partial\alpha_{i,m}} = f_{i,m} - 1 + u_{i,m} + \begin{cases} \dfrac{-v_{i,m} + v_{i,m+1}}{\lambda_m}, & m \neq M \\ -\dfrac{v_{i,M}}{\lambda_M}, & m = M \end{cases} = 0, \quad (33)$$

where, recall, f_{i,m} = (1/λ_m) Σ_j α_{j,m}K_{i,j}. Then, from the KKT conditions, we obtain the following set of constraints for x_i:

$$f_{i,m} - 1 + u_{i,m} = \begin{cases} \dfrac{v_{i,m} - v_{i,m+1}}{\lambda_m}, & m \neq M \\ \dfrac{v_{i,M}}{\lambda_M}, & m = M \end{cases} \quad (34)$$
$$\alpha_{i,m} \le \frac{1}{N},\ \forall m \quad (35)$$
$$0 \le \frac{\alpha_{i,1}}{\lambda_1} \le \frac{\alpha_{i,2}}{\lambda_2} \le \cdots \le \frac{\alpha_{i,M}}{\lambda_M} \quad (36)$$
$$u_{i,m}\Big(\alpha_{i,m} - \frac{1}{N}\Big) = 0,\ \forall m \quad (37)$$
$$v_{i,m}\Big(\frac{\alpha_{i,m-1}}{\lambda_{m-1}} - \frac{\alpha_{i,m}}{\lambda_m}\Big) = 0,\ \forall m \quad (38)$$
$$u_{i,m} \ge 0,\ v_{i,m} \ge 0,\ \forall m. \quad (39)$$

Since (32) is a convex program, the KKT conditions are sufficient [21]. That is, α_{i,m}, u_{i,m}, and v_{i,m} satisfying (34)-(39) are indeed optimal. Therefore, at the end of each iteration, we assess the current solution with these conditions and decide whether to stop or to continue. We evaluate the amount of error for x_i by defining e_i = Σ_m |∂L/∂α_{i,m}|, ∀i.

An optimal solution makes these quantities zero. In practice, when their sum Σ_i e_i decreases below a predetermined tolerance, the algorithm stops and returns the current solution. If not, the algorithm chooses the example with the largest e_i and continues the loop.

Computing e_i involves the unknown variables u_{i,m} and v_{i,m} (see (33)), whereas f_{i,m} can be easily computed from the known variables α_{i,m}. Fig. 16 is for determining these u_{i,m} and v_{i,m}. The table is obtained by first assuming the current solution α_{i,m} is optimal and then solving for u_{i,m} and v_{i,m} such that they satisfy the KKT conditions. Thus, depending on the value of α_{i,m} between its upper and lower bounds, u_{i,m} and v_{i,m} can be simply set by referring to Fig. 16 iteratively from m = M down to m = 1. Then the obtained e_i takes a non-zero value only when the assumption is false and the current solution is not optimal.

Fig. 16 (upper table, m = 1, 2, ..., M−1):

|                                              | (λ_m/λ_{m−1})α_{i,m−1} < α_{i,m}                             | (λ_m/λ_{m−1})α_{i,m−1} = α_{i,m}
| α_{i,m} < min(1/N, (λ_m/λ_{m+1})α_{i,m+1})   | u_{i,m} = 0, v_{i,m} = 0                                      | u_{i,m} = 0, v_{i,m} = max(λ_m(f_{i,m} − 1), 0)
| α_{i,m} = 1/N < (λ_m/λ_{m+1})α_{i,m+1}       | u_{i,m} = max(1 − f_{i,m}, 0), v_{i,m} = 0                    | (cannot occur)
| α_{i,m} = (λ_m/λ_{m+1})α_{i,m+1} < 1/N       | u_{i,m} = 0, v_{i,m} = 0                                      | u_{i,m} = 0, v_{i,m} = max(λ_m(f_{i,m} − 1 + v_{i,m+1}/λ_m), 0)
| α_{i,m} = (λ_m/λ_{m+1})α_{i,m+1} = 1/N       | u_{i,m} = max(1 − f_{i,m} − v_{i,m+1}/λ_m, 0), v_{i,m} = 0    | (cannot occur)

Fig. 16 (lower table, m = M):

|                  | (λ_M/λ_{M−1})α_{i,M−1} < α_{i,M}            | (λ_M/λ_{M−1})α_{i,M−1} = α_{i,M}
| α_{i,M} < 1/N    | u_{i,M} = 0, v_{i,M} = 0                     | u_{i,M} = 0, v_{i,M} = max(λ_M(f_{i,M} − 1), 0)
| α_{i,M} = 1/N    | u_{i,M} = max(1 − f_{i,M}, 0), v_{i,M} = 0   | (cannot occur)

Fig. 16. The optimality conditions of the NOC-SVM. (Upper: m = 1, 2, ..., M−1; lower: m = M.) Empty entries indicate cases that cannot occur.

APPENDIX C
MAXIMUM VALUE OF λ FOR THE CS-SVM AND OC-SVM

In this appendix, we find the values of the regularization parameter λ above which the OC-SVM or CS-SVM generates the same solution. First, we consider the OC-SVM. The decision function of the OC-SVM is f_λ(x) = (1/λ) Σ_j α_j k(x_j, x), and f_λ(x) = 1 forms the margin. For sufficiently large λ, every data point x_i falls inside the margin (f_λ(x_i) ≤ 1). Since the KKT optimality conditions of (4) imply α_i = 1/N for the data points such that f_λ(x_i) < 1, we obtain λ ≥ (1/N) Σ_j K_{i,j} for ∀i. Therefore, if the maximum row sum of the kernel matrix is denoted λ_OC = (1/N) max_i Σ_j K_{i,j}, then for any λ ≥ λ_OC the optimal solution of the OC-SVM becomes α_i = 1/N for ∀i.
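A minimal sketch of both thresholds, λ_OC from the paragraph above and λ_CS(γ) derived next (the helper names are ours):

```python
import numpy as np

def lambda_oc(K):
    # lambda_OC = (1/N) max_i sum_j K_ij
    return K.sum(axis=1).max() / K.shape[0]

def lambda_cs(K, y, gamma):
    # lambda_CS(gamma) = max_i [ gamma * sum_{j in I+} y_i y_j K_ij
    #                            + (1 - gamma) * sum_{j in I-} y_i y_j K_ij ]
    Q = y[:, None] * y[None, :] * K
    s = gamma * Q[:, y > 0].sum(axis=1) + (1 - gamma) * Q[:, y < 0].sum(axis=1)
    return s.max()
```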

Next, we consider the regularization parameter λ in the formulation (1) of the CS-SVM. The decision function of the CS-SVM is f_γ(x) = (1/λ) Σ_j α_j y_j k(x_j, x), and the margin is y f_γ(x) = 1. Thus, if λ is sufficiently large, all the data points are inside the margin and satisfy y_i f_γ(x_i) ≤ 1. Then λ ≥ γ Σ_{j∈I_+} y_iy_jK_{i,j} + (1 − γ) Σ_{j∈I_-} y_iy_jK_{i,j} for ∀i, because α_i = 1_{\{y_i<0\}} + y_iγ for all the data points such that y_i f_γ(x_i) < 1 from the KKT conditions. For a given γ, let

$$\lambda_{CS}(\gamma) = \max_i\Big[\gamma\sum_{j\in I_+}y_iy_jK_{i,j} + (1-\gamma)\sum_{j\in I_-}y_iy_jK_{i,j}\Big].$$

Then for λ > λ_CS(γ), the solution of the CS-SVM becomes α_i = 1_{\{y_i<0\}} + y_iγ for ∀i. Therefore, since λ_CS(γ) ≤ (1 − γ)λ_CS(0) + γλ_CS(1) for all γ ∈ [0, 1], values of λ > max(λ_CS(0), λ_CS(1)) generate the same solutions in the CS-SVM.

REFERENCES

[1] J. A. Hartigan, "Consistency of single linkage for high-density clusters," J. of the American Stat. Association, vol. 76, 1981.
[2] R. Liu, J. Parelius, and K. Singh, "Multivariate analysis by data depth: descriptive statistics, graphics and inference," Annals of Statistics, vol. 27, 1999.
[3] C. Scott and R. Nowak, "Learning minimum volume sets," Journal of Machine Learning Research, vol. 7, 2006.
[4] C. Scott and E. D. Kolaczyk, "Annotated minimum volume sets for nonparametric anomaly discovery," in IEEE Workshop on Statistical Signal Processing, 2007.
[5] R. Herbrich, T. Graepel, and K. Obermayer, "Large margin rank boundaries for ordinal regression," Advances in Large Margin Classifiers, 2000.
[6] C. Scott and R. Nowak, "A Neyman-Pearson approach to statistical learning," IEEE Trans. Inf. Theory, vol. 51, 2005.
[7] C. Scott and G. Blanchard, "Novelty detection: Unlabeled data definitely help," Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, vol. 5, 2009.
[8] F. R. Bach, D. Heckerman, and E. Horvitz, "Considering cost asymmetry in learning classifiers," Journal of Machine Learning Research, vol. 7, 2006.
[9] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[10] T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, "The entire regularization path for the support vector machine," Journal of Machine Learning Research, vol. 5, 2004.
[11] G. Lee and C. Scott, "The one class support vector machine solution path," in IEEE Intl. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), vol. 2, 2007, pp. II-521 to II-524.
[12] G. Lee and C. Scott, "Nested support vector machines," in IEEE Intl. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), 2008.
[13] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth, "Generalization bounds for the area under the ROC curve," Journal of Machine Learning Research, vol. 6, 2005.
[14] W. Stuetzle, "Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample," Journal of Classification, vol. 20, no. 5, 2003.
[15] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. Cambridge, MA: MIT Press, 2001.


More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems

Taxonomy of Large Margin Principle Algorithms for Ordinal Regression Problems Taxonomy of Large Margn Prncple Algorthms for Ordnal Regresson Problems Amnon Shashua Computer Scence Department Stanford Unversty Stanford, CA 94305 emal: shashua@cs.stanford.edu Anat Levn School of Computer

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Intra-Parametric Analysis of a Fuzzy MOLP

Intra-Parametric Analysis of a Fuzzy MOLP Intra-Parametrc Analyss of a Fuzzy MOLP a MIAO-LING WANG a Department of Industral Engneerng and Management a Mnghsn Insttute of Technology and Hsnchu Tawan, ROC b HSIAO-FAN WANG b Insttute of Industral

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Abstract Ths paper ponts out an mportant source of necency n Smola and Scholkopf's Sequental Mnmal Optmzaton (SMO) algorthm for SVM regresson that s c

Abstract Ths paper ponts out an mportant source of necency n Smola and Scholkopf's Sequental Mnmal Optmzaton (SMO) algorthm for SVM regresson that s c Improvements to SMO Algorthm for SVM Regresson 1 S.K. Shevade S.S. Keerth C. Bhattacharyya & K.R.K. Murthy shrsh@csa.sc.ernet.n mpessk@guppy.mpe.nus.edu.sg cbchru@csa.sc.ernet.n murthy@csa.sc.ernet.n 1

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming CEE 60 Davd Rosenberg p. LECTURE NOTES Dualty Theory, Senstvty Analyss, and Parametrc Programmng Learnng Objectves. Revew the prmal LP model formulaton 2. Formulate the Dual Problem of an LP problem (TUES)

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Machine Learning. K-means Algorithm

Machine Learning. K-means Algorithm Macne Learnng CS 6375 --- Sprng 2015 Gaussan Mture Model GMM pectaton Mamzaton M Acknowledgement: some sldes adopted from Crstoper Bsop Vncent Ng. 1 K-means Algortm Specal case of M Goal: represent a data

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Relevance Feedback Document Retrieval using Non-Relevant Documents

Relevance Feedback Document Retrieval using Non-Relevant Documents Relevance Feedback Document Retreval usng Non-Relevant Documents TAKASHI ONODA, HIROSHI MURATA and SEIJI YAMADA Ths paper reports a new document retreval method usng non-relevant documents. From a large

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Automatic selection of reference velocities for recursive depth migration

Automatic selection of reference velocities for recursive depth migration Automatc selecton of mgraton veloctes Automatc selecton of reference veloctes for recursve depth mgraton Hugh D. Geger and Gary F. Margrave ABSTRACT Wave equaton depth mgraton methods such as phase-shft

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation College of Engneerng and Coputer Scence Mechancal Engneerng Departent Mechancal Engneerng 309 Nuercal Analyss of Engneerng Systes Sprng 04 Nuber: 537 Instructor: Larry Caretto Solutons to Prograng Assgnent

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information