
Pattern Recognition 46 (2013)

Localized algorithms for multiple kernel learning

Mehmet Gönen, Ethem Alpaydın
Department of Computer Engineering, Boğaziçi University, TR-34342 Bebek, İstanbul, Turkey
E-mail addresses: gonen@boun.edu.tr (M. Gönen), alpaydin@boun.edu.tr (E. Alpaydın).

Keywords: Multiple kernel learning; Support vector machines; Support vector regression; Classification; Regression; Selective attention

Abstract. Instead of selecting a single kernel, multiple kernel learning (MKL) uses a weighted sum of kernels where the weight of each kernel is optimized during training. Such methods assign the same weight to a kernel over the whole input space, and we discuss localized multiple kernel learning (LMKL), which is composed of a kernel-based learning algorithm and a parametric gating model that assigns local weights to kernel functions. These two components are trained in a coupled manner using a two-step alternating optimization algorithm. Empirical results on benchmark classification and regression data sets validate the applicability of our approach. We see that LMKL achieves higher accuracy compared with canonical MKL on classification problems with different feature representations. LMKL can also identify the relevant parts of images, using the gating model as a saliency detector in image recognition problems. In regression tasks, LMKL improves the performance significantly or reduces the model complexity by storing significantly fewer support vectors. © Elsevier Ltd. All rights reserved.

1. Introduction

The support vector machine (SVM) is a discriminative classifier based on the theory of structural risk minimization [33]. Given a sample of independent and identically distributed training instances $\{(x_i, y_i)\}_{i=1}^N$, where $x_i \in \mathbb{R}^D$ and $y_i \in \{-1, +1\}$ is its class label, SVM finds the linear discriminant with the maximum margin in the feature space induced by the mapping function $\Phi(\cdot)$. The discriminant function is $f(x) = \langle w, \Phi(x) \rangle + b$, whose parameters can be learned by solving the following quadratic optimization problem:

min. $\frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^N \xi_i$
w.r.t. $w \in \mathbb{R}^S$, $\xi \in \mathbb{R}^N_+$, $b \in \mathbb{R}$
s.t. $y_i(\langle w, \Phi(x_i) \rangle + b) \geq 1 - \xi_i \quad \forall i$

where $w$ is the vector of weight coefficients, $S$ is the dimensionality of the feature space obtained by $\Phi(\cdot)$, $C$ is a predefined positive trade-off parameter between model simplicity and classification error, $\xi$ is the vector of slack variables, and $b$ is the bias term of the separating hyperplane. Instead of solving this optimization problem directly, the Lagrangian dual function enables us to obtain the following dual formulation:

max. $\sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j k(x_i, x_j)$
w.r.t. $\alpha \in [0, C]^N$
s.t. $\sum_{i=1}^N \alpha_i y_i = 0$

where $\alpha$ is the vector of dual variables corresponding to the separation constraints, and the kernel matrix obtained from $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ is positive semidefinite. Solving this, we get $w = \sum_{i=1}^N \alpha_i y_i \Phi(x_i)$, and the discriminant function can be written as

$f(x) = \sum_{i=1}^N \alpha_i y_i k(x_i, x) + b.$

There are several kernel functions used successfully in the literature, such as the linear kernel ($k_L$), the polynomial kernel ($k_P$), and the Gaussian kernel ($k_G$):

$k_L(x_i, x_j) = \langle x_i, x_j \rangle$
$k_P(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^q, \quad q \in \mathbb{N}$
$k_G(x_i, x_j) = \exp(-\|x_i - x_j\|_2^2 / s^2), \quad s \in \mathbb{R}_{++}.$

There are also kernel functions proposed for particular applications, such as natural language processing [4] and bioinformatics [3]. Selecting the kernel function $k(\cdot,\cdot)$ and its parameters (e.g., $q$ or $s$) is an important issue in training. Generally, a cross-validation procedure is used to choose the best performing kernel function among a set of kernel functions on a separate validation set different from the training set.
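As a concrete reference for the kernel definitions above, they can be computed on data matrices as in the following minimal numpy sketch (the function names are ours, not from the paper):

```python
import numpy as np

def linear_kernel(X1, X2):
    # k_L(x_i, x_j) = <x_i, x_j>
    return X1 @ X2.T

def polynomial_kernel(X1, X2, q=2):
    # k_P(x_i, x_j) = (<x_i, x_j> + 1)^q
    return (X1 @ X2.T + 1.0) ** q

def gaussian_kernel(X1, X2, s=1.0):
    # k_G(x_i, x_j) = exp(-||x_i - x_j||^2 / s^2)
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-sq / s ** 2)
```

Each function takes row-per-instance matrices and returns the corresponding kernel (Gram) matrix.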

In recent years, multiple kernel learning (MKL) methods have been proposed, which use multiple kernels instead of selecting one specific kernel function and its corresponding parameters:

$k_\eta(x_i, x_j) = f_\eta(\{k_m(x_i^m, x_j^m)\}_{m=1}^P) \qquad (1)$

where the combination function $f_\eta(\cdot)$ can be a linear or a nonlinear function of the input kernels. The kernel functions, $\{k_m(\cdot,\cdot)\}_{m=1}^P$, take $P$ feature representations (not necessarily different) of the data instances, where $x_i = \{x_i^m\}_{m=1}^P$, $x_i^m \in \mathbb{R}^{D_m}$, and $D_m$ is the dimensionality of the corresponding feature representation.

The reasoning is similar to combining different classifiers: instead of choosing a single kernel function and putting all our eggs in the same basket, it is better to have a set and let an algorithm do the picking or combination. There can be two uses of MKL: (i) Different kernels correspond to different notions of similarity, and instead of trying to find which works best, a learning method does the picking for us, or may use a combination of them. Using a specific kernel may be a source of bias, and by allowing a learner to choose among a set of kernels, a better solution can be found. (ii) Different kernels may be using inputs coming from different representations, possibly from different sources or modalities. Since these are different representations, they have different measures of similarity corresponding to different kernels. In such a case, combining kernels is one possible way to combine multiple information sources.

Since their original conception, there has been significant work on the theory and application of multiple kernel learning. Fixed rules use the combination function in (1) as a fixed function of the kernels, without any training. Once we calculate the combined kernel, we train a single kernel machine using this kernel. For example, we can obtain a valid kernel by taking the summation or the multiplication of two kernels:

$k_\eta(x_i, x_j) = k_1(x_i^1, x_j^1) + k_2(x_i^2, x_j^2)$
$k_\eta(x_i, x_j) = k_1(x_i^1, x_j^1)\, k_2(x_i^2, x_j^2).$

The summation rule has been applied successfully in computational biology [7] and optical digit recognition [5] to combine two or more kernels obtained from different representations.

Instead of using a fixed combination function, we can have a function parameterized by a set of parameters $\Theta$ and a learning procedure to optimize $\Theta$ as well. The simplest case is to parameterize the sum rule as a weighted sum:

$k_\eta(x_i, x_j | \Theta = \eta) = \sum_{m=1}^P \eta_m\, k_m(x_i^m, x_j^m)$

with $\eta_m \in \mathbb{R}$. Different versions of this approach differ in the way they restrict the kernel weights [4,9]: for example, we can use arbitrary weights (i.e., a linear combination), nonnegative kernel weights (i.e., a conic combination), or weights on a simplex (i.e., a convex combination). A linear combination may be restrictive, and nonlinear combinations are also possible [3,8]; our proposed approach is of this type, and we will discuss these in more detail later.

We can also learn the kernel combination weights using a quality measure that gives performance estimates for the kernel matrices calculated on the training data. This corresponds to a function that assigns weights to kernel functions:

$\eta = g_\eta(\{k_m(x_i^m, x_j^m)\}_{m=1}^P).$

The quality measure used for determining the kernel weights could be kernel alignment or another similarity measure such as the Kullback–Leibler divergence [36]. Another possibility, inspired by ensemble and boosting methods, is to iteratively update the combined kernel by adding a new kernel as training continues [5,9]. Finally, in a trained combiner parameterized by $\Theta$, if we assume $\Theta$ to contain random variables with a prior, we can use a Bayesian approach; for the case of a weighted sum, we can, for example, put a prior on the kernel weights [8]. A recent survey of multiple kernel learning algorithms is given in [8].
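The fixed rules and the weighted sum above operate directly on precomputed kernel matrices; as a minimal sketch (our naming, assuming the $P$ kernel matrices have already been computed):

```python
import numpy as np

def combine_fixed(K1, K2, rule="sum"):
    # Fixed combination rules: both the sum and the elementwise
    # product of two valid kernels yield a valid kernel.
    return K1 + K2 if rule == "sum" else K1 * K2

def combine_weighted(kernels, eta):
    # k_eta = sum_m eta_m * K_m; depending on the method, eta may be
    # unconstrained (linear), nonnegative (conic), or on the simplex
    # (convex combination).
    return sum(e * K for e, K in zip(eta, kernels))
```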
This paper is organized as follows: we formulate our proposed nonlinear combination method, localized MKL (LMKL), with detailed mathematical derivations in Section 2. We give our experimental results in Section 3, where we compare LMKL with MKL and single kernel SVM. In Section 4, we discuss the key properties of our proposed method together with related work in the literature. We conclude in Section 5.

2. Localized multiple kernel learning

Using a fixed unweighted or weighted sum assigns the same weight to a kernel over the whole input space. Assigning different weights to a kernel in different regions of the input space may produce a better classifier. If the data has an underlying local structure, different similarity measures may be suited to different regions. We propose to divide the input space into regions using a gating function and to assign combination weights to kernels in a data-dependent way [3]; in the neural network literature, a similar architecture was previously proposed under the name mixture of experts [3]. The discriminant function for binary classification is rewritten as

$f(x) = \sum_{m=1}^P \eta_m(x|V)\, \langle w_m, \Phi_m(x^m) \rangle + b \qquad (2)$

where $\eta_m(x|V)$ is a parametric gating model that assigns a weight to $\Phi_m(x^m)$ as a function of $x$, and $V$ is the matrix of gating model parameters. Note that unlike in MKL, in LMKL it is not obligatory to combine different feature spaces; we can also use multiple copies of the same feature space (i.e., kernel) in different regions of the input space and thereby obtain a more complex discriminant function. For example, as we will see shortly, we can combine multiple linear kernels to get a piecewise linear discriminant.

2.1. Gating models

In order to assign kernel weights in a data-dependent way, we use a gating model. Originally, we investigated the softmax gating model [3]:

$\eta_m(x|V) = \frac{\exp(\langle v_m, x^G \rangle + v_{m0})}{\sum_{h=1}^P \exp(\langle v_h, x^G \rangle + v_{h0})} \quad \forall m \qquad (3)$

where $x^G \in \mathbb{R}^{D_G}$ is the representation of the input instance in the feature space in which we learn the gating model, and $V \in \mathbb{R}^{P \times (D_G+1)}$ contains the gating model parameters $\{v_m, v_{m0}\}_{m=1}^P$. The softmax gating model uses kernels in a competitive manner, and generally a single kernel is active for each input. It is possible to use other gating models, and below we discuss two new ones, namely sigmoid and Gaussian. The gating model defines the shape of the region of expertise of the kernels. The sigmoid function allows multiple kernels to be used in a cooperative manner:

$\eta_m(x|V) = 1 / (1 + \exp(-\langle v_m, x^G \rangle - v_{m0})) \quad \forall m. \qquad (4)$

Instead of parameterizing the boundaries of the local regions for kernels, we can also parameterize their centers and spreads using Gaussian gating:

$\eta_m(x|V) = \frac{\exp(-\|x^G - \mu_m\|_2^2 / \sigma_m^2)}{\sum_{h=1}^P \exp(-\|x^G - \mu_h\|_2^2 / \sigma_h^2)} \quad \forall m \qquad (5)$

where $V \in \mathbb{R}^{P \times (D_G+1)}$ contains the means, $\{\mu_m\}_{m=1}^P$, and the spreads, $\{\sigma_m\}_{m=1}^P$; we do not experiment any further with this gating model in the current work.
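A minimal sketch of the softmax and sigmoid gating models in (3) and (4), assuming the gating representation is given as a row-per-instance matrix; the max-shift in the softmax is a standard numerical-stability detail of ours, not part of the formulation:

```python
import numpy as np

def softmax_gating(Xg, V, v0):
    # eta_m(x|V) = exp(<v_m, x^G> + v_m0) / sum_h exp(<v_h, x^G> + v_h0)
    A = Xg @ V.T + v0                       # N x P activations
    A -= A.max(axis=1, keepdims=True)       # numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def sigmoid_gating(Xg, V, v0):
    # eta_m(x|V) = 1 / (1 + exp(-<v_m, x^G> - v_m0)); rows need not sum
    # to one, so several kernels can be active at once (cooperative).
    return 1.0 / (1.0 + np.exp(-(Xg @ V.T + v0)))
```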

If we combine the same feature representation with different kernels (i.e., $x = x^1 = x^2 = \dots = x^P$), we can simply use it also in the gating model (i.e., $x^G = x$) [3]. If we combine different feature representations with the same kernel, the gating model representation $x^G$ can be one of the representations $\{x^m\}_{m=1}^P$, a concatenation of a subset of them, or a completely different representation. In some application areas such as bioinformatics, data instances may appear in a non-vectorial format such as sequences, trees, and graphs, for which we can calculate kernel matrices but cannot represent the data instances as $x$ vectors directly; in such cases, we may use an empirical kernel map [3], which corresponds to using the kernel values between $x$ and the training points as the feature vector for $x$, and define $x^G$ in terms of the kernel values [5]:

$x^G = [k_G(x_1, x)\;\; k_G(x_2, x)\;\; \dots\;\; k_G(x_N, x)]^\top$

where the gating kernel, $k_G(\cdot,\cdot)$, can be one of the combined kernels $\{k_m(\cdot,\cdot)\}_{m=1}^P$, a combination of them, or a completely different kernel used only for determining the gating boundaries.

2.2. Mathematical model

Using the discriminant function in (2) and regularizing the discriminant coefficients of all the feature spaces together, LMKL obtains the following optimization problem:

min. $\frac{1}{2}\sum_{m=1}^P \|w_m\|_2^2 + C\sum_{i=1}^N \xi_i$
w.r.t. $w_m \in \mathbb{R}^{S_m}$, $\xi \in \mathbb{R}^N_+$, $V \in \mathbb{R}^{P\times(D_G+1)}$, $b \in \mathbb{R}$
s.t. $y_i f(x_i) \geq 1 - \xi_i \quad \forall i \qquad (6)$

where nonconvexity is introduced into the model by the nonlinearity formed by the gating model outputs in the separation constraints. Instead of trying to solve (6) directly, we can use a two-step alternating optimization algorithm [3], also used for choosing kernel parameters [6] and for obtaining the $\eta_m$ parameters of MKL [9]. This procedure consists of two basic steps: (i) solving the model with a fixed gating model, and (ii) updating the gating model parameters using the gradients calculated from the current solution. Note that if we fix the gating model parameters, the optimization problem (6) becomes convex, and we can find the corresponding dual optimization problem using duality. For a fixed $V$, the Lagrangian of the primal problem (6) is

$L_D(V) = \frac{1}{2}\sum_{m=1}^P \|w_m\|_2^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \beta_i \xi_i - \sum_{i=1}^N \alpha_i \big( y_i f(x_i) - 1 + \xi_i \big)$

and taking the derivatives of $L_D(V)$ with respect to the primal variables gives

$\partial L_D(V)/\partial w_m = 0 \Rightarrow w_m = \sum_{i=1}^N \alpha_i y_i\, \eta_m(x_i|V)\, \Phi_m(x_i^m) \quad \forall m$
$\partial L_D(V)/\partial b = 0 \Rightarrow \sum_{i=1}^N \alpha_i y_i = 0$
$\partial L_D(V)/\partial \xi_i = 0 \Rightarrow C = \alpha_i + \beta_i \quad \forall i. \qquad (7)$

From $L_D(V)$ and (7), the dual formulation is obtained as

max. $J(V) = \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j\, k_\eta(x_i, x_j)$
w.r.t. $\alpha \in [0, C]^N$
s.t. $\sum_{i=1}^N \alpha_i y_i = 0 \qquad (8)$

where the locally combined kernel function is defined as

$k_\eta(x_i, x_j) = \sum_{m=1}^P \eta_m(x_i|V)\, k_m(x_i^m, x_j^m)\, \eta_m(x_j|V). \qquad (9)$

Note that if the input kernel matrices are positive semidefinite, the combined kernel matrix is also positive semidefinite by construction: the locally combined kernel matrix is the sum of the matrices obtained by pre- and post-multiplying each kernel matrix by the (diagonal matrix formed from the) vector of gating model outputs for that kernel. Using the support vector coefficients obtained from (8) and the gating model parameters, we obtain the following discriminant function:

$f(x) = \sum_{i=1}^N \alpha_i y_i\, k_\eta(x_i, x) + b.$

For a given $V$, the gradients of the objective function in (8) are equal to the gradients of the objective function in (6) due to strong duality, which guarantees that, for a convex quadratic optimization problem, the dual problem has the same optimum value as its primal problem. These gradients are used to update the gating model parameters at each step.
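Before turning to the gradients, note that the locally combined kernel matrix in (9) reduces to a sum of elementwise products; a sketch with our naming:

```python
import numpy as np

def locally_combined_kernel(kernels, G):
    # kernels: list of P (N x N) kernel matrices K_m
    # G: (N x P) gating outputs, G[i, m] = eta_m(x_i | V)
    # Implements (9): K_eta[i, j] = sum_m G[i, m] * K_m[i, j] * G[j, m],
    # i.e. K_eta = sum_m diag(g_m) K_m diag(g_m).
    N = G.shape[0]
    K_eta = np.zeros((N, N))
    for m, K in enumerate(kernels):
        g = G[:, m]
        K_eta += np.outer(g, g) * K    # elementwise product
    return K_eta
```

Since each summand is of the form $D K D$ with $D$ diagonal and $K$ positive semidefinite, positive semidefiniteness of the sum follows immediately, as claimed above.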
2.3. Training with alternating optimization

We can find the gradients of $J(V)$ with respect to the parameters of all three gating models. The gradients of (8) with respect to the parameters of the softmax gating model (3) are

$\frac{\partial J(V)}{\partial v_m} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij} \sum_{h=1}^P \eta_h(x_i|V)\, k_h(x_i^h, x_j^h)\, \eta_h(x_j|V)\, \big( x_i^G (\delta_m^h - \eta_m(x_i|V)) + x_j^G (\delta_m^h - \eta_m(x_j|V)) \big)$

$\frac{\partial J(V)}{\partial v_{m0}} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij} \sum_{h=1}^P \eta_h(x_i|V)\, k_h(x_i^h, x_j^h)\, \eta_h(x_j|V)\, \big( \delta_m^h - \eta_m(x_i|V) + \delta_m^h - \eta_m(x_j|V) \big)$

where $\mathcal{U}_{ij} = \alpha_i \alpha_j y_i y_j$, and $\delta_m^h$ is 1 if $m = h$ and 0 otherwise. The same gradients with respect to the parameters of the sigmoid gating model (4) are

$\frac{\partial J(V)}{\partial v_m} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij}\, \eta_m(x_i|V)\, k_m(x_i^m, x_j^m)\, \eta_m(x_j|V)\, \big( x_i^G (1 - \eta_m(x_i|V)) + x_j^G (1 - \eta_m(x_j|V)) \big)$

$\frac{\partial J(V)}{\partial v_{m0}} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij}\, \eta_m(x_i|V)\, k_m(x_i^m, x_j^m)\, \eta_m(x_j|V)\, \big( 2 - \eta_m(x_i|V) - \eta_m(x_j|V) \big)$

where the gating model parameters for a kernel function are updated independently. We can also find the gradients with respect to the means and the spreads of the Gaussian gating model (5):

$\frac{\partial J(V)}{\partial \mu_m} = -\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij} \sum_{h=1}^P \eta_h(x_i|V)\, k_h(x_i^h, x_j^h)\, \eta_h(x_j|V)\, \big( (x_i^G - \mu_m)(\delta_m^h - \eta_m(x_i|V)) + (x_j^G - \mu_m)(\delta_m^h - \eta_m(x_j|V)) \big) / \sigma_m^2$

$\frac{\partial J(V)}{\partial \sigma_m} = -\sum_{i=1}^N\sum_{j=1}^N \mathcal{U}_{ij} \sum_{h=1}^P \eta_h(x_i|V)\, k_h(x_i^h, x_j^h)\, \eta_h(x_j|V)\, \big( \|x_i^G - \mu_m\|_2^2\, (\delta_m^h - \eta_m(x_i|V)) + \|x_j^G - \mu_m\|_2^2\, (\delta_m^h - \eta_m(x_j|V)) \big) / \sigma_m^3.$

The complete algorithm of our proposed LMKL is summarized in Algorithm 1. Previously, we performed a predetermined number of iterations [3]; now, we calculate a step size at each iteration using a line search method and detect the convergence of the algorithm by observing the change in the objective function value of (8). This allows converging to a better solution and hence a better learner. Our algorithm is guaranteed to converge in a finite number of iterations: at each iteration, we pick the step size using a line search method, so the objective function value cannot get worse. After a finite number of iterations, the algorithm converges to one of the local optima due to the nonconvexity of the primal problem in (6).

Algorithm 1. Localized Multiple Kernel Learning (LMKL).
1: Initialize $V^{(0)}$ randomly
2: repeat
3:   Calculate $K_\eta^{(t)} = \{k_\eta(x_i, x_j)\}_{i,j=1}^N$ using $V^{(t)}$
4:   Solve the kernel machine with $K_\eta^{(t)}$
5:   Calculate $\partial J(V)/\partial V$ at $V^{(t)}$
6:   Determine the step size, $\Delta^{(t)}$, using a line search method
7:   $V^{(t+1)} \leftarrow V^{(t)} - \Delta^{(t)}\, \partial J(V)/\partial V$
8: until convergence
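A compact sketch of Algorithm 1 for the softmax gating case follows. It is a simplified editorial reading, not the authors' MATLAB/MOSEK implementation: the inner solver is scikit-learn's SVC on a precomputed kernel, and the line search of step 6 is replaced by a fixed step size.

```python
import numpy as np
from sklearn.svm import SVC

def train_lmkl_softmax(kernels, Xg, y, C=1.0, step=0.1, n_iter=50, tol=1e-4):
    """Two-step alternating optimization (Algorithm 1), softmax gating.

    kernels: list of P precomputed (N x N) kernel matrices
    Xg:      (N, D_G) gating representation; y: labels in {-1, +1}
    """
    N, P = Xg.shape[0], len(kernels)
    Xb = np.hstack([Xg, np.ones((N, 1))])          # absorb bias v_m0
    V = 1e-2 * np.random.randn(P, Xb.shape[1])
    prev_J = np.inf
    for _ in range(n_iter):
        A = Xb @ V.T
        G = np.exp(A - A.max(axis=1, keepdims=True))
        G /= G.sum(axis=1, keepdims=True)          # eta_m(x_i|V), eq. (3)
        K = sum(np.outer(G[:, m], G[:, m]) * kernels[m] for m in range(P))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)      # step 4
        alpha = np.zeros(N)
        alpha[svm.support_] = np.abs(svm.dual_coef_[0])
        U = np.outer(alpha * y, alpha * y)         # U_ij = a_i a_j y_i y_j
        J = alpha.sum() - 0.5 * np.sum(U * K)      # dual objective (8)
        if abs(prev_J - J) < tol:
            break
        prev_J = J
        # Gradient of J w.r.t. V for softmax gating, vectorized:
        # dJ/dv_m = -sum_i x_i (T_im - eta_m(x_i) * sum_h T_ih),
        # with T_ih = sum_j U_ij * eta_h(x_i) K_h[i,j] eta_h(x_j).
        T = np.stack([(U * np.outer(G[:, h], G[:, h]) * kernels[h]).sum(axis=1)
                      for h in range(P)], axis=1)  # N x P
        R = T - G * T.sum(axis=1, keepdims=True)
        dV = -(R.T @ Xb)
        V -= step * dV                             # fixed step, no line search
    return V, svm, G
```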

2.4. Extensions to other algorithms

We extend our proposed LMKL framework for two-class classification [3] to other kernel-based algorithms, namely support vector regression (SVR) [6], multiclass SVM (MCSVM), and one-class SVM (OCSVM). Note that any kernel machine that has a hyperplane-based decision function can be localized by replacing $\langle w, \Phi(x)\rangle$ with $\sum_{m=1}^P \eta_m(x|V)\langle w_m, \Phi_m(x^m)\rangle$ and deriving the corresponding update rules.

2.4.1. Support vector regression

We can also apply the localized kernel idea to $\epsilon$-tube SVR [6]. The decision function is rewritten as

$f(x) = \sum_{m=1}^P \eta_m(x|V)\, \langle w_m, \Phi_m(x^m)\rangle + b$

and the modified primal optimization problem is

min. $\frac{1}{2}\sum_{m=1}^P \|w_m\|_2^2 + C\sum_{i=1}^N (\xi_i^+ + \xi_i^-)$
w.r.t. $w_m \in \mathbb{R}^{S_m}$, $\xi^+ \in \mathbb{R}^N_+$, $\xi^- \in \mathbb{R}^N_+$, $V \in \mathbb{R}^{P\times(D_G+1)}$, $b \in \mathbb{R}$
s.t. $\epsilon + \xi_i^+ \geq y_i - f(x_i) \quad \forall i$
     $\epsilon + \xi_i^- \geq f(x_i) - y_i \quad \forall i$

where $\{\xi^+, \xi^-\}$ are the vectors of slack variables and $\epsilon$ is the width of the regression tube. For a given $V$, the corresponding dual formulation is

max. $J(V) = \sum_{i=1}^N y_i(\alpha_i^+ - \alpha_i^-) - \epsilon\sum_{i=1}^N(\alpha_i^+ + \alpha_i^-) - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-)\, k_\eta(x_i, x_j)$
w.r.t. $\alpha^+ \in [0,C]^N$, $\alpha^- \in [0,C]^N$
s.t. $\sum_{i=1}^N (\alpha_i^+ - \alpha_i^-) = 0$

and the resulting decision function is

$f(x) = \sum_{i=1}^N (\alpha_i^+ - \alpha_i^-)\, k_\eta(x_i, x) + b.$

The same learning algorithm given for two-class classification problems can be applied to regression problems by simply replacing $\mathcal{U}_{ij}$ in the gradient-descent update of the gating model (see Section 2.3) with $(\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-)$.
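In other words, only the $\mathcal{U}$ matrix in the gating gradients changes; in scikit-learn terms this amounts to building it from the signed dual coefficients of a fitted regressor, as in this small sketch (naming is ours):

```python
import numpy as np
from sklearn.svm import SVR

def svr_U(svr, N):
    # For epsilon-SVR, U_ij = (a_i^+ - a_i^-)(a_j^+ - a_j^-);
    # scikit-learn's SVR exposes the signed differences directly
    # in dual_coef_ for the support vectors.
    d = np.zeros(N)
    d[svr.support_] = svr.dual_coef_[0]
    return np.outer(d, d)
```

The alternating-optimization loop sketched earlier then goes through unchanged with this $\mathcal{U}$.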
2.4.2. Multiclass support vector machine

In a multiclass classification problem, a data instance can belong to one of $K$ classes, and the class label is given as $y_i \in \{1, 2, \dots, K\}$. There are two basic approaches in the literature for solving multiclass problems. In the multimachine approach, the original multiclass problem is converted into a number of independent, uncoupled two-class problems. In the single-machine approach, the constraints due to having multiple classes are coupled in a single formulation [33]. We can easily apply LMKL to the multimachine approach by solving (8) for each two-class problem separately. In such a case, we obtain different gating model parameters and hence different kernel weighing strategies for each of the problems. Another possibility is to solve these uncoupled problems separately but learn a common gating model; a similar approach is used for obtaining common kernel weights in MKL for multiclass problems [9].

For the single-machine approach, for class $l$, we write the discriminant function as follows:

$f^l(x) = \sum_{m=1}^P \eta_m(x|V)\, \langle w_m^l, \Phi_m(x^m)\rangle + b^l.$

The modified primal optimization problem is

min. $\frac{1}{2}\sum_{m=1}^P\sum_{l=1}^K \|w_m^l\|_2^2 + C\sum_{i=1}^N\sum_{l=1}^K \xi_i^l$
w.r.t. $w_m^l \in \mathbb{R}^{S_m}$, $\xi^l \in \mathbb{R}^N_+$, $V \in \mathbb{R}^{P\times(D_G+1)}$, $b^l \in \mathbb{R}$
s.t. $f^{y_i}(x_i) - f^l(x_i) \geq 2 - \xi_i^l \quad \forall (i,\, l \neq y_i)$
     $\xi_i^{y_i} = 0 \quad \forall i.$

We can obtain the dual formulation for a given $V$ by following the same derivation steps:

max. $J(V) = 2\sum_{i=1}^N\sum_{l=1}^K \alpha_i^l - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \Big( \delta_{y_i}^{y_j} A_i A_j - A_i \alpha_j^{y_i} - A_j \alpha_i^{y_j} + \sum_{l=1}^K \alpha_i^l \alpha_j^l \Big)\, k_\eta(x_i, x_j)$
w.r.t. $\alpha^l \in \mathbb{R}^N_+$
s.t. $\sum_{i=1}^N (\delta_{y_i}^l A_i - \alpha_i^l) = 0 \quad \forall l$
     $(1 - \delta_{y_i}^l)\, C \geq \alpha_i^l \geq 0 \quad \forall (i, l)$

where $A_i = \sum_{l=1}^K \alpha_i^l$. The resulting discriminant functions, which use the locally combined kernel function, are given as

$f^l(x) = \sum_{i=1}^N (\delta_{y_i}^l A_i - \alpha_i^l)\, k_\eta(x_i, x) + b^l.$

In learning the gating model parameters for multiclass classification problems, $\mathcal{U}_{ij}$ should be replaced with $\sum_{l=1}^K (\delta_{y_i}^l A_i - \alpha_i^l)(\delta_{y_j}^l A_j - \alpha_j^l)$.

2.4.3. One-class support vector machine

OCSVM is a discriminative method proposed for novelty detection problems [3]. The task is to learn the smoothest hyperplane that puts most of the training instances on one side of the hyperplane while allowing the remaining instances to stay on the other side at a cost. In the localized version, we rewrite the discriminant function as

$f(x) = \sum_{m=1}^P \eta_m(x|V)\, \langle w_m, \Phi_m(x^m)\rangle + b,$

and the modified primal optimization problem is

min. $\frac{1}{2}\sum_{m=1}^P \|w_m\|_2^2 + C\sum_{i=1}^N \xi_i + b$
w.r.t. $w_m \in \mathbb{R}^{S_m}$, $\xi \in \mathbb{R}^N_+$, $V \in \mathbb{R}^{P\times(D_G+1)}$, $b \in \mathbb{R}$
s.t. $f(x_i) + \xi_i \geq 0 \quad \forall i.$

For a given $V$, we obtain the following dual optimization problem:

max. $J(V) = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j\, k_\eta(x_i, x_j)$
w.r.t. $\alpha \in [0, C]^N$
s.t. $\sum_{i=1}^N \alpha_i = 1$

and the resulting discriminant function is

$f(x) = \sum_{i=1}^N \alpha_i\, k_\eta(x_i, x) + b.$

In the learning algorithm, $\mathcal{U}_{ij}$ should be replaced with $\alpha_i \alpha_j$ when calculating the gradients with respect to the gating model parameters.

3. Experiments

In this section, we report the empirical performance of LMKL for classification and regression problems on several data sets, and compare LMKL with SVM, SVR, and MKL (using the linear formulation of [4]). We use our own implementations of SVM, SVR, MKL, and LMKL written in MATLAB, and the resulting optimization problems for all these methods are solved using the MOSEK optimization software [6]. Unless otherwise stated, our experimental methodology is as follows: a random one-third of the data set is reserved as the test set, and the remaining two-thirds is resampled using 5×2 cross-validation to generate ten training and validation sets, with stratification (i.e., preserving class ratios) for classification problems. The validation sets of all folds are used to optimize $C$ (by trying the values {0.01, 0.1, 1, 10, 100}) and, for regression problems, $\epsilon$, the width of the error tube. The best configuration (measured as the highest average classification accuracy, or the lowest mean square error (MSE) for regression problems) on the validation folds is used to train the final classifiers/regressors on the training folds, and their performance is measured over the test set. We have ten test set results, and we report their averages and standard deviations, as well as the percentage of instances stored as support vectors and the total training time (in seconds) including the cross-validation. We use the 5×2 cv paired F test for comparison. In the experiments, we normalize the kernel matrices to unit diagonal before training.
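The unit-diagonal normalization mentioned above is the usual $k(x_i, x_j)/\sqrt{k(x_i, x_i)\, k(x_j, x_j)}$ scaling; as a sketch:

```python
import numpy as np

def normalize_unit_diagonal(K):
    # K_ij <- K_ij / sqrt(K_ii * K_jj), so that every instance has
    # unit self-similarity before the kernels are combined.
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)
```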
3.1. Classification experiments

3.1.1. Illustrative classification problem

In order to illustrate our proposed algorithm, we use the toy data set GAUSS4 [3], consisting of data instances generated from four Gaussian components (two for each class) with the following prior probabilities, mean vectors, and covariance matrices:

$p_1 = 0.25 \quad m_1 = (-3.0, +1.0)^\top \quad S_1 = \mathrm{diag}(0.8,\ 2.0)$
$p_2 = 0.25 \quad m_2 = (+1.0, +1.0)^\top \quad S_2 = \mathrm{diag}(0.8,\ 2.0)$
$p_3 = 0.25 \quad m_3 = (-1.0, -2.2)^\top \quad S_3 = \mathrm{diag}(0.8,\ 4.0)$
$p_4 = 0.25 \quad m_4 = (+3.0, -2.2)^\top \quad S_4 = \mathrm{diag}(0.8,\ 4.0)$

where data instances from the first two components are labeled as positive and the others are labeled as negative. First, we train both MKL and LMKL with softmax gating to combine a linear kernel, $k_L$, and a second-degree polynomial kernel, $k_P$ ($q=2$). Fig. 1(b) shows the classification boundaries calculated and the support vectors stored on one of the training folds by MKL, which assigns combination weights 0.32 and 0.68 to $k_L$ and $k_P$, respectively. We see that, using the kernel matrix obtained by combining $k_L$ and $k_P$ with these weights, we do not achieve a good approximation to the optimal Bayes boundary. As we see in Fig. 1(c), LMKL divides the input space into two regions and uses the polynomial kernel to separate one component from the two others quadratically in one region, and the linear kernel for the remaining component in the other region. We see that we get a very good approximation of the optimal Bayes boundary. The softmax function in the gating model achieves a smooth transition between the two kernels.

The superiority of the localized approach is also apparent in the smoothness of the fit, which uses fewer support vectors: MKL achieves … per cent average test accuracy by storing … per cent of the training instances as support vectors, whereas LMKL achieves … per cent average test accuracy by storing … per cent support vectors. With LMKL, we can also combine multiple copies of the same kernel: Fig. 1(d) shows the classification and gating model boundaries of LMKL using three linear kernels, which approximates the optimal Bayes boundary in a piecewise linear manner. For this configuration, LMKL achieves … per cent average test accuracy by storing … per cent support vectors. Instead of using complex kernels such as high-degree polynomial kernels or the Gaussian kernel, local combination of simple kernels (e.g., linear or low-degree polynomial kernels) can produce accurate classifiers and avoid overfitting. Fig. 2 shows the average test accuracies, support vector percentages, and training times, with one standard deviation, for LMKL with different numbers of linear kernels. We see that even if we provide more kernels than needed, LMKL uses only as many support vectors as required and does not overfit: LMKL obtains nearly the same average test accuracies and support vector percentages with three or more linear kernels. We also see that the training time of LMKL increases linearly with the number of kernels.

3.1.2. Combining multiple feature representations of benchmark data sets

We compare SVM, MKL, and LMKL in terms of classification performance, model complexity (i.e., stored support vector percentage), and training time. We train SVMs with linear kernels calculated on each feature representation separately. We also train an SVM with a linear kernel calculated on the concatenation of all feature representations, which is referred to as ALL. MKL and LMKL combine linear kernels calculated on each feature representation. LMKL uses a single feature representation or the concatenation of all feature representations in the gating model. We use both softmax and sigmoid gating models in our experiments.

We perform experiments on the Multiple Features (MULTIFEAT) digit recognition data set from the UCI Machine Learning Repository, composed of six different data representations for handwritten numerals.

Fig. 1. MKL and LMKL solutions on the GAUSS4 data set. (a) The dashed ellipses show the Gaussians from which data are sampled, and the solid line shows the optimal Bayes discriminant. (b)–(d) The solid lines show the discriminants learned; the circled data points represent the stored support vectors. For the LMKL solutions, the dashed lines show the gating boundaries, where the gating model outputs of neighboring kernels are equal. (a) GAUSS4 data set. (b) MKL with ($k_L$–$k_P$). (c) LMKL with ($k_L$–$k_P$). (d) LMKL with ($k_L$–$k_L$–$k_L$).

Fig. 2. The average test accuracies, support vector percentages, and training times on the GAUSS4 data set obtained by LMKL with multiple copies of linear kernels and softmax gating.

7 M. Gönen, E. Alpaydın / attern Recognton 46 (3) composed of sx dfferent data representatons for handwrtten numerals. The propertes of these feature representatons are summarzed n Table. A bnary classfcaton problem s generated from the MULTIFEAT data set to separate small ( 4 ) dgts from large ( 5 9 ) dgts. We use the concatenaton of all feature representatons n the gatng model for ths data set. Table lsts the classfcaton results on the MULTIFEAT data set obtaned by SVM, MKL, and LMKL. We see that SVM (ALL) s sgnfcantly more accurate than the best SVM wth sngle feature representaton, namely SVM (FAC), but wth a sgnfcant ncrease n the number of support vectors. MKL s as accurate as SVM (ALL) but stores sgnfcantly more support vectors. LMKL wth softmax gatng s as accurate as SVM (ALL) usng sgnfcantly fewer support vectors. LMKL wth sgmod gatng s sgnfcantly more accurate than MKL, SVM (ALL), and sngle kernel SVMs. It stores Table Multple feature representatons n the MULTIFEAT data set. Name Dmenson Data source FAC 6 rofle correlatons FOU 76 Fourer coeffcents of the shapes KAR 64 Karhunen Loeve coeffcents MOR 6 Morphologcal features IX 4 xel averages n 3 wndows ZER 47 Zernke moments Table Classfcaton results on the MULTIFEAT data set. Method Test accuracy Support vector Tranng tme (s) SVM (FAC) SVM (FOU) SVM (KAR) SVM (MOR) SVM (IX) SVM (ZER) SVM (ALL) sgnfcantly fewer support vectors than MKL and SVM (ALL), and tes wth SVM (FAC). For the MULTIFEAT data set, the average kernel weghts and the average number of actve kernels (whose gatng values are nonzero) calculated on the test set are gven n Table 3. We see that both LMKL wth softmax gatng and LMKL wth sgmod gatng use fewer kernels than MKL n the decson functon. MKL uses all kernels wth the same weght for all nputs; LMKL uses a dfferent smaller subset for each nput. By storng sgnfcantly fewer support vectors and usng fewer actve kernels, LMKL s sgnfcantly faster than MKL n the testng phase. MKL and LMKL are teratve methods and need to solve SVM problems at each teraton. LMKL also needs to update the gatng parameters and that s why t requres sgnfcantly longer tranng tmes than MKL when the dmensonalty of the gatng model representaton s hgh (649 n ths set of experments) LMKL needs to calculate the gradents of (8) wth respect to the parameters of the gatng model and to perform a lne search usng these gradents. Learnng wth sgmod gatng s faster than softmax gatng because wth the sgmod durng the gradentupdate only a sngle value s used and updatng takes OðÞ tme, whereas wth the softmax, all gatng outputs are used and updatng s Oð Þ. When learnng tme s crtcal, the tme complexty of ths step can be reduced by decreasng the dmensonalty of the gatng model representaton usng an unsupervsed dmensonalty reducton method. Note also that both the output calculatons and the gradents n separate kernels can be effcently parallelzed when parallel hardware s avalable. Instead of combnng dfferent feature representatons, we can combne multple copes of the same feature representaton wth LMKL. We combne multple copes of lnear kernels on the sngle best FAC representaton usng the sgmod gatng model on the same representaton (see Fg. 3). Even f we ncrease accuracy (not sgnfcantly) by ncreasng the number of copes of the kernels compared to SVM (FAC), we could not acheve the performance obtaned by combnng dfferent representatons wth sgmod gatng. Table 3 Average kernel weghts and number of actve kernels on the MULTIFEAT data set. 
Table 3
Average kernel weights and numbers of active kernels on the MULTIFEAT data set for MKL, LMKL (softmax), and LMKL (sigmoid) over the FAC, FOU, KAR, MOR, PIX, and ZER kernels. (Numeric weight entries lost in this copy.) The average numbers of active kernels are 6.0, 1.43, and 5.36, respectively.

Fig. 3. The average test accuracies, support vector percentages, and training times on the MULTIFEAT data set obtained by LMKL with multiple copies of linear kernels and sigmoid gating on the FAC representation.

For example, LMKL with sigmoid gating and kernels over six different feature representations is better than LMKL with sigmoid gating and six copies of the kernel over the FAC representation, in terms of both classification accuracy (though not significantly) and the number of support vectors stored (significantly) (see Table 2). We also see that the training time of LMKL increases (though not monotonically) with the number of kernels.

We also perform experiments on the Internet Advertisements (ADVERT) data set from the UCI Machine Learning Repository, composed of five different feature representations (different bags of words), with some additional geometry information about the images, which is ignored in our experiments due to missing values. The properties of these feature representations are summarized in Table 4. The classification task is to predict whether an image is an advertisement or not. We use the CAPTION representation in the gating model due to its lower dimensionality compared to the other representations.

Table 4
Multiple feature representations in the ADVERT data set.

Name     Dimension  Data source
URL       457       Phrases occurring in the URL
ORIGURL   495       Phrases occurring in the URL of the image
ANCURL    472       Phrases occurring in the anchor text
ALT       111       Phrases occurring in the alternative text
CAPTION    19       Phrases occurring in the caption terms

Table 5
Classification results on the ADVERT data set. Rows: SVM (URL), SVM (ORIGURL), SVM (ANCURL), SVM (ALT), SVM (CAPTION), SVM (ALL), MKL, LMKL (softmax), LMKL (sigmoid), LMKL (5 × ANCURL and sigmoid); columns: test accuracy, support vector percentage, and training time (s). (Numeric entries lost in this copy.)

Table 5 gives the classification results on the ADVERT data set obtained by SVM, MKL, and LMKL. We see that SVM (ALL) is significantly more accurate than the best SVM with a single feature representation, namely SVM (ANCURL), and uses significantly fewer support vectors. MKL has classification accuracy comparable to SVM (ALL), and the difference between the numbers of support vectors is not significant. LMKL with softmax/sigmoid gating has accuracy comparable to MKL and SVM (ALL). LMKL with sigmoid gating stores significantly fewer support vectors than SVM (ALL).

The average kernel weights and the average numbers of active kernels on the ADVERT data set are given in Table 6. The difference between the running times of MKL and LMKL is not as large as on the MULTIFEAT data set, because the gating model representation (CAPTION) has only 19 dimensions. Differently from the MULTIFEAT data set, LMKL uses approximately the same number of kernels as MKL, or more, on this data set. (On one of the ten folds, MKL chooses five kernels and on the remaining nine folds it chooses four, leading to an average of 4.1.)

Table 6
Average kernel weights and numbers of active kernels on the ADVERT data set for MKL, LMKL (softmax), and LMKL (sigmoid) over the URL, ORIGURL, ANCURL, ALT, and CAPTION kernels. (Numeric weight entries lost in this copy.) The average numbers of active kernels are 4.1, 4.4, and 4.96, respectively.

When we combine multiple copies of linear kernels on the ANCURL representation with LMKL, using the sigmoid gating model on the same representation (see Fig. 4), we see that LMKL stores many fewer support vectors than the single kernel SVM (ANCURL), without sacrificing accuracy. But, as before on the MULTIFEAT data set, we could not achieve the classification accuracy obtained by combining different representations with sigmoid gating. For example, LMKL with sigmoid gating and kernels over five different feature representations is significantly better than LMKL with sigmoid gating and five copies of the kernel over the ANCURL representation in terms of classification accuracy, but the latter stores significantly fewer support vectors (see Table 5).
We again see that the training time of LMKL increases linearly with the number of kernels.

3.1.3. Combining multiple input patches for image recognition problems

For image recognition problems, only some parts of the images contain meaningful information, and it is not necessary to examine the whole image in detail. Instead of defining kernels over the whole input image, we can divide the image into non-overlapping patches and use separate kernels on these patches. The kernels calculated on the parts with relevant information take nonzero weights, and the kernels over the non-relevant patches are ignored. We use a low-resolution (simpler) version of the image as input to the gating model, which selects a subset of the high-resolution localized kernels. In such a case, it is not a good idea to use softmax gating in LMKL, because softmax gating would choose one or very few patches, and a patch by itself does not carry enough discriminative information.

We train SVMs with linear kernels calculated on the whole image at different resolutions. MKL and LMKL combine linear kernels calculated on each image patch. LMKL uses the whole image at different resolutions in the gating model [4]. We perform experiments on the OLIVETTI data set, which consists of 400 grayscale images of 40 subjects. We construct a two-class data set by collecting the male subjects (36 subjects) into one class and the female subjects (four subjects) into the other class. Our experimental methodology for this data set is slightly different: we select two images of each subject randomly and reserve these 80 images in total as the test set. Then, we apply 8-fold cross-validation on the remaining 320 images by putting one image of each subject into the validation set at each fold. MKL and LMKL combine 16 linear kernels calculated on image patches of size 16×16.
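A sketch of this patch-based setup, assuming 64×64 inputs as in OLIVETTI; the block-averaging used here for the low-resolution gating input is our assumption, since the paper does not specify the downsampling method:

```python
import numpy as np

def patch_kernels(images, patch=16):
    # images: (N, 64, 64) array; split each image into non-overlapping
    # patch x patch blocks and build one linear kernel per patch.
    N, H, W = images.shape
    kernels = []
    for r in range(0, H, patch):
        for c in range(0, W, patch):
            X = images[:, r:r + patch, c:c + patch].reshape(N, -1)
            kernels.append(X @ X.T)    # linear kernel on this patch
    return kernels                      # 16 kernels for 64x64 inputs

def gating_representation(images, res=8):
    # Low-resolution version of the image fed to the gating model,
    # obtained by simple block averaging.
    N, H, W = images.shape
    f = H // res
    return images.reshape(N, res, f, res, f).mean(axis=(2, 4)).reshape(N, -1)
```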

Fig. 4. The average test accuracies, support vector percentages, and training times on the ADVERT data set obtained by LMKL with multiple copies of linear kernels and sigmoid gating on the ANCURL representation.

Table 7
Classification results on the OLIVETTI data set. Rows: SVM ($x$ = 4×4), SVM ($x$ = 8×8), SVM ($x$ = 16×16), SVM ($x$ = 32×32), SVM ($x$ = 64×64), MKL, LMKL (softmax, $x^G$ = 4×4), LMKL (softmax, $x^G$ = 8×8), LMKL (softmax, $x^G$ = 16×16), LMKL (sigmoid, $x^G$ = 4×4), LMKL (sigmoid, $x^G$ = 8×8), LMKL (sigmoid, $x^G$ = 16×16); columns: test accuracy, support vector percentage, and training time (s). (Numeric entries lost in this copy.)

Table 7 shows the results of MKL and LMKL combining kernels calculated over non-overlapping patches of the face images. MKL achieves significantly higher classification accuracy than all single kernel SVMs except at 32×32 resolution. LMKL with softmax gating has classification accuracy comparable to MKL and stores significantly fewer support vectors when 4×4 or 16×16 images are used in the gating model. This is mainly due to the normalization property of softmax gating, which generally activates a single patch and ignores the others; this uses fewer support vectors but is not as accurate. LMKL with sigmoid gating significantly improves the classification accuracy over MKL by looking at the 8×8 images in the gating model and choosing a subset of the high-resolution patches. We see that the training time of LMKL increases monotonically with the dimensionality of the gating model representation.

Fig. 5 illustrates example uses of MKL and LMKL with softmax and sigmoid gating. Fig. 5(b)–(c) show the combination weights found by MKL and sample face images, stored as support vectors, weighted with those weights. MKL uses the same weights over the whole input space, and thereby the parts whose weights are nonzero are used in the decision process for all subjects. When we look at the results of LMKL, we see that the gating model activates the important parts of each face image, and these parts are used in the classifier with nonzero weights, whereas the parts whose gating model outputs are zero are not considered. That is, looking at the output of the gating model, we can skip processing the high-resolution versions of those parts. This can be considered similar to a selective attention mechanism, whereby the gating model defines a saliency measure and drives a high-resolution fovea/eye to consider only regions of high saliency. For example, if we use LMKL with softmax gating (see Fig. 5(d)–(f)), the gating model generally activates a single patch containing a part of the eyes or eyebrows, depending on the subject. This may not be enough for good discrimination, and using sigmoid gating is more appropriate. When we use LMKL with sigmoid gating (see Fig. 5(g)–(i)), multiple patches are given nonzero weights in a data-dependent way. Fig. 6 gives the average kernel weights on the test set for MKL, LMKL with softmax gating, and LMKL with sigmoid gating. We see that MKL and LMKL with softmax gating use fewer high-resolution patches than LMKL with sigmoid gating.

We can generalize this idea even further: say we have a number of information sources that are costly to extract or process, and a relatively simpler one. In such a case, we can feed the simple representation to the gating model, feed the costly representations to the actual kernels, and train LMKL. The gating model then chooses a costly representation only when it is needed, and chooses only a subset of the costly representations. Note that the representation used by the gating model does not need to be very precise, because it does not make the actual decision; it only chooses the representation(s) that make the actual decision.
3.2. Regression experiments

3.2.1. Illustrative regression problem

We illustrate the applicability of LMKL to regression problems on the MOTORCYCLE data set discussed in [3]. We train LMKL with three linear kernels and softmax gating ($C$ = … and $\epsilon$ = …) using cross-validation; Fig. 7 shows the average of the global and local fits obtained over these folds. We learn a piecewise linear fit through three local models, obtained using linear kernels in each region, and we combine them using the softmax gating model (shown by dashed lines). The softmax gating model divides the input space between the kernels, generally selecting a single kernel to use, and also ensures a smooth transition between the local fits.
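The kind of fit described above can be reproduced in a few lines; the following toy sketch uses hand-set, purely illustrative parameters (not values fitted to MOTORCYCLE) to show how a softmax gate blends three local linear models into one smooth piecewise linear curve:

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

# Three local linear models and a 1-D softmax gate.
x = np.linspace(0.0, 60.0, 200)[:, None]
W = np.array([[0.0], [-4.0], [2.0]])     # local slopes (illustrative)
b = np.array([0.0, 60.0, -120.0])        # local intercepts (illustrative)
V = np.array([[-1.0], [0.0], [1.0]])     # gating directions
v0 = np.array([25.0, 0.0, -35.0])        # gating biases

G = softmax(x @ V.T + v0)                # eta_m(x|V), N x 3
f = np.sum(G * (x @ W.T + b), axis=1)    # f(x) = sum_m eta_m(x) (w_m x + b_m)
```

Each local model dominates in the region where its gate output is near one, and the softmax produces the smooth transitions between segments seen in Fig. 7.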

3.2.2. Combining multiple kernels on benchmark data sets

We compare SVR and LMKL in terms of regression performance (i.e., mean square error), model complexity (i.e., stored support vector percentage), and training time. We train SVRs with different kernels, namely the linear kernel and polynomial kernels up to the fifth degree. LMKL combines these five kernels with both softmax and sigmoid gating models.

Fig. 5. Example uses of MKL and LMKL on the OLIVETTI data set. (a) $\Phi_m(x^m)$: features fed into the kernels; (b) $\eta_m$: combination weights; (c) $\eta_m \Phi_m(x^m)$: features weighted with the combination weights; (d) $x^G$: features fed into the softmax gating model; (e) $\eta_m(x|V)$: softmax gating model outputs; (f) $\eta_m(x|V)\Phi_m(x^m)$: features weighted with the softmax gating model outputs; (g) $x^G$: features fed into the sigmoid gating model; (h) $\eta_m(x|V)$: sigmoid gating model outputs; (i) $\eta_m(x|V)\Phi_m(x^m)$: features weighted with the sigmoid gating model outputs.

Fig. 6. Average kernel weights on the OLIVETTI data set. (a) MKL, (b) LMKL with softmax gating on 16×16 resolution, and (c) LMKL with sigmoid gating on 8×8 resolution.

Fig. 7. Global and local fits (solid lines) obtained by LMKL with three linear kernels and softmax gating on the MOTORCYCLE data set. The dashed lines show the gating model outputs, which are multiplied by 5 for visual clarity.

We perform experiments on the Concrete Compressive Strength (CONCRETE) data set and the Wine Quality (WHITEWINE) data set from the UCI Machine Learning Repository. $\epsilon$ is selected from {1, 2, 4, 8, 16} for the CONCRETE data set and from {0.08, 0.16, 0.32, 0.64, 1.28} for the WHITEWINE data set.

Table 8 lists the regression results on the CONCRETE data set obtained by SVR and LMKL. We see that both LMKL with softmax gating and LMKL with sigmoid gating are significantly more accurate than all of the single kernel SVRs. LMKL with softmax gating uses $k_L$, $k_P$ ($q=4$), and $k_P$ ($q=5$) with relatively higher weights, whereas LMKL with sigmoid gating uses all of the kernels with significant weights (see Table 9). When we combine multiple copies of the linear kernel using the softmax gating model (shown in Fig. 8), we see that LMKL does not overfit, and we get significantly lower error than the best single kernel SVR ($k_P$ with $q=3$). For example, LMKL with five copies of $k_L$ and softmax gating obtains significantly lower error than SVR ($k_P$, $q=3$) and stores significantly fewer support vectors. Similarly to the binary classification results, the training time of LMKL increases linearly with the number of kernels.

Table 10 lists the regression results on the WHITEWINE data set obtained by SVR and LMKL. We see that both LMKL with softmax gating and LMKL with sigmoid gating obtain significantly lower error than SVR ($k_L$), SVR ($k_P$, $q=2$), and SVR ($k_P$, $q=3$), and have error comparable to SVR ($k_P$, $q=4$) and SVR ($k_P$, $q=5$), while storing significantly fewer support vectors than all single kernel SVRs. Even when we do not decrease the error, we learn computationally simpler models by storing many fewer support vectors. We see from Table 11 that LMKL with softmax gating assigns relatively higher weights to $k_L$, $k_P$ ($q=3$), and $k_P$ ($q=5$), whereas LMKL with sigmoid gating uses the polynomial kernels nearly everywhere in the input space and the linear kernel for some of the test instances.

4. Discussion

We discuss the key properties of the proposed method and compare it with similar MKL methods in the literature.

4.1. Computational complexity

When training LMKL, at each iteration we need to solve a canonical kernel machine problem with the kernel combined using the current gating model parameters, and to calculate the gradients of $J(V)$. The gradient calculations use the support vectors of the current iteration. The gradient calculation step has lower time complexity than the kernel machine solver when the gating model representation is low-dimensional. If we have a high-dimensional gating model representation, we can apply an unsupervised dimensionality reduction method (e.g., principal component analysis) to this representation in order to decrease the training time. The computational complexity of LMKL also depends on the complexity of the canonical kernel machine solver used in the main loop, which can be reduced using a hot-start procedure (i.e., starting from the previous solution). The number of iterations before convergence clearly depends on the training data and on the step size selection procedure. The key issue for faster convergence is to select good gradient-descent step sizes at each iteration. The step size of each iteration should be determined with a line search method (e.g., Armijo's rule, whose search procedure allows backtracking and does not use any curve fitting method), which requires solving additional kernel machine problems. Clearly, the time complexity of each iteration increases, but the algorithm converges in fewer iterations. In practice, we see convergence in 5 iterations.

One main advantage of LMKL is reducing the time complexity of the testing phase as a result of localization: when calculating the locally combined kernel function $k_\eta(x_i, x)$ in (9), $k_m(x_i^m, x^m)$ needs to be evaluated only if both $\eta_m(x_i)$ and $\eta_m(x)$ are active (i.e., nonzero).

4.2. Knowledge extraction

The kernel weights obtained by MKL can be used to extract knowledge about the relative contributions of the kernel functions used in the combination. Different kernels define different similarity measures, and we can deduce which similarity measures are appropriate for the task at hand. If the kernel functions are evaluated over different feature subsets or feature representations, the important ones have higher combination weights. With our LMKL framework, we can extract similar information for different regions of the input space. This enables us to extract information about kernels (similarity measures), feature subsets, and/or feature representations in a data-dependent manner.

Table 8
Regression results on the CONCRETE data set. Rows: SVR ($k_L$), SVR ($k_P$, $q$ = 2, 3, 4, 5), LMKL (softmax), LMKL (sigmoid), LMKL (5 × $k_L$ and softmax); columns: MSE, support vector percentage, and training time (s). (Numeric entries lost in this copy.)

Table 9
Average kernel weights and numbers of active kernels on the CONCRETE data set for LMKL (softmax) and LMKL (sigmoid) over $k_L$ and $k_P$ ($q$ = 2, 3, 4, 5). (Numeric weight entries lost in this copy.) The average numbers of active kernels are 4.5 and 4.68, respectively.

Table 10
Regression results on the WHITEWINE data set. Rows: SVR ($k_L$), SVR ($k_P$, $q$ = 2, 3, 4, 5), LMKL (softmax), LMKL (sigmoid); columns: MSE, support vector percentage, and training time (s). (Numeric entries lost in this copy.)

Table 11
Average kernel weights and numbers of active kernels on the WHITEWINE data set for LMKL (softmax) and LMKL (sigmoid) over $k_L$ and $k_P$ ($q$ = 2, 3, 4, 5). (Numeric weight entries lost in this copy.) The average numbers of active kernels are ….5 and 4.58, respectively.

Fig. 8. The average test mean square errors, support vector percentages, and training times on the CONCRETE data set obtained by LMKL with multiple copies of linear kernels and softmax gating.
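The testing-phase saving described in Section 4.1 can be sketched as follows; all names are ours, and kernel_fns[m] is assumed to return the vector of kernel values between the stored support vectors and the test point:

```python
import numpy as np

def predict_lmkl(x_new_feats, g_new, sv_feats, g_sv, alpha_y, b, kernel_fns):
    # Implements f(x) = sum_i alpha_i y_i k_eta(x_i, x) + b, but k_m is
    # evaluated only when both gating outputs are nonzero (Section 4.1).
    f = b
    for m, k_m in enumerate(kernel_fns):
        if g_new[m] == 0.0:             # gate closed for the test point
            continue
        active = g_sv[:, m] != 0.0      # support vectors with open gate
        if not np.any(active):
            continue
        k = k_m(sv_feats[m][active], x_new_feats[m])
        f += np.sum(alpha_y[active] * g_sv[active, m] * g_new[m] * k)
    return f
```

With softmax gating, where typically one or two gates are open per instance, most kernel evaluations are skipped entirely.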

4.3. Regularization

Canonical kernel machines learn sparse models as a result of regularization on the weight vector, but the underlying complexity of the kernel function is the main factor determining model complexity. The main advantage of LMKL over canonical kernel machines, in terms of regularization, is the inherent regularization effect on the gating model: when we regularize the sum of the hyperplane weight vectors in (6), because these weight vectors are written in terms of the gating model as in (7), we also regularize the gating model as a side effect. MKL can only combine different kernel functions, and more complex kernels are favored over simpler ones in order to get better performance. LMKL, however, can also combine multiple copies of the same kernel, and it can dynamically construct a more complex locally combined kernel from these kernels in a data-dependent way. LMKL eliminates some of the kernels by assigning zero weights to the corresponding gating outputs in order to obtain a more regularized solution. Figs. 4 and 8 give empirical support for this regularization effect, where we see that LMKL does not overfit even as we increase the number of kernels.

4.4. Dimensionality reduction

The localized kernel idea can also be combined with dimensionality reduction. If the training instances have a local structure (i.e., lie on low-dimensional manifolds locally), we can learn low-dimensional local projections in each region, which we can also use for visualization. Previously, it had been proposed to integrate a projection matrix into the discriminant function [6]; we extended this idea to project data instances into different feature spaces using local projection matrices combined with a gating model, and to calculate the combined kernel function with the dot product in the combined feature space [7]. The local projection matrices can be learned together with the other parameters, as before, using a two-step alternating optimization algorithm.

4.5. Related work

LMKL finds a nonlinear combination of kernel functions with the help of the gating model. The idea of learning a nonlinear combination has also been discussed in other studies. For example, a latent variable generative model using maximum entropy discrimination to learn data-dependent kernel combination weights is proposed in [3]; this method combines a generative probabilistic model with a discriminative large margin method, using a log-ratio of Gaussian mixtures as the classifier. In more recent work, a nonlinear kernel combination method based on kernel ridge regression and a polynomial combination of kernels is proposed [8]:

$k_\eta(x_i, x_j) = \sum_{q \in Q} \eta_1^{q_1} \eta_2^{q_2} \cdots \eta_P^{q_P}\; k_1(x_i^1, x_j^1)^{q_1}\, k_2(x_i^2, x_j^2)^{q_2} \cdots k_P(x_i^P, x_j^P)^{q_P}$

where $Q = \{q : q \in \mathbb{Z}_+^P,\; \sum_{m=1}^P q_m = d\}$, and the kernel weights are optimized over a positive, bounded, and convex set using a projection-based gradient-descent algorithm.

Similarly to LMKL, a Bayesian approach has been developed for combining different feature representations in a data-dependent way under the Gaussian process framework [7]. A common covariance function is obtained by combining the covariances of the feature representations in a nonlinear manner. This formulation can identify the noisy data instances for each feature representation and prevent them from being used; classification is performed using the standard Gaussian process approach with the common covariance function.
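As an illustration of the polynomial combination of [8] quoted above, the following sketch enumerates the exponent set $Q$ directly (the weights eta would be learned; here they are inputs, and the enumeration is unoptimized):

```python
import numpy as np
from itertools import product

def polynomial_combination(kernels, eta, d=2):
    # k_eta = sum over q with q_1 + ... + q_P = d of
    #         prod_m eta_m^{q_m} * K_m^{q_m} (elementwise powers/products).
    P = len(kernels)
    N = kernels[0].shape[0]
    K = np.zeros((N, N))
    for q in product(range(d + 1), repeat=P):
        if sum(q) != d:
            continue
        term = np.ones((N, N))
        for m in range(P):
            term *= (eta[m] * kernels[m]) ** q[m]
        K += term
    return K
```

For the typical degree $d = 2$, this reduces to all pairwise elementwise products of the weighted kernels.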
Inspired by LMKL, two methods that learn a data-dependent kernel function have been used for image recognition applications [34,35]; they differ in their gating models, which are constants rather than functions of the input. In [34], the training set is divided into clusters as a preprocessing step, and then cluster-specific kernel weights are learned using an alternating optimization method. The combined kernel function can be written as

$k_\eta(x_i, x_j) = \sum_{m=1}^P \eta_m^{c_i}\, k_m(x_i^m, x_j^m)\, \eta_m^{c_j}$

where $\eta_m^{c_i}$ corresponds to the weight of kernel $k_m(\cdot,\cdot)$ in the cluster that $x_i$ belongs to. In the testing phase, the kernel weights of the cluster to which a test instance is assigned are used. In [35], instance-specific kernel weights are used instead of cluster-specific weights. The corresponding combined kernel function is

$k_\eta(x_i, x_j) = \sum_{m=1}^P \eta_m^i\, k_m(x_i^m, x_j^m)\, \eta_m^j$

where $\eta_m^i$ corresponds to the weight of kernel $k_m(\cdot,\cdot)$ for $x_i$, and the instance-specific weights are optimized over the training set with an alternating optimization procedure. In the testing phase, however, the kernel weights for a test instance are all taken to be equal.

5. Conclusions

This work introduces a localized multiple kernel learning framework for kernel-based algorithms. The proposed algorithm has two main ingredients: (i) a gating model that assigns weights to kernels for each data instance, and (ii) a kernel-based learning algorithm with the locally combined kernel. The training of these two components is coupled, and the parameters of both components are optimized together using a two-step alternating optimization procedure. We derive the learning algorithm for three different gating models (softmax, sigmoid, and Gaussian) and apply the localized multiple kernel learning framework to four different machine learning problems (two-class classification, regression, multiclass classification, and one-class classification).

We perform experiments on several two-class classification and regression problems. We compare the empirical performance of LMKL with single kernel SVM and SVR as well as with MKL. For classification problems defined on different feature representations, LMKL is able to construct better classifiers than MKL by combining the kernels on these representations locally. In our experiments, LMKL achieves higher average test accuracies and stores fewer support vectors compared with MKL. If the combined feature representations are complementary and do not contain redundant information, the sigmoid gating model should be selected instead of softmax gating, in order to have the possibility of using more than one representation. We also see that, as expected, combining heterogeneous feature representations is more advantageous than combining multiple copies of the same representation. For image recognition problems, LMKL identifies the relevant parts of each input image separately, using the gating model as a saliency detector over the kernels on the image patches, and LMKL obtains better classification results than MKL.


More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Robust LS-SVM Regression

A Robust LS-SVM Regression PROCEEDIGS OF WORLD ACADEMY OF SCIECE, EGIEERIG AD ECHOLOGY VOLUME 7 AUGUS 5 ISS 37- A Robust LS-SVM Regresson József Valyon, and Gábor Horváth Abstract In comparson to the orgnal SVM, whch nvolves a quadratc

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming

LECTURE NOTES Duality Theory, Sensitivity Analysis, and Parametric Programming CEE 60 Davd Rosenberg p. LECTURE NOTES Dualty Theory, Senstvty Analyss, and Parametrc Programmng Learnng Objectves. Revew the prmal LP model formulaton 2. Formulate the Dual Problem of an LP problem (TUES)

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Histogram of Template for Pedestrian Detection

Histogram of Template for Pedestrian Detection PAPER IEICE TRANS. FUNDAMENTALS/COMMUN./ELECTRON./INF. & SYST., VOL. E85-A/B/C/D, No. xx JANUARY 20xx Hstogram of Template for Pedestran Detecton Shaopeng Tang, Non Member, Satosh Goto Fellow Summary In

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Machine Learning. K-means Algorithm

Machine Learning. K-means Algorithm Macne Learnng CS 6375 --- Sprng 2015 Gaussan Mture Model GMM pectaton Mamzaton M Acknowledgement: some sldes adopted from Crstoper Bsop Vncent Ng. 1 K-means Algortm Specal case of M Goal: represent a data

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

General Vector Machine. Hong Zhao Department of Physics, Xiamen University General Vector Machne Hong Zhao (zhaoh@xmu.edu.cn) Department of Physcs, Xamen Unversty The support vector machne (SVM) s an mportant class of learnng machnes for functon approach, pattern recognton, and

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Margin-Constrained Multiple Kernel Learning Based Multi-Modal Fusion for Affect Recognition

Margin-Constrained Multiple Kernel Learning Based Multi-Modal Fusion for Affect Recognition Margn-Constraned Multple Kernel Learnng Based Mult-Modal Fuson for Affect Recognton Shzh Chen and Yngl Tan Electrcal Engneerng epartment The Cty College of New Yor New Yor, NY USA {schen, ytan}@ccny.cuny.edu

More information

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR Judth Aronow Rchard Jarvnen Independent Consultant Dept of Math/Stat 559 Frost Wnona State Unversty Beaumont, TX 7776 Wnona, MN 55987 aronowju@hal.lamar.edu

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers

Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers Effcent Dstrbuted Lnear Classfcaton Algorthms va the Alternatng Drecton Method of Multplers Caoxe Zhang Honglak Lee Kang G. Shn Department of EECS Unversty of Mchgan Ann Arbor, MI 48109, USA caoxezh@umch.edu

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information