Support Vector Machines. CS534 - Machine Learning


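Perceptron Revisited: Linear Separators

Binary classification can be viewed as the task of separating classes in feature space. The separating hyperplane is $w \cdot x + b = 0$, with $w \cdot x + b > 0$ on one side and $w \cdot x + b < 0$ on the other. The classifier is

$$f(x) = \mathrm{sgn}(w \cdot x + b)$$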
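
As a minimal sketch of the decision rule $f(x) = \mathrm{sgn}(w \cdot x + b)$, the snippet below evaluates it in plain Python. The function name `predict`, the example separator, and the choice to send points lying exactly on the boundary to $-1$ are illustrative assumptions, not part of the slides.

```python
def predict(w, b, x):
    """Return +1 if w.x + b > 0, else -1 (boundary points go to -1 here)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

# Hypothetical separator: the line x1 + x2 = 2, written as w = (0.5, 0.5), b = -1.
w, b = [0.5, 0.5], -1.0
label_a = predict(w, b, [2.0, 2.0])  # w.x + b = 1.0
label_b = predict(w, b, [0.0, 0.0])  # w.x + b = -1.0
print(label_a, label_b)
```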

Linear Separators

Which of the linear separators is optimal?

Intuition of Margin

Consider points A, B, and C. We are quite confident in our prediction for A because it is far from the decision boundary. In contrast, we are not so confident in our prediction for C because a slight change in the decision boundary may flip the decision.

Given a training set, we would like to make all predictions correct and confident! This leads to the concept of margin.

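Functional Margin

Given a linear classifier parameterized by $(w, b)$, we define its functional margin w.r.t. training example $(x^{(i)}, y^{(i)})$ as:

$$\hat{\gamma}^{(i)} = y^{(i)} (w \cdot x^{(i)} + b)$$

If we rescale $(w, b)$ by a positive factor, the functional margin gets multiplied by the same factor, so we can make it arbitrarily large without changing anything meaningful. Instead, we will look at the geometric margin.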
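
The scaling issue can be checked numerically. The sketch below is only an illustration: the function name `functional_margin`, the example point, and the factor 10 are assumptions chosen for the demonstration.

```python
def functional_margin(w, b, x, y):
    """Functional margin y * (w.x + b) of one labelled example."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [0.5, 0.5], -1.0   # hypothetical classifier
x, y = [3.0, 3.0], 1      # hypothetical positive example

m1 = functional_margin(w, b, x, y)

# Rescale (w, b) by 10: the decision boundary is unchanged,
# but the functional margin is multiplied by 10.
w10, b10 = [10 * wi for wi in w], 10 * b
m10 = functional_margin(w10, b10, x, y)
print(m1, m10)
```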

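Geometric Margin

The geometric margin of $(w, b)$ w.r.t. $x^{(i)}$ is the distance from $x^{(i)}$ to the decision surface. This distance can be computed as

$$\gamma^{(i)} = \frac{y^{(i)} (w \cdot x^{(i)} + b)}{\|w\|}$$

Given training set $S = \{(x^{(i)}, y^{(i)}) : i = 1, \ldots, N\}$, the geometric margin of the classifier w.r.t. $S$ is

$$\gamma = \min_{i = 1, \ldots, N} \gamma^{(i)}$$

Points closest to the boundary are called support vectors. We will see that these are the points that really matter.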
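
A small numerical sketch of these definitions follows. The four training points, the classifier $(w, b)$, and the helper name `geometric_margin` are hypothetical; the code shows that the geometric margin of the set is the minimum over examples and that it does not change when $(w, b)$ is rescaled.

```python
import math

def geometric_margin(w, b, x, y):
    """Signed distance of (x, y) from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical linearly separable training set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
w, b = [0.5, 0.5], -1.0

margins = [geometric_margin(w, b, x, y) for x, y in zip(X, Y)]
gamma = min(margins)  # geometric margin w.r.t. the whole set
closest = [i for i, m in enumerate(margins) if math.isclose(m, gamma)]

# Rescaling (w, b) leaves the geometric margin unchanged.
w10, b10 = [10 * wi for wi in w], 10 * b
gamma_scaled = min(geometric_margin(w10, b10, x, y) for x, y in zip(X, Y))
print(gamma, closest, gamma_scaled)
```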

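Maximum Margin Classifier

Given a linearly separable training set $S = \{(x^{(i)}, y^{(i)}) : i = 1, \ldots, N\}$, we would like to find a linear classifier with maximum margin. This can be represented as an optimization problem:

$$\max_{w, b, \gamma} \gamma \quad \text{subject to} \quad \frac{y^{(i)} (w \cdot x^{(i)} + b)}{\|w\|} \ge \gamma, \quad i = 1, \ldots, N$$

Nasty optimization problem! Let's make it look nicer. Let $\gamma' = \gamma \|w\|$; this is equivalent to

$$\max_{w, b, \gamma'} \frac{\gamma'}{\|w\|} \quad \text{subject to} \quad y^{(i)} (w \cdot x^{(i)} + b) \ge \gamma', \quad i = 1, \ldots, N$$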

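Maximum Margin Classifier

Starting from

$$\max_{w, b, \gamma'} \frac{\gamma'}{\|w\|} \quad \text{subject to} \quad y^{(i)} (w \cdot x^{(i)} + b) \ge \gamma', \quad i = 1, \ldots, N$$

note that rescaling $w$ and $b$ by $1/\gamma'$ will not change the classifier. We can thus further reformulate the optimization problem as

$$\max_{w, b} \frac{1}{\|w\|} \quad \text{subject to} \quad y^{(i)} (w \cdot x^{(i)} + b) \ge 1, \quad i = 1, \ldots, N$$

or, equivalently, $\min_{w, b} \|w\|^2$ under the same constraints.

Maximizing the geometric margin is equivalent to minimizing the magnitude of $w$ subject to maintaining a functional margin of at least 1.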

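Solving the Optimization Problem

$$\min_{w, b} \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y^{(i)} (w \cdot x^{(i)} + b) \ge 1, \quad i = 1, \ldots, N$$

(The factor $\frac{1}{2}$ is a convenience for differentiation and does not change the minimizer.)

This results in a quadratic optimization problem with linear inequality constraints. This is a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist. One could solve for $w$ using any of these methods.

We will see that it is useful to first formulate an equivalent dual optimization problem and solve it instead. This requires a bit of machinery.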
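
To make the primal problem concrete, the sketch below evaluates the objective $\frac{1}{2}\|w\|^2$ and checks the constraints $y^{(i)}(w \cdot x^{(i)} + b) \ge 1$ for a few hand-picked candidates. It does not solve the QP; the data, the candidate list, and the helper names are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w):
    """Objective 0.5 * ||w||^2 of the primal problem."""
    return 0.5 * dot(w, w)

def is_feasible(w, b, X, Y, tol=1e-12):
    """True if every example has functional margin at least 1."""
    return all(y * (dot(w, x) + b) >= 1 - tol for x, y in zip(X, Y))

# Hypothetical linearly separable training set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]

# Hand-picked candidates (w, b); the last one violates a constraint.
candidates = [([0.5, 0.5], -1.0), ([1.0, 1.0], -2.0), ([0.25, 0.25], -0.5)]
feasible = [(w, b) for w, b in candidates if is_feasible(w, b, X, Y)]

# Among the feasible candidates, the smaller ||w|| is preferred.
best_w, best_b = min(feasible, key=lambda wb: primal_objective(wb[0]))
print(best_w, best_b, primal_objective(best_w))
```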

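Aside: Constrained Optimization

To solve the following optimization problem

$$\min_{w} f(w) \quad \text{subject to} \quad g_i(w) \le 0, \quad i = 1, \ldots, k$$

consider the following function, known as the Lagrangian:

$$L(w, \alpha) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w)$$

Under certain conditions it can be shown that for a solution $w^*$ to the above problem we have

$$f(w^*) = \underbrace{\min_{w} \max_{\alpha \ge 0} L(w, \alpha)}_{\text{primal form}} = \underbrace{\max_{\alpha \ge 0} \min_{w} L(w, \alpha)}_{\text{dual form}}$$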
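
As a worked illustration of the primal and dual forms, consider a one-dimensional toy problem. It is an assumption chosen for simplicity and does not appear in the slides: minimize $x^2$ subject to $1 - x \le 0$. Both forms give the same value, which is what the duality claim asserts for convex problems of this kind.

```latex
\begin{align*}
\text{Primal:}\quad & \min_{x} x^2 \ \text{ subject to } 1 - x \le 0
    \;\Rightarrow\; x^* = 1,\ f(x^*) = 1 \\
\text{Lagrangian:}\quad & L(x, \alpha) = x^2 + \alpha (1 - x), \qquad \alpha \ge 0 \\
\text{Inner minimization:}\quad & \frac{\partial L}{\partial x} = 2x - \alpha = 0
    \;\Rightarrow\; x = \frac{\alpha}{2} \\
\text{Dual function:}\quad & \min_{x} L(x, \alpha)
    = \frac{\alpha^2}{4} + \alpha - \frac{\alpha^2}{2}
    = \alpha - \frac{\alpha^2}{4} \\
\text{Dual:}\quad & \max_{\alpha \ge 0} \left( \alpha - \frac{\alpha^2}{4} \right)
    \;\Rightarrow\; \alpha^* = 2,\ \text{value } 2 - 1 = 1 = f(x^*)
\end{align*}
```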

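Back to the Original Problem

$$\min_{w, b} \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad 1 - y^{(i)} (w \cdot x^{(i)} + b) \le 0, \quad i = 1, \ldots, N$$

The Lagrangian is

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{N} \alpha_i \{ y^{(i)} (w \cdot x^{(i)} + b) - 1 \}$$

We want to solve

$$\max_{\alpha \ge 0} \min_{w, b} L(w, b, \alpha)$$

Setting the gradient of $L$ w.r.t. $w$ and $b$ to zero, we have

$$w - \sum_{i=1}^{N} \alpha_i y^{(i)} x^{(i)} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y^{(i)} x^{(i)}, \qquad \sum_{i=1}^{N} \alpha_i y^{(i)} = 0$$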

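The Dual Problem

If we substitute $w = \sum_{i=1}^{N} \alpha_i y^{(i)} x^{(i)}$ into $L$, we have

$$\begin{aligned}
L(\alpha) &= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle - \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle - b \sum_{i=1}^{N} \alpha_i y^{(i)} + \sum_{i=1}^{N} \alpha_i \\
&= \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle
\end{aligned}$$

Note that $\sum_{i=1}^{N} \alpha_i y^{(i)} = 0$, so the term involving $b$ vanishes. This is a function of $\alpha$ only.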

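The Dual Problem

The new objective function is in terms of $\alpha$ only. It is known as the dual problem: if we know all $\alpha_i$, we know $w$. The original problem is known as the primal problem.

The objective function of the dual problem needs to be maximized! The dual problem is therefore:

$$\max_{\alpha} L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle$$

$$\text{subject to} \quad \alpha_i \ge 0, \quad i = 1, \ldots, N, \qquad \sum_{i=1}^{N} \alpha_i y^{(i)} = 0$$

The constraint $\alpha_i \ge 0$ is a property of the $\alpha_i$ when we introduce the Lagrange multipliers. The constraint $\sum_i \alpha_i y^{(i)} = 0$ is the result when we differentiate the original Lagrangian w.r.t. $b$.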
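
The dual objective is straightforward to evaluate for a given $\alpha$. The sketch below does so on a hypothetical four-point data set, with a hand-derived $\alpha$ that satisfies both constraints; the names `dual_objective`, `X`, `Y`, and `alpha` are illustrative. For this toy problem the dual value matches the primal value $\frac{1}{2}\|w\|^2$ at $w = (0.5, 0.5)$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, Y):
    """sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * Y[i] * Y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# Hypothetical linearly separable training set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]

# Hand-derived multipliers for this toy set (an assumption of the example).
alpha = [0.25, 0.0, 0.25, 0.0]

constraint = sum(a * y for a, y in zip(alpha, Y))  # must equal 0
dual_value = dual_objective(alpha, X, Y)
primal_value = 0.5 * dot([0.5, 0.5], [0.5, 0.5])   # 0.5 * ||w||^2 at w = (0.5, 0.5)
print(constraint, dual_value, primal_value)
```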

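The Dual Problem

$$\max_{\alpha} L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle$$

$$\text{subject to} \quad \alpha_i \ge 0, \quad i = 1, \ldots, N, \qquad \sum_{i=1}^{N} \alpha_i y^{(i)} = 0$$

This is also a quadratic programming (QP) problem. A global maximum of $L(\alpha)$ can always be found.

$w$ can be recovered by $w = \sum_{i=1}^{N} \alpha_i y^{(i)} x^{(i)}$. $b$ can be recovered as well (wait for a bit).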
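
Recovering $w$ from the multipliers is a weighted sum of the training points. A minimal sketch, assuming a hypothetical data set and a hand-derived dual solution `alpha` (neither is from the slides):

```python
def recover_w(alpha, X, Y):
    """w = sum_i alpha_i * y_i * x_i, computed coordinate by coordinate."""
    dim = len(X[0])
    return [sum(alpha[i] * Y[i] * X[i][k] for i in range(len(X)))
            for k in range(dim)]

# Hypothetical training set and its hand-derived dual solution.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]

w = recover_w(alpha, X, Y)
print(w)
```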

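Characteristics of the Solution

Many of the $\alpha_i$ are zero, so $w$ is a linear combination of only a small number of data points.

In fact, optimization theory requires that the solution satisfy the following KKT conditions:

$$\alpha_i \ge 0, \qquad y^{(i)} (w \cdot x^{(i)} + b) \ge 1, \qquad \alpha_i \{ y^{(i)} (w \cdot x^{(i)} + b) - 1 \} = 0, \qquad i = 1, \ldots, N$$

The middle condition says the functional margin is at least 1, and the last one implies that $\alpha_i$ is nonzero only when the functional margin equals 1.

The $x^{(i)}$ with non-zero $\alpha_i$ are called support vectors (SV). The decision boundary is determined only by the SV. Let $t_j$ ($j = 1, \ldots, s$) be the indices of the $s$ support vectors. We can write

$$w = \sum_{j=1}^{s} \alpha_{t_j} y^{(t_j)} x^{(t_j)}$$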
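
These conditions can be verified numerically for a candidate solution. The sketch below uses a hypothetical data set with a hand-derived solution $(w, b, \alpha)$ and a small threshold (an assumption) for deciding which $\alpha_i$ count as nonzero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical training set with a hand-derived solution.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]
w, b = [0.5, 0.5], -1.0

# Functional margin of every training example.
margins = [y * (dot(w, x) + b) for x, y in zip(X, Y)]

# Complementary slackness: alpha_i * (margin_i - 1) must be 0 for all i.
slackness = [a * (m - 1.0) for a, m in zip(alpha, margins)]

# Support vectors are the examples with non-zero alpha_i.
support = [i for i, a in enumerate(alpha) if a > 1e-8]
print(margins, slackness, support)
```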

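Solve for b

Note that we know that for support vectors the functional margin equals 1. We can use this information to solve for $b$. We can use any support vector $x^{(t)}$ to achieve this:

$$y^{(t)} \left( \sum_{j=1}^{s} \alpha_{t_j} y^{(t_j)} \langle x^{(t_j)}, x^{(t)} \rangle + b \right) = 1$$

A numerically more stable solution is to use all support vectors (details in the book).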
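
Because $y^{(t)} \in \{+1, -1\}$, the equation for a support vector rearranges to $b = y^{(t)} - \sum_j \alpha_{t_j} y^{(t_j)} \langle x^{(t_j)}, x^{(t)} \rangle$. The sketch below computes $b$ from one support vector and also averages over all of them. Averaging is one common way to use all support vectors and is an assumption here, since the slides defer the details to the book; the data and $\alpha$ are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical training set and its hand-derived dual solution.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]
support = [i for i, a in enumerate(alpha) if a > 1e-8]

def bias_from(t):
    """Solve y_t * (sum_j alpha_j y_j <x_j, x_t> + b) = 1 for b, with y_t in {+1, -1}."""
    return Y[t] - sum(alpha[j] * Y[j] * dot(X[j], X[t]) for j in support)

b_single = bias_from(support[0])                               # any one support vector
b_avg = sum(bias_from(t) for t in support) / len(support)      # average over all of them
print(b_single, b_avg)
```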

Classifying new examples

For classifying a new input $z$, compute

$$w \cdot z + b = \sum_{j=1}^{s} \alpha_{t_j} y^{(t_j)} \langle x^{(t_j)}, z \rangle + b$$

and classify $z$ as positive if the sum is positive, and negative otherwise.

Note: $w$ need not be formed explicitly. Rather, we can classify $z$ by taking a weighted sum of the inner products with the support vectors (useful when we generalize from inner product to kernel functions later).
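
A sketch of this classification rule follows, written so that the inner product is passed in as a function. The `kernel` parameter, the function name `classify`, and the data are assumptions made for illustration; with the default `dot` the result is the same as computing $w \cdot z + b$ directly.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classify(z, alpha, X, Y, b, support, kernel=dot):
    """Return (label, score) using only the support vectors; w is never formed."""
    score = sum(alpha[t] * Y[t] * kernel(X[t], z) for t in support) + b
    return (1 if score > 0 else -1), score

# Hypothetical training set with a hand-derived dual solution.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]
b = -1.0
support = [0, 2]

label_pos, score_pos = classify([3.0, 1.0], alpha, X, Y, b, support)
label_neg, score_neg = classify([0.0, 1.0], alpha, X, Y, b, support)
print(label_pos, score_pos, label_neg, score_neg)
```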

The Quadratic Programming Problem

Many approaches have been proposed: LOQO, CPLEX, etc. (see http://www.numerical.rl.ac.uk/qp/qp.html).

Most are interior-point methods. They start with an initial solution that can violate the constraints, then improve this solution by optimizing the objective function and/or reducing the amount of constraint violation.

For SVM, sequential minimal optimization (SMO) seems to be the most popular. A QP with two variables is trivial to solve. Each iteration of SMO picks a pair $(\alpha_i, \alpha_j)$ and solves the QP with these two variables; repeat until convergence.

In practice, we can just regard the QP solver as a black box without bothering about how it works.
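
The pairwise idea behind SMO can be sketched in a few lines. This is a simplified illustration, not Platt's full algorithm: it sweeps over all pairs in a fixed order instead of using selection heuristics, it handles only the hard-margin dual from these slides (no upper bound on $\alpha_i$), and the function name `smo_hard_margin`, its parameters, and the data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smo_hard_margin(X, Y, max_passes=100, tol=1e-12):
    """Pairwise coordinate ascent on the hard-margin SVM dual (simplified SMO)."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n

    def error(k):
        # f(x_k) - y_k with b omitted: b cancels in error(i) - error(j).
        return sum(alpha[m] * Y[m] * K[m][k] for m in range(n)) - Y[k]

    for _ in range(max_passes):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= tol:
                    continue
                # Unconstrained optimum for alpha_j along sum_i alpha_i y_i = 0.
                a_j = alpha[j] + Y[j] * (error(i) - error(j)) / eta
                # Clip so that both alpha_i and alpha_j stay non-negative.
                if Y[i] != Y[j]:
                    low, high = max(0.0, alpha[j] - alpha[i]), float("inf")
                else:
                    low, high = 0.0, alpha[i] + alpha[j]
                a_j = min(max(a_j, low), high)
                if abs(a_j - alpha[j]) <= tol:
                    continue
                alpha[i] += Y[i] * Y[j] * (alpha[j] - a_j)
                alpha[j] = a_j
                changed = True
        if not changed:  # a full sweep without updates: stop
            break
    return alpha

# Hypothetical linearly separable training set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1, -1]
alpha = smo_hard_margin(X, Y)
w = [sum(alpha[i] * Y[i] * X[i][k] for i in range(len(X))) for k in range(2)]
print(alpha, w)
```

On this toy set the sweep stops after the multipliers settle on the two closest points, one from each class, which are the support vectors.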

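A Geometrical Interpretation

[Figure: ten training points from Class 1 and Class 2, each labelled with its multiplier. $\alpha_1 = 0.8$, $\alpha_6 = 1.4$, $\alpha_8 = 0.6$; $\alpha_i = 0$ for all other points. Only the three points with nonzero $\alpha_i$ are support vectors.]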