Classification / Regression: Support Vector Machines
Jeff Howbert, Introduction to Machine Learning, Winter 2014
Topics
- SVM classifiers for linearly separable classes
- SVM classifiers for non-linearly separable classes
- SVM classifiers for nonlinear decision boundaries: kernel functions
- Other applications of SVMs
- Software
Linearly separable classes
Goal: find a linear decision boundary (hyperplane) that separates the classes.
One possible solution
Another possible solution
Other possible solutions
Which one is better? B1 or B2? How do you define better?
The hyperplane that maximizes the margin will have better generalization, so B1 is better than B2.
(Figure: hyperplanes B1 and B2 with their margin boundaries b11, b12 and b21, b22, plus a test sample; B1 has the larger margin and classifies the test sample correctly.)
B1: the decision boundary is w · x + b = 0, with margin boundaries
   b11: w · x + b = +1
   b12: w · x + b = −1

f(x) = +1 if w · x + b ≥ +1, f(x) = −1 if w · x + b ≤ −1

Margin = 2 / ||w||
We want to maximize the margin 2 / ||w||, which is equivalent to minimizing L(w) = ||w||² / 2, subject to the constraints y_i f(x_i) ≥ 1, i.e. y_i (w · x_i + b) ≥ 1 for all i. This is a constrained convex optimization problem; solve with numerical approaches, e.g. quadratic programming.
Solving for the w that gives the maximum margin:
1. Combine the objective function and constraints into a new objective function, using Lagrange multipliers λ_i:

      L_primal = (1/2) ||w||² − Σ_{i=1..N} λ_i [ y_i (w · x_i + b) − 1 ]

2. To minimize this Lagrangian, take derivatives with respect to w and b and set them to 0:

      ∂L_p/∂w = 0  ⇒  w = Σ_i λ_i y_i x_i
      ∂L_p/∂b = 0  ⇒  Σ_i λ_i y_i = 0
3. Substituting and rearranging gives the dual of the Lagrangian:

      L_dual = Σ_i λ_i − (1/2) Σ_{i,j} λ_i λ_j y_i y_j x_i · x_j

   which we try to maximize (not minimize).
4. Once we have the λ_i, we can substitute into the previous equations to get w and b.
5. This defines w and b as linear combinations of the training data.
Optimizing the dual is easier:
- It is a function of the λ_i only, not of the λ_i and w.
- Convex optimization: guaranteed to find the global optimum.
Most of the λ_i go to zero. The x_i for which λ_i > 0 are called the support vectors; these support (lie on) the margin boundaries. The x_i for which λ_i = 0 lie away from the margin boundaries and are not required for defining the maximum margin hyperplane.
Example of solving for the maximum margin hyperplane
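The derivation above can be checked by hand on a two-point toy problem (the points here are my own illustration, not from the slides). With x1 = (1, 1), y1 = +1 and x2 = (−1, −1), y2 = −1, the dual constraint Σ λ_i y_i = 0 forces λ1 = λ2 = λ, so L_dual = 2λ − 4λ², which is maximized at λ = 1/4:

```python
# Hand-solvable maximum-margin example: two points, one per class.
# x1 = (1, 1) with y1 = +1, x2 = (-1, -1) with y2 = -1.
import math

x = [(1.0, 1.0), (-1.0, -1.0)]
y = [+1, -1]

# Dual constraint sum(lam_i * y_i) = 0 forces lam1 = lam2 = lam, so
# L_dual(lam) = 2*lam - 4*lam^2, maximized where dL/dlam = 2 - 8*lam = 0.
lam = 2.0 / 8.0                       # lambda = 0.25 for both points

# w is a linear combination of the training data: w = sum(lam_i * y_i * x_i)
w = [sum(lam * y[i] * x[i][d] for i in range(2)) for d in range(2)]

# b from the support-vector condition y1 * (w . x1 + b) = 1
b = 1.0 / y[0] - sum(w[d] * x[0][d] for d in range(2))

margin = 2.0 / math.sqrt(sum(wd * wd for wd in w))

print(w, b, margin)   # w = [0.5, 0.5], b = 0.0, margin = 2*sqrt(2)
```

Both points lie exactly on a margin boundary (y_i (w · x_i + b) = 1), so both are support vectors, as expected for a two-point problem.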
What if the classes are not linearly separable?
Now which one is better? B1 or B2? How do you define better?
What if the problem is not linearly separable?
Solution: introduce slack variables ξ_i.
Need to minimize:

      L(w) = ||w||² / 2 + C Σ_{i=1..N} ξ_i

Subject to:

      f(x_i) = +1 if w · x_i + b ≥ 1 − ξ_i, f(x_i) = −1 if w · x_i + b ≤ −1 + ξ_i

C is an important hyperparameter, whose value is usually optimized by cross-validation.
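A minimal sketch of how the slack variables behave (the data and the candidate w, b below are my own toy values, not from the slides): ξ_i = max(0, 1 − y_i (w · x_i + b)) measures how far a point falls on the wrong side of its margin boundary, and the soft-margin objective trades ||w||²/2 against C Σ ξ_i:

```python
# Soft-margin objective on a toy 1-D dataset with a fixed candidate
# hyperplane (w, b). Points that violate y*(w*x + b) >= 1 pick up slack.
w, b = 2.0, 0.0                                          # illustrative hyperplane
data = [(-2.0, -1), (-0.6, -1), (0.2, -1), (1.5, +1)]    # (x, y) pairs

def slack(x, y, w, b):
    """xi_i = max(0, 1 - y_i (w x_i + b)): zero iff the margin constraint holds."""
    return max(0.0, 1.0 - y * (w * x + b))

xis = [slack(x, y, w, b) for x, y in data]
print(xis)   # only the third point (x=0.2, y=-1) is misclassified -> nonzero slack

def objective(C):
    """L(w) = ||w||^2 / 2 + C * sum of slacks; C balances margin vs. violations."""
    return 0.5 * w * w + C * sum(xis)

print(objective(1.0), objective(10.0))   # larger C penalizes violations harder
```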
Slack variables for nonseparable data
What if the decision boundary is not linear?
Solution: nonlinear transform of attributes
Φ : [x1, x2] → [x1², (x1 + x2)⁴]
Solution: nonlinear transform of attributes
Φ : [x1, x2] → [(x1)², (x2)²]
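A minimal sketch of the idea (the toy data is my own; the transform is the squared-attribute map from the slide): points inside vs. outside a circle are not linearly separable in [x1, x2], but after Φ : [x1, x2] → [x1², x2²] the circular boundary x1² + x2² = 1 becomes the straight line z1 + z2 = 1:

```python
# Circle-separated classes: label -1 inside radius 1, label +1 outside.
# Not linearly separable in the original attributes, but linearly
# separable after the squared-attribute transform.
inside  = [(0.2, 0.1), (-0.3, 0.4), (0.5, -0.5)]      # label -1
outside = [(1.2, 0.3), (-1.1, -0.9), (0.2, 1.5)]      # label +1

def phi(x1, x2):
    """Nonlinear attribute transform Phi: [x1, x2] -> [x1^2, x2^2]."""
    return (x1 * x1, x2 * x2)

def linear_classifier(z1, z2):
    """In the transformed space, z1 + z2 - 1 = 0 is a *linear* boundary
    (it corresponds to the circle x1^2 + x2^2 = 1 in the original space)."""
    return +1 if z1 + z2 - 1.0 > 0 else -1

assert all(linear_classifier(*phi(*p)) == -1 for p in inside)
assert all(linear_classifier(*phi(*p)) == +1 for p in outside)
print("all points separated by a line in the transformed space")
```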
Issues with finding useful nonlinear transforms:
- Not feasible to do manually as the number of attributes grows (i.e. any real-world problem).
- Usually involves transformation to a higher-dimensional space:
  - increases the computational burden of SVM optimization
  - curse of dimensionality
With SVMs, we can circumvent all of the above via the kernel trick.
Kernel trick
Don't need to specify the attribute transform Φ(x); only need to know how to calculate the dot product of any two transformed samples:

      k(x1, x2) = Φ(x1) · Φ(x2)
Kernel trick (cont'd.)
The kernel function k(x_i, x_j) is substituted into the dual of the Lagrangian, allowing determination of a maximum margin hyperplane in the (implicitly) transformed space Φ(x):

      L_dual = Σ_i λ_i − (1/2) Σ_{i,j} λ_i λ_j y_i y_j Φ(x_i) · Φ(x_j)
             = Σ_i λ_i − (1/2) Σ_{i,j} λ_i λ_j y_i y_j k(x_i, x_j)

All subsequent calculations, including predictions on test samples, are done using the kernel in place of Φ(x_i) · Φ(x_j).
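The identity k(x1, x2) = Φ(x1) · Φ(x2) can be checked numerically for the degree-2 homogeneous polynomial kernel k(u, v) = (u · v)², whose implicit transform for 2-D inputs is Φ(x) = (x1², √2·x1·x2, x2²) (a standard textbook identity; the sample vectors below are my own):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k_poly2(u, v):
    """Degree-2 homogeneous polynomial kernel: k(u, v) = (u . v)^2."""
    return dot(u, v) ** 2

def phi(x):
    """Explicit transform whose dot product reproduces k_poly2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

u, v = (1.0, 2.0), (3.0, -1.0)
lhs = k_poly2(u, v)           # kernel evaluated in the ORIGINAL space
rhs = dot(phi(u), phi(v))     # dot product in the TRANSFORMED space
print(lhs, rhs)               # identical up to rounding: the kernel trick
```

The kernel evaluation never forms the 3-D transformed vectors, which is exactly why substituting k into L_dual sidesteps the cost of working in the transformed space.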
Common kernel functions for SVM:
- linear:                   k(x1, x2) = x1 · x2
- polynomial:               k(x1, x2) = (γ x1 · x2 + c)^d
- Gaussian or radial basis: k(x1, x2) = exp(−γ ||x1 − x2||²)
- sigmoid:                  k(x1, x2) = tanh(γ x1 · x2 + c)
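The four kernels above can be written directly; the parameter names γ, c, d follow the slide, while the parameter values and sample vectors in the demo are my own:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k_linear(u, v):
    """Linear kernel: plain dot product."""
    return dot(u, v)

def k_poly(u, v, gamma=1.0, c=1.0, d=2):
    """Polynomial kernel: (gamma * u.v + c)^d."""
    return (gamma * dot(u, v) + c) ** d

def k_rbf(u, v, gamma=0.5):
    """Gaussian / radial basis kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def k_sigmoid(u, v, gamma=1.0, c=0.0):
    """Sigmoid kernel: tanh(gamma * u.v + c)."""
    return math.tanh(gamma * dot(u, v) + c)

u, v = (1.0, 2.0), (3.0, 4.0)
print(k_linear(u, v))        # 11.0
print(k_poly(u, v))          # (11 + 1)^2 = 144.0
print(k_rbf(u, u))           # 1.0: any point has RBF similarity 1 with itself
print(k_sigmoid(u, v))       # tanh(11), very close to 1
```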
For some kernels (e.g. Gaussian) the implicit transform Φ(x) is infinite-dimensional! But calculations with the kernel are done in the original space, so the computational burden and the curse of dimensionality aren't a problem.
Applications of SVMs to machine learning:
- Classification: binary, multiclass, one-class
- Regression
- Transduction (semi-supervised learning)
- Ranking
- Clustering
- Structured labels
Software
- SVMlight: http://svmlight.joachims.org/
- libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (includes a MATLAB / Octave interface)
- MATLAB: svmtrain / svmclassify (only supports binary classification)
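For a quick start from Python, scikit-learn's SVC (not on the slide's list, but implemented on top of libsvm) exposes the same C and kernel choices discussed above; a minimal sketch on toy data of my own:

```python
# Minimal scikit-learn example; SVC is a wrapper around libsvm.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [4, 4], [5, 5], [4, 5]]
y = [-1, -1, -1, +1, +1, +1]

clf = SVC(kernel="linear", C=1.0)   # C: the soft-margin hyperparameter
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))   # both classified correctly
print(len(clf.support_vectors_))               # the points defining the margin
```

Swapping `kernel="linear"` for `"poly"`, `"rbf"`, or `"sigmoid"` (with `gamma`, `coef0`, `degree`) selects the kernels listed earlier.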
Online demos
- http://cs.stanford.edu/people/karpathy/svmjs/demo/