Support Vector Machines


Support Vector Machines. Some slides adapted from Aliferis & Tsamardinos, Vanderbilt University (http://discover1.mc.vanderbilt.edu/discover/public/ml_tutorial_old/index.html) and Rong Jin, Language Technology Institute (www.contrib.andrew.cmu.edu/~jin/r_proj/svm.ppt). ABDBM, Ron Shamir.

Support Vector Machines. Decision surface: a hyperplane in feature space. One of the most important tools in the machine learning toolbox. In a nutshell: map the data to a predetermined, very high-dimensional space via a kernel function, and find the hyperplane that maximizes the margin between the two classes. If the data are not separable, find the hyperplane that maximizes the margin and minimizes the (weighted average of the) misclassifications.

Support Vector Machines: three main ideas. 1. Define what an optimal hyperplane is (taking into account that it needs to be computed efficiently): maximize the margin. 2. Generalize to non-linearly separable problems: have a penalty term for misclassifications. 3. Map data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that data are mapped implicitly to this space.


Which Separating Hyperplane to Use? (Figure: two classes of points plotted against Var 1 and Var 2, with several candidate separating hyperplanes.)

Maximizing the Margin. IDEA 1: select the separating hyperplane that maximizes the margin! (Figure: Var 1 vs. Var 2 plot showing the margin width on either side of the separating hyperplane.)

Support Vectors. (Figure: Var 1 vs. Var 2 plot; the support vectors are the points lying on the margin boundaries, and the margin width is the distance between those boundaries.)

Setting Up the Optimization Problem. The separating hyperplane is $w \cdot x + b = 0$, with margin boundaries $w \cdot x + b = k$ and $w \cdot x + b = -k$. The width of the margin is $2k / \|w\|$. So the problem is: maximize $2k / \|w\|$ subject to $w \cdot x_i + b \geq k$ for $x_i$ of class 1 and $w \cdot x_i + b \leq -k$ for $x_i$ of class 2.

Setting Up the Optimization Problem. Scaling $w$ and $b$ so that $k = 1$, the problem becomes: maximize $2 / \|w\|$ subject to $w \cdot x_i + b \geq 1$ for $x_i$ of class 1 and $w \cdot x_i + b \leq -1$ for $x_i$ of class 2.

Setting Up the Optimization Problem. If class 1 corresponds to $y_i = 1$ and class 2 to $y_i = -1$, we can rewrite the constraints $w \cdot x_i + b \geq 1$ for $x_i$ with $y_i = 1$ and $w \cdot x_i + b \leq -1$ for $x_i$ with $y_i = -1$ as the single condition $y_i (w \cdot x_i + b) \geq 1$ for all $x_i$. So the problem becomes: maximize $2 / \|w\|$ subject to $y_i (w \cdot x_i + b) \geq 1$ for all $x_i$, or equivalently: minimize $\frac{1}{2} \|w\|^2$ subject to $y_i (w \cdot x_i + b) \geq 1$ for all $x_i$.

Linear, Hard-Margin SVM Formulation. Find $w, b$ that solve: minimize $\frac{1}{2} \|w\|^2$ subject to $y_i (w \cdot x_i + b) \geq 1$ for all $x_i$. This is a quadratic program: quadratic objective, linear (in)equality constraints. The problem is convex, so there is a unique global minimum value (when feasible), and also a unique minimizer, i.e., unique $w$ and $b$ values that attain the minimum. There is no solution if the data are not linearly separable. Since the objective is positive definite, the program can be solved in polynomial time, and modern optimization software solves it very efficiently (handling thousands of constraints and training instances).
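Since the hard-margin problem is a standard quadratic program, off-the-shelf solvers handle it directly. As a minimal sketch (not from the original slides), assuming scikit-learn and a small synthetic, linearly separable dataset, the hard margin can be approximated by a linear SVC with a very large C:

    import numpy as np
    from sklearn.svm import SVC

    # Two well-separated clusters in 2D (toy data for illustration only).
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(20, 2) + [3, 3], rng.randn(20, 2) - [3, 3]])
    y = np.array([1] * 20 + [-1] * 20)

    # A very large C makes the soft-margin solution approach the hard margin.
    clf = SVC(kernel="linear", C=1e10).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]

    print("w =", w, "b =", b)
    print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
    print("number of support vectors:", len(clf.support_vectors_))

The reported width $2 / \|w\|$ is exactly the quantity the formulation above maximizes.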

Lagrange Multipliers. Minimize over $w$ and $b$ the objective $\Phi(w) = \frac{1}{2} \|w\|^2$ subject to $y_i (w \cdot x_i + b) \geq 1$, introducing Lagrange multipliers $\alpha_1, \ldots, \alpha_l \geq 0$, one per constraint. This is a convex quadratic programming problem, so duality theory applies!

Dual Space. Dual problem: maximize $F(\Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \Lambda^T D \Lambda$ subject to $\alpha_i \geq 0$ and $\sum_i \alpha_i y_i = 0$, where $\Lambda = (\alpha_1, \ldots, \alpha_l)$ and $D_{ij} = y_i y_j \, x_i \cdot x_j$. Representation for $w$: $w = \sum_{i=1}^{l} \alpha_i y_i x_i$. Decision function: $f(x) = \mathrm{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i (x_i \cdot x) + b\right)$.

Comments. Representation of the vector $w$: a linear combination of the examples, $w = \sum_{i=1}^{l} \alpha_i y_i x_i$, so the number of parameters equals the number of examples. $\alpha_i$ expresses the importance of each example; only the points closest to the boundary have $\alpha_i \neq 0$. The core of the algorithm is the inner product $x_i \cdot x_j$: both the matrix $D_{ij} = y_i y_j \, x_i \cdot x_j$ and the decision function require only knowledge of $x_i \cdot x_j$ (more on this soon).
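To make this representation concrete, $w = \sum_i \alpha_i y_i x_i$ can be recovered from a fitted solver and compared with the primal weights. A brief sketch, assuming scikit-learn, whose dual_coef_ attribute stores the products $\alpha_i y_i$ for the support vectors (the data are synthetic, purely for illustration):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(30, 2) + [1.5, 1.5], rng.randn(30, 2) - [1.5, 1.5]])
    y = np.array([1] * 30 + [-1] * 30)

    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # dual_coef_[0, j] holds alpha_j * y_j for the j-th support vector,
    # so w = sum_j (alpha_j * y_j) * x_j is a single matrix product.
    w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
    print(np.allclose(w_from_dual, clf.coef_[0]))  # True: both give the same w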

The three main ideas, continued. Next is Idea 2: generalize to non-linearly separable problems by adding a penalty term for misclassifications.

Non-Linearly Separable Data. Introduce slack variables $\xi_i$: allow some instances to fall within the margin (or be misclassified), but penalize them. (Figure: Var 1 vs. Var 2 plot with the hyperplane $w \cdot x + b = 0$, the margin boundaries $w \cdot x + b = \pm 1$, and points violating the margin.)

Formulating the Optimization Problem. The constraint becomes $y_i (w \cdot x_i + b) \geq 1 - \xi_i$ with $\xi_i \geq 0$ for all $x_i$. The objective function penalizes misclassified instances and those within the margin: minimize $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$. The parameter $C$ trades off margin width against misclassifications.

Linear, Soft-Margin SVMs. Minimize $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$ subject to $y_i (w \cdot x_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$ for all $x_i$. The algorithm tries to keep the $\xi_i$ at zero while maximizing the margin. It does not minimize the number of misclassifications (an NP-complete problem) but the sum of distances from the margin hyperplanes. Other formulations use $\xi_i^2$ instead. $C$ is the penalty for misclassification; as $C \to \infty$ we get closer to the hard-margin solution.
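The role of C can also be seen empirically by refitting on the same data with increasing values. A compact sketch, assuming scikit-learn and synthetic overlapping classes (illustrative only):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(50, 2) + [1, 1], rng.randn(50, 2) - [1, 1]])
    y = np.array([1] * 50 + [-1] * 50)

    for C in [0.01, 1.0, 100.0, 1e4]:
        clf = SVC(kernel="linear", C=C).fit(X, y)
        margin = 2 / np.linalg.norm(clf.coef_[0])
        print(f"C={C:>8}: margin width={margin:.3f}, "
              f"support vectors={len(clf.support_)}")

As C grows, the margin narrows and the solution moves toward the hard-margin one.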

Dual Space (soft margin). Dual problem: maximize $F(\Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \Lambda^T D \Lambda$ subject to $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$, where $\Lambda = (\alpha_1, \ldots, \alpha_l)$ and $D_{ij} = y_i y_j \, x_i \cdot x_j$. The only difference from the hard-margin dual is the upper bound $C$ on $\alpha_i$. Representation for $w$: $w = \sum_{i=1}^{l} \alpha_i y_i x_i$. Decision function: $f(x) = \mathrm{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i (x_i \cdot x) + b\right)$.

Comments on the Parameter C. It controls the range of the $\alpha_i$ and avoids over-emphasizing some examples; complementary slackness gives $(C - \alpha_i)\,\xi_i = 0$. $C$ can be extended to be case-dependent (a per-example weight). If $\alpha_i < C$ then $\xi_i = 0$: the $i$-th example is correctly classified and not especially important. If $\alpha_i = C$ then $\xi_i$ can be nonzero: the $i$-th training example may be misclassified and is very important.

Robustness of Soft vs. Hard Margin SVMs. (Figure: two Var 1 vs. Var 2 panels, one showing the hyperplane $w \cdot x + b = 0$ found by a soft-margin SVM and one showing the hyperplane found by a hard-margin SVM.)

Soft vs. Hard Margin SVM. The soft-margin SVM always has a solution, is more robust to outliers, and gives smoother surfaces (in the non-linear case). The hard-margin SVM does not require guessing the cost parameter (it requires no parameters at all).

The three main ideas, continued. Next is Idea 3: map the data to a high-dimensional space where linear decision surfaces suffice, reformulating the problem so that the mapping is implicit.

Disadvantages of Linear Decision Surfaces. (Figure: Var 1 vs. Var 2 plot of data that a linear boundary cannot separate well.)

Advantages of Non-Linear Surfaces. (Figure: the same Var 1 vs. Var 2 data separated by a non-linear decision surface.)

Linear Classifiers in High-Dimensional Spaces. Find a function $\Phi(x)$ to map the data to a different space. (Figure: data in the original Var 1 / Var 2 space alongside the same data in the space of Constructed Feature 1 / Constructed Feature 2, where a linear separator suffices.)

Mapping Data to a High-Dimensional Space. Find a function $\Phi(x)$ to map the data to a different space; the SVM formulation then becomes: minimize $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$ subject to $y_i (w \cdot \Phi(x_i) + b) \geq 1 - \xi_i$, $\xi_i \geq 0$ for all $x_i$. The data appear only as $\Phi(x_i)$, and the weights $w$ are now weights in the new space. Explicit mapping is expensive if $\Phi(x)$ is very high-dimensional. Can we solve the problem without explicitly mapping the data?

The Dual of the SVM Formulation. The original SVM formulation has $n$ inequality constraints, $n$ positivity constraints, and $n$ slack variables: minimize over $w, b$ the objective $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$ subject to $y_i (w \cdot \Phi(x_i) + b) \geq 1 - \xi_i$, $\xi_i \geq 0$ for all $x_i$. The (Wolfe) dual of this problem has one equality constraint, $n$ positivity constraints, and $n$ variables (the Lagrange multipliers): minimize over the $\alpha_i$ the objective $\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \,(\Phi(x_i) \cdot \Phi(x_j)) - \sum_i \alpha_i$ subject to $C \geq \alpha_i \geq 0$ and $\sum_i \alpha_i y_i = 0$. The objective function is more complicated, but the data appear only through the inner products $\Phi(x_i) \cdot \Phi(x_j)$.

The Kernel Trick. $\Phi(x_i)^t \Phi(x_j)$ means: map the data into the new space, then take the inner product of the new vectors. Suppose we can find a function such that $K(x_i, x_j) = \Phi(x_i)^t \Phi(x_j)$, i.e., $K$ is the inner product of the images of the data. Then for training there is no need to explicitly map the data into the high-dimensional space in order to solve the optimization problem. How do we classify without explicitly mapping the new instances? It turns out that $\mathrm{sgn}(w \cdot \Phi(x) + b) = \mathrm{sgn}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$, where $b$ solves $\alpha_j \left( y_j \left( \sum_i \alpha_i y_i K(x_i, x_j) + b \right) - 1 \right) = 0$ for any $j$ with $\alpha_j \neq 0$.

Examples of Kernels. Assume we measure $x_1, x_2$ and use the mapping $\Phi: (x_1, x_2) \mapsto \{x_1^2,\; x_2^2,\; \sqrt{2} x_1 x_2,\; \sqrt{2} x_1,\; \sqrt{2} x_2,\; 1\}$. Consider the function $K(x, z) = (x \cdot z + 1)^2$. Then $\Phi(x)^t \Phi(z) = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 + 2 x_1 z_1 + 2 x_2 z_2 + 1 = (x_1 z_1 + x_2 z_2 + 1)^2 = (x \cdot z + 1)^2 = K(x, z)$.
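A quick numerical check of this identity, assuming only numpy (the particular vectors are arbitrary):

    import numpy as np

    def phi(v):
        # Explicit feature map for the degree-2 polynomial kernel in 2D.
        x1, x2 = v
        return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

    def K(x, z):
        return (np.dot(x, z) + 1) ** 2

    x = np.array([0.7, -1.3])
    z = np.array([2.0, 0.5])
    print(np.dot(phi(x), phi(z)))  # inner product in the mapped 6-dim space
    print(K(x, z))                 # same value, computed in the original space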

Polynomial and Gaussian Kernels. $K(x, z) = (x \cdot z + 1)^p$ is called the polynomial kernel of degree $p$. For $p = 2$ with 7,000 genes, using the kernel once costs an inner product with 7,000 terms plus a squaring. Mapping explicitly to the high-dimensional space would mean calculating ~50,000,000 new features for both training instances and then taking the inner product of those (another 50,000,000 terms to sum). In general, the kernel trick provides huge computational savings over explicit mapping! Another common option is the Gaussian kernel (which maps to an $l$-dimensional space, where $l$ is the number of training points): $K(x, z) = \exp(-\|x - z\|^2 / 2\sigma^2)$.

The Mercer Condition. Is there a mapping $\Phi(x)$ for any symmetric function $K(x, z)$? No. The SVM dual formulation requires calculating $K(x_i, x_j)$ for each pair of training instances; the matrix $G_{ij} = K(x_i, x_j)$ is called the Gram matrix. Theorem (Mercer, 1908): there is a feature space $\Phi(x)$ iff the kernel is such that $G$ is positive semi-definite. Recall: $M$ is PSD iff $z^T M z \geq 0$ for all $z \neq 0$, iff $M$ has non-negative eigenvalues.
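On a finite training set, Mercer's condition amounts to checking the eigenvalues of the Gram matrix. A sketch assuming numpy, contrasting the Gaussian kernel with a symmetric function that is not a valid kernel:

    import numpy as np

    rng = np.random.RandomState(3)
    X = rng.randn(10, 4)

    # Pairwise squared Euclidean distances.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

    G_gauss = np.exp(-sq_dists / 2.0)  # Gaussian kernel: PSD Gram matrix
    G_bad = -sq_dists                  # symmetric, but not a valid kernel

    for name, G in [("gaussian", G_gauss), ("-squared distance", G_bad)]:
        min_eig = np.linalg.eigvalsh(G).min()
        print(name, "min eigenvalue:", min_eig, "PSD:", min_eig >= -1e-10)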

This completes the three main ideas: maximize the margin, penalize misclassifications for non-separable data, and map implicitly to a high-dimensional space via kernels.

Complexity (for one implementation; Burges '98). Notation: $l$ training points of dimension $d$, $N$ support vectors ($N \leq l$). When most SVs are not at the upper bound: $O(N^3 + N^2 l + N d l)$ if $N \ll l$; $O(N^3 + N l + N d l)$ if $N \approx l$. When most SVs are at the upper bound: $O(N^2 + N d l)$ if $N \ll l$; $O(d l^2)$ if $N \approx l$.

Other Types of Kernel Methods. SVMs that perform regression; SVMs that perform clustering; ν-Support Vector Machines, which maximize the margin while bounding the number of margin errors; Leave-One-Out Machines, which minimize a bound on the leave-one-out error; SVM formulations that allow different misclassification costs for different classes; kernels suitable for sequences of strings, and other specialized kernels.

Feature Selection with SVMs: Recursive Feature Elimination. Train a linear SVM; remove the x% of variables with the lowest weights (those variables affect classification the least); retrain the SVM on the remaining variables and repeat until classification quality is reduced. Very successful. Other formulations exist in which minimizing the number of variables is folded into the optimization problem, and similar algorithms exist for non-linear SVMs; these are also quite successful.
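A compact illustration of recursive feature elimination around a linear SVM, assuming scikit-learn (the synthetic data and the 10%-per-step schedule are illustrative choices, not the lecture's settings):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.svm import LinearSVC

    # Synthetic stand-in for an expression matrix: 100 samples x 500 features.
    X, y = make_classification(n_samples=100, n_features=500,
                               n_informative=10, random_state=0)

    # Train a linear SVM, drop the features with the smallest |w|, retrain, repeat.
    selector = RFE(LinearSVC(C=1.0, max_iter=10000),
                   n_features_to_select=10, step=0.1)
    selector.fit(X, y)

    print("selected feature indices:", np.where(selector.support_)[0])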

Why Do SVMs Generalize? Even though they map to a very high-dimensional space, they have a very strong bias in that space: the solution has to be a linear combination of the training instances. There is a large theory on Structural Risk Minimization providing bounds on the error of an SVM, but typically these error bounds are too loose to be of practical use.

Conclusions. SVMs formulate learning as a mathematical program, taking advantage of the rich theory in optimization. SVMs use kernels to map indirectly to extremely high-dimensional spaces. SVMs are extremely successful, robust, efficient, and versatile, and have a good theoretical basis.

Vladimir Vapnik. Vladimir Naumovich Vapnik is one of the main developers of Vapnik-Chervonenkis theory. He was born in the Soviet Union. He received his master's degree in mathematics from the Uzbek State University, Samarkand, Uzbek SSR in 1958 and his Ph.D. in statistics from the Institute of Control Sciences, Moscow in 1964. He worked at this institute from 1961 to 1990 and became Head of the Computer Science Research Department. At the end of 1990, he moved to the USA and joined the Adaptive Systems Research Department at AT&T Bell Labs in Holmdel, New Jersey. The group later became the Image Processing Research Department of AT&T Laboratories when AT&T spun off Lucent Technologies in 1996. Vapnik left AT&T in 2002 and joined NEC Laboratories in Princeton, New Jersey, where he currently works in the Machine Learning group. He has also held a Professor of Computer Science and Statistics position at Royal Holloway, University of London since 1995, as well as an Adjunct Professor position at Columbia University, New York City since 2003. He was inducted into the U.S. National Academy of Engineering in 2006. He received the 2008 Paris Kanellakis Award. While at AT&T, Vapnik and his colleagues developed the theory of the support vector machine and demonstrated its performance on a number of problems of interest to the machine learning community, including handwriting recognition. http://en.wikipedia.org/wiki/Vladimir_Vapnik

Suggested Further Reading.
http://www.kernel-machines.org/tutorial.html
http://www.svms.org/tutorials/ (many tutorials)
C. J. C. Burges. "A Tutorial on Support Vector Machines for Pattern Recognition." Knowledge Discovery and Data Mining, 2(2), 1998.
E. Osuna, R. Freund, and F. Girosi. "Support vector machines: Training and applications." Technical Report AIM-1602, MIT A.I. Lab., 1996.
P.-H. Chen, C.-J. Lin, and B. Schölkopf. "A tutorial on nu-support vector machines." 2003.
N. Cristianini. ICML'01 tutorial, 2001.
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. "An introduction to kernel-based learning algorithms." IEEE Transactions on Neural Networks, 12(2):181-201, May 2001.
B. Schölkopf. "SVM and kernel methods." Tutorial given at the NIPS Conference, 2001.
Hastie, Tibshirani, Friedman. The Elements of Statistical Learning. Springer, 2001.

Analysis of Microarray Gene Expression Data Using SVM. Brown, Grundy, Lin, Cristianini, Sugnet, Furey, Ares Jr., Haussler. PNAS 97(1):262-267 (2000).

Data. Expression patterns of n = 2,467 annotated yeast genes over m = 79 different conditions. Six gene functional classes: five related to transcript levels (tricarboxylic acid (TCA) cycle, respiration, cytoplasmic ribosomes, proteasome, histones) and one unrelated control class (helix-turn-helix proteins). For gene x and condition i: $E_i$ = level of x in the tested condition, $R_i$ = level of x in the reference condition. Normalized pattern $(X_1, \ldots, X_m)$ of gene x: $X_i = \log(E_i / R_i) \,/\, \left( \sum_k \log^2(E_k / R_k) \right)^{0.5}$.
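A small numpy sketch of this normalization, assuming the tested and reference measurements are arranged as (genes x conditions) matrices (the numbers below are made up):

    import numpy as np

    def normalize_profiles(E, R):
        # X_ik = log(E_ik / R_ik) / sqrt(sum_k log^2(E_ik / R_ik))
        logratio = np.log(E / R)
        norms = np.sqrt((logratio ** 2).sum(axis=1, keepdims=True))
        return logratio / norms

    rng = np.random.RandomState(0)
    E = rng.uniform(0.5, 2.0, size=(3, 4))   # toy: 3 genes, 4 conditions
    R = rng.uniform(0.5, 2.0, size=(3, 4))
    X = normalize_profiles(E, R)
    print(np.linalg.norm(X, axis=1))         # each normalized pattern has unit length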

Goal. Classify genes based on their gene expression patterns. SVM and other classifiers were tried.

Kernel Functions Used. Simplest: $K(X, Y) = X \cdot Y + 1$ (dot product; linear kernel). Kernel of degree $d$: $K(X, Y) = (X \cdot Y + 1)^d$. Radial basis (Gaussian) kernel: $K(X, Y) = \exp(-\|X - Y\|^2 / 2\sigma^2)$. Let $n_+$ and $n_-$ be the numbers of positive and negative examples; the problem is that $n_+ \ll n_-$. Overcoming the imbalance: modify the diagonal of $K$, setting $K_{ii} = K(X_i, X_i) + c/n_+$ for positive examples and $K_{ii} = K(X_i, X_i) + c/n_-$ for negative examples.
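One way to apply such a diagonal correction in practice is through a precomputed kernel matrix. A hedged sketch assuming scikit-learn, with an illustrative constant c and synthetic imbalanced labels (not the values used in the paper):

    import numpy as np
    from sklearn.svm import SVC

    def imbalance_corrected_kernel(X, y, c=1.0):
        # Linear kernel K(X, Y) = X.Y + 1, with class-dependent diagonal additions:
        # K_ii += c/n_plus for positive examples, c/n_minus for negative ones.
        K = X @ X.T + 1.0
        n_plus, n_minus = (y == 1).sum(), (y == -1).sum()
        K[np.diag_indices_from(K)] += np.where(y == 1, c / n_plus, c / n_minus)
        return K

    rng = np.random.RandomState(4)
    X = rng.randn(60, 20)
    y = np.array([1] * 6 + [-1] * 54)        # heavily imbalanced labels

    K_train = imbalance_corrected_kernel(X, y)
    clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y)
    print("training accuracy:", clf.score(K_train, y))

For new instances one would pass the uncorrected kernel values between the test and training examples.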

Measuring Performance. Confusion matrix (rows: classifier prediction; columns: true class):

             True +   True -
Classifier +   TP       FP
Classifier -   FN       TN

The imbalance problem: very few positives. Performance of method M: $C(M) = FP + 2 \cdot FN$; $C(N)$ = the cost of classifying all examples as negative; $S(M) = C(N) - C(M)$ (how much the classifier saves us). 3-way cross-validation: 2/3 to learn, 1/3 to test.
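The savings score is straightforward to compute from the confusion-matrix counts; a short plain-Python sketch, using the D-p-1-SVM row of the TCA table below as an example:

    def savings(FP, FN, TP, TN):
        # S(M) = C(N) - C(M), where C(M) = FP + 2*FN and C(N) is the cost of
        # classifying everything as negative (every true positive becomes an FN).
        cost_M = FP + 2 * FN
        cost_all_negative = 2 * (TP + FN)
        return cost_all_negative - cost_M

    print(savings(FP=18, FN=5, TP=12, TN=2432))  # 6, matching S(M) in the table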

Results: TCA class.

Method        FP   FN   TP      TN   S(M)
D-p-1-SVM     18    5   12   2,432      6
D-p-2-SVM      7    9    8   2,443      9
D-p-3-SVM      4    9    8   2,446     12
Radial-SVM     5    9    8   2,445     11
Parzen         4   12    5   2,446      6
FLD            9   10    7   2,441      5
C4.5           7   17    0   2,443     -7
MOC1           3   16    1   2,446     -1

D-p-i-SVM: dot-product kernel of degree i. Other methods used: Parzen windows, Fisher's linear discriminant (FLD), and decision trees (C4.5 and MOC1).

Results: Ribosome class.

Method        FP   FN    TP      TN   S(M)
D-p-1-SVM     14    2   119   2,332    224
D-p-2-SVM      9    2   119   2,337    229
D-p-3-SVM      7    3   118   2,339    229
Radial-SVM     6    5   116   2,340    226
Parzen         6    8   113   2,340    220
FLD           15    5   116   2,331    217
C4.5          31   21   100   2,315    169
MOC1          26   26    95   2,320    164

Results: Summary. SVM outperformed the other methods; either the high-dimensional dot-product or the Gaussian kernel worked best, and the results were insensitive to the specific cost weighting. Consistently misclassified genes require special attention: expression does not always reflect protein levels and post-translational modifications. Classifiers of this kind can be used for functional annotation.

David Haussler. (Photo slide.)

Gene Selection via the BAHSIC Family of Algorithms. Le Song, Justin Bedo, Karsten M. Borgwardt, Arthur Gretton, Alex Smola. ISMB '07.

Testing. 15 two-class datasets (mostly cancer), 2K-25K genes, 50-300 samples; 10-fold cross-validation. The 10 top features were selected according to each method (pc = Pearson's correlation, snr = signal-to-noise ratio, pam = shrunken centroid, t = t-statistics, m-t = moderated t-statistics, lods = B-statistics, lin = centroid, RBF = SVM with Gaussian kernel, rfe = SVM recursive feature elimination, l1 = l1-norm SVM, mi = mutual information). The RFE selection method: train, remove the 10% of features that are least relevant, repeat.

(Results table: classification error %, overlap between the 10 genes selected in each fold, L2 distance from the best method, and the number of times each algorithm was best.) The linear kernel has the best overall performance.

Multiclass Datasets. In a similar comparison on 13 multiclass datasets, the linear kernel was again best.

Rules of Thumb. Always apply the linear kernel for general-purpose gene selection. Apply a Gaussian kernel if nonlinear effects are present, such as multimodality or complementary effects of different genes. This is not a big surprise, given the high dimension of microarray datasets, but the point is driven home by broad experimentation.