CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

3.1 INTRODUCTION

The raw microarray data is basically an image, with different colors indicating hybridization (Xue et al 2004) of DNAs expressed at different conditions. The image is further converted into numerical data as pixel intensity, ideally reflecting the count of photons corresponding to the amount of transcripts genetically. This data is analyzed to study the cause of disease, the effectiveness of treatments and so on. Data mining is becoming an increasingly important tool to transform this data into information.

The challenge in studying the microarray dataset is that it includes a large number of features, typically 2000 to 3000. But not all of these genes are required for classification (Wang and Palade 2007). As these genes do not influence the performance of the classification task, taking them into account during classification increases the dimension of the classification problem, poses computational difficulties and introduces unnecessary noise into the process. In diagnostic research, procedures are based on microarrays with enough probes to detect a certain disease. The process of selecting informative genes, that is, genes related to the particular study or disease, is called Gene Selection (GS) (Li et al 2001). This process is similar to feature selection for machine learning in general.

The classification accuracy achieved for classifying gene expression is higher for supervised learning methods like SVM and NN (Lee et al 2005). Many studies based on SVM for classifying gene expression are available in the literature (Furey et al 2000; Fujarewicz and Wiench 2003; Wei et al 2010). The SVM algorithm is a powerful supervised learning algorithm. In this chapter it is proposed to implement an SMO-trained support vector classifier with a poly kernel and to compare its classification efficiency with Naïve Bayes and CART classifiers on the colon cancer data available from the Kent Ridge Biomedical Data Repository.

3.2 METHODOLOGY

The colon cancer data is available in the Kent Ridge Biomedical Data Repository. The gene expression samples were analyzed with an Affymetrix oligonucleotide array complementary to more than 6500 human genes. Colon epithelial cell samples taken from 62 colon-cancer patients form the dataset. The original data on each sample consists of 6000 gene expression levels, of which 4000 were removed based on the confidence in the measured expression levels. Thus each sample contains 2000 gene expression levels. Of the 62 samples in the dataset, 40 samples are normal samples and the remaining are samples with colon cancer. Each sample was taken from tumors and normal healthy parts of the colons of the same patients and measured using high density oligonucleotide arrays (Ben-Dor et al 2000).

3.2.1 Support Vector Machine for Cancer Prediction

Support vector machine is a machine learning technique based on the structural risk minimization principle (Vapnik 1995). SVM uses a hyperplane to separate the positive examples from the negative examples.

SVM is widely used for classification, as the classifier has to calculate only the inner product between two vectors of the training data. It is widely applied in biomedical research for classification. SVMs perform better than neural networks (Zien et al 2000). SVM with a linear kernel, polynomial kernel or Radial Basis Function (RBF) kernel is used to classify genes using gene expression data.

3.2.1.1 SVM algorithm

The modeling of SVM is shown in Equation (3.1) through Equation (3.7). For linear SVMs, the kernel K is linear and the output of the SVM can be expressed as

    u = \vec{w} \cdot \vec{x} - t                                                  (3.1)

and

    \vec{w} = \sum_{i=1}^{N} y_i \alpha_i \vec{x}_i                                (3.2)

where u is the SVM output; \vec{w}, \vec{x} and \vec{x}_i are vectors and t is the threshold. Training of SVMs is done by finding the Lagrange multipliers α_i, expressed as minimizing a dual quadratic form:

    \min_{\alpha} \Psi(\alpha) = \min_{\alpha} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j K(\vec{x}_i, \vec{x}_j) \alpha_i \alpha_j - \sum_{i=1}^{N} \alpha_i        (3.3)

subject to the box constraint

    0 \le \alpha_i \le C, \forall i

and the linear equality constraint

    \sum_{i=1}^{N} y_i \alpha_i = 0                                                (3.4)
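
As a quick illustration, Equations (3.1) and (3.2) translate directly into a few lines of numpy. The function below is a minimal sketch, assuming the multipliers α_i and the threshold t have already been found by training:

```python
import numpy as np

def linear_svm_output(alphas, targets, X_train, x, t):
    """Equations (3.1)-(3.2): u = w . x - t, with w = sum_i y_i alpha_i x_i."""
    # Only support vectors (alpha_i > 0) contribute to the weight vector.
    w = (targets * alphas) @ X_train
    return w @ x - t
```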

The α_i are the Lagrange multipliers. Data sets are not always linearly separable. In such a case, a hyperplane cannot split the training set into positive and negative examples, and a modification to the original optimization is given by (Cortes and Vapnik 1995):

    \min_{\vec{w}, b, \xi} \frac{1}{2} \|\vec{w}\|^2 + C \sum_{i=1}^{N} \xi_i      (3.5)

subject to

    y_i (\vec{w} \cdot \vec{x}_i - b) \ge 1 - \xi_i, \forall i                     (3.6)

where the ξ_i are slack variables and C is a parameter which trades off a wide margin against a small number of margin failures. The output of a non-linear SVM is computed from the Lagrange multipliers:

    u = \sum_{j=1}^{N} y_j \alpha_j K(\vec{x}_j, \vec{x}) - t                      (3.7)

where K is a kernel function that measures the similarity between an input vector \vec{x} and a stored training vector \vec{x}_j. The Lagrange multipliers are computed using a quadratic program; the non-linearity alters the quadratic form, as shown in Equation (3.8):

    \min_{\alpha} \Psi(\alpha) = \min_{\alpha} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) - \sum_{i=1}^{N} \alpha_i        (3.8)

    subject to 0 \le \alpha_i \le C, \forall i, and \sum_{i=1}^{N} y_i \alpha_i = 0
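
In code, Equation (3.7) is a single weighted sum over the stored training vectors. The sketch below uses a polynomial kernel since the chapter's experiments use a poly kernel; the exact kernel form and its degree parameter are illustrative assumptions:

```python
import numpy as np

def poly_kernel(a, b, degree=2):
    # An illustrative polynomial kernel; the degree here is an assumption.
    return (np.dot(a, b) + 1.0) ** degree

def svm_output(alphas, targets, X_train, x, t, kernel=poly_kernel):
    """Equation (3.7): u = sum_j y_j alpha_j K(x_j, x) - t."""
    u = sum(y_j * a_j * kernel(x_j, x)
            for y_j, a_j, x_j in zip(targets, alphas, X_train))
    return u - t
```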

The Quadratic Programming (QP) problem is solved using the SMO algorithm.

3.2.1.2 Training of SVM

The QP problems in SVM cannot be solved using standard QP techniques due to their huge size, as the matrix has I² elements, where I is the number of training examples. The chunking algorithm (Vapnik 1995) is used to solve the SVM QP; it removes the rows and columns of the matrix that correspond to zero Lagrange multipliers, thus breaking the QP down into smaller QPs. At every step, a QP problem is solved by taking the examples of every non-zero Lagrange multiplier from the last step and the worst examples that violate the Karush-Kuhn-Tucker (KKT) conditions (Christopher Burges 1998). The process is repeated till the entire set of non-zero Lagrange multipliers is identified, so that the last step solves the whole QP. Yet the major disadvantage of chunking is that large scale training problems cannot be handled, as even the reduced-size matrix will not fit into memory.

3.2.2 Sequential Minimal Optimization

Sequential Minimal Optimization is a simple algorithm that solves the SVM QP problem during training of the SVM. The advantage of SMO is that the QP is solved without using numerical optimization steps and extra matrix storage is not required. Using Osuna's theorem, SMO decomposes the overall QP problem into QP sub-problems. For solving the SVM QP problem, two Lagrange multipliers which comply with the linear equality constraint are used for a small optimization. At each step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these multipliers and updates the SVM to reflect the new optimal values.

The two Lagrange multipliers can be solved analytically and thus numerical QP optimization can be entirely avoided. The solving of the multipliers can be expressed in the algorithm in the form of a loop using Visual C++ code; each sub-problem is thus solved quickly and the QP problem is solved fast. Very large SVM training problems can therefore be easily processed and stored in the memory of an ordinary personal computer or workstation. Because no matrix algorithms are used in SMO, it is less susceptible to numerical precision problems. There are two components to SMO: an analytic method for solving the two Lagrange multipliers and a heuristic for choosing which multipliers to optimize.

3.2.2.1 Analytical method of solving multipliers

SMO solves the QP of Equation (3.8), subject to its box and linear constraints, by decomposing the QP problem into fixed-size QP sub-problems. SMO computes the constraints on the two multipliers and solves for the constrained minimum. As there are only two multipliers, the constraints can be shown in two dimensions. Due to the box constraint, the multipliers lie within a box, and the linear constraint makes the multipliers lie on a diagonal in the box. As the multipliers lie on the diagonal, the algorithm computes α_2 first, and the ends of the diagonal segment are expressed in terms of α_2.

If target y_1 does not equal target y_2, then the bounds for α_2 are given by:

    M = \max(0, \alpha_2 - \alpha_1),    N = \min(C, C + \alpha_2 - \alpha_1)

When target y_1 equals target y_2, then the bounds for α_2 are given by:

    M = \max(0, \alpha_2 + \alpha_1 - C),    N = \min(C, \alpha_2 + \alpha_1)

The minimum along the direction of the constraint is computed by SMO as shown in Equation (3.9):

    \alpha_2^{new} = \alpha_2 + \frac{y_2 (S_1 - S_2)}{\eta}                       (3.9)

where S_i is the error on the i-th training example, given by S_i = u_i - y_i, and η is the second derivative of the objective function along the diagonal, given in Equation (3.10):

    \eta = K(\vec{x}_1, \vec{x}_1) + K(\vec{x}_2, \vec{x}_2) - 2 K(\vec{x}_1, \vec{x}_2)        (3.10)

Now the constrained minimum is found by clipping, as in Equations (3.11) and (3.12):

    \alpha_2^{new,clipped} = \begin{cases}
        N              & \text{if } \alpha_2^{new} \ge N \\
        \alpha_2^{new} & \text{if } M < \alpha_2^{new} < N \\
        M              & \text{if } \alpha_2^{new} \le M
    \end{cases}                                                                    (3.11)

If z = y_1 y_2, then the value of α_1^{new} is computed from α_2^{new,clipped} as:

    \alpha_1^{new} = \alpha_1 + z (\alpha_2 - \alpha_2^{new,clipped})              (3.12)

where u_i is the SVM output for the i-th training example. SMO terminates when all of the KKT optimality conditions are fulfilled:

    \alpha_i = 0      \Rightarrow  y_i u_i \ge 1
    0 < \alpha_i < C  \Rightarrow  y_i u_i = 1
    \alpha_i = C      \Rightarrow  y_i u_i \le 1
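
Equations (3.9) through (3.12) amount to a handful of arithmetic operations per step. The following is a minimal sketch of one analytic update, assuming the non-degenerate case η > 0 (Platt's full algorithm also handles η ≤ 0 separately):

```python
def smo_step(a1, a2, y1, y2, S1, S2, k11, k12, k22, C):
    """One analytic SMO update of two multipliers, Equations (3.9)-(3.12).
    S1, S2 are the errors u_i - y_i; k11, k12, k22 are kernel values."""
    # Bounds M, N on alpha_2 from the box and linear equality constraints.
    if y1 != y2:
        M, N = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        M, N = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    eta = k11 + k22 - 2.0 * k12            # Equation (3.10)
    a2_new = a2 + y2 * (S1 - S2) / eta     # Equation (3.9), assumes eta > 0
    a2_new = min(max(a2_new, M), N)        # clipping, Equation (3.11)
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # Equation (3.12), z = y1 * y2
    return a1_new, a2_new
```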

3.2.2.2 Heuristics for choosing multipliers for optimization

Convergence is assured in SMO as it optimizes and alters two multipliers at every step and each step decreases the objective function. The heuristics for choosing which multipliers to optimize determine the speed of convergence. There are two separate choices of heuristic, one for each of the two Lagrange multipliers.

In the first choice heuristic, the examples most likely to violate the KKT conditions are concentrated on. The loop iterates over the examples whose Lagrange multipliers are neither 0 nor C (the non-bound examples), and the examples which violate the KKT conditions are optimized. The outer loop makes repeated passes over the non-bound examples until all of them obey the KKT conditions within ε, so that the set is self-consistent. Then SMO scans the entire set of examples for KKT violations. The outer loop terminates when all the examples obey the KKT conditions within ε. Typically the value of ε is set to 10^-3.

On selection of the first Lagrange multiplier, SMO selects the second Lagrange multiplier to maximize the size of the step taken during the joint optimization. As the evaluation of the kernel function K is time consuming, SMO approximates the step size by the value |S_1 - S_2| when computing α_2^{new}. SMO records a cached error value S_i for each non-bound example in the training set and then chooses an error to approximately maximize the step size. Thus if S_1 is positive, the example with minimum error S_2 is chosen, and if S_1 is negative then the example with maximum error S_2 is chosen.
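
A sketch of this second-choice heuristic over the cached errors; the names here (S for the error cache, non_bound_idx for the indices of non-bound examples) are hypothetical:

```python
import numpy as np

def choose_second_multiplier(S, S1, non_bound_idx):
    """Pick the non-bound example whose cached error approximately
    maximizes the step size |S1 - S2|."""
    errors = S[non_bound_idx]
    pos = np.argmin(errors) if S1 > 0 else np.argmax(errors)
    return non_bound_idx[pos]
```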

3.3 RESULT AND DISCUSSION

The colon cancer dataset was trained and tested using 10-fold cross validation. The test bench with the colon cancer dataset is furnished in Figure 3.1.

Figure 3.1 Snapshot of the colon cancer dataset used in the experiment
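
For readers who wish to reproduce a comparable test bench, the following is a minimal sketch (not the original implementation) using scikit-learn, whose SVC classifier is backed by libsvm's SMO-type solver; the file name colon.csv and its column layout are assumptions:

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Assumed export of the Kent Ridge colon data:
# 62 rows, 2000 gene-expression columns plus a "label" column.
data = pd.read_csv("colon.csv")
X = data.drop(columns=["label"]).to_numpy()
y = data["label"].to_numpy()

classifiers = {
    "SMO-SVM (poly kernel)": SVC(kernel="poly", degree=1, C=1.0),
    "Naive Bayes": GaussianNB(),
    "CART": DecisionTreeClassifier(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```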

The classification accuracy of the three classifiers under test is represented in Figure 3.2, with the sensitivity and specificity plots shown in Figure 3.3.

Figure 3.2 Classification accuracy of various classifiers

It is seen that sequential minimal optimization gives the best overall classification accuracy. Though the classification accuracy of CART is lower than that of SMO, the sensitivity of the CART predictor outperforms SMO. Sensitivity measures the proportion of actual positives that are correctly identified, and it is given in Equation (3.13):

    sensitivity = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}}        (3.13)

The specificity for a binary class problem is given in Equation (3.14). Specificity is used to measure the classifier's ability to predict negative results:

    specificity = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}}        (3.14)

Figure 3.3 Sensitivity and specificity plots

A confusion matrix represents the predicted values obtained in supervised learning and is used to show the correct labels and mislabels. The confusion matrix obtained for all the three classifiers is given in Table 3.1.

Table 3.1 Confusion matrix

                    Naïve Bayes           CART                  SMO
                    Positive  Negative    Positive  Negative    Positive  Negative
    Positive        14        8           9         13          17        5
    Negative        21        19          2         38          4         36
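
As a small worked example, Equations (3.13) and (3.14) can be applied to the SMO column of Table 3.1. Reading the rows as predicted labels and the columns as actual labels is an assumption here (it is the reading under which CART's sensitivity exceeds SMO's, as the chapter states):

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)   # Equation (3.13)

def specificity(tn, fp):
    return tn / (tn + fp)   # Equation (3.14)

# SMO column of Table 3.1 (rows assumed to be predicted labels).
tp, fp = 17, 5    # predicted-positive row
fn, tn = 4, 36    # predicted-negative row
print(f"sensitivity = {sensitivity(tp, fn):.3f}")   # 17/21 = 0.810
print(f"specificity = {specificity(tn, fp):.3f}")   # 36/41 = 0.878
```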

The algorithm computation time for all the three classifiers is shown in Figure 3.4. The computation time was measured on an Intel Core i3 M350 processor running at 2.27 GHz with 3 GB RAM and the Windows 7 operating system.

Figure 3.4 The algorithm computation time

3.4 SUMMARY

In this chapter, a support vector machine classifier trained using sequential minimal optimization is investigated. The classification accuracy is good compared to the Naïve Bayes and CART classifiers. However, one drawback of the proposed support vector machine based classifier for cancer prediction is the convergence of SMO for higher values of the complexity parameter 'C' (which trades off a wide margin against a small number of margin failures). Thus the performance timing degrades as the value of 'C' increases. The next chapter proposes to focus on LVQ, a highly intuitive learning model which is based on a different training paradigm.