An Experimental Multi-Objective Study of the SVM Model Selection Problem

Giuseppe Narzisi
Courant Institute of Mathematical Sciences, New York, NY 10012, USA
narzisi@nyu.edu

Abstract. Support Vector Machines (SVMs) are a powerful method for both regression and classification. However, any SVM formulation requires the user to set two or more parameters which govern the training process, and such parameters can have a strong effect on the resulting performance of the learning engine. Moreover, the design of learning systems is inherently a multi-objective optimization problem: it requires finding a suitable trade-off between at least two conflicting objectives, model complexity and accuracy. In this work the SVM model selection problem is cast as a multi-objective optimization problem, where the cross-validation error and the number of support vectors of the model define the two objectives. An experimental analysis is presented on a well-known test-bed of datasets using two different kernels: RBF and sigmoid.

Key words: Support Vector Machine, Multi-Objective Optimization, NSGA-II, SVM Model Selection.

1 Introduction

Support Vector Machines have been proven to be very effective methods for classification and regression [12]. However, in order to obtain good generalization errors the user needs to choose appropriate values for the parameters involved in the model. The kernel parameters, together with the regularization parameter C, are called the hyperparameters of the SVM, and the problem of tuning them in order, for example, to improve the generalization of the model is called the SVM model selection problem.

Usually the standard method to determine the hyperparameters is grid search. In the simple grid-search approach the hyperparameters are varied with a fixed step-size through a wide range of values and the performance of every combination is measured (a minimal sketch is given below). Because of its computational complexity, grid search is only suitable for the adjustment of very few parameters. Further, the choice of the discretization of the search space may be crucial. Figure 1 shows the typical parameter surface for the error and the number of support vectors as a function of the hyperparameters C and γ for the diabetes dataset.

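As an illustration of the baseline just described, here is a minimal grid-search sketch. It uses scikit-learn rather than the LIBSVM toolchain driven directly in the paper, and the dataset handling, step sizes, and function name are assumptions made only for this example.

    # Minimal grid-search sketch for SVM hyperparameter selection (illustrative only).
    # Assumes scikit-learn; the paper itself drives LIBSVM directly.
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def grid_search_rbf(X, y, log2C_range=range(-5, 16, 2), log2g_range=range(-10, 5, 2)):
        """Exhaustively score every (C, gamma) pair by its 5-fold CV error."""
        best = (None, None, 1.0)  # (C, gamma, error)
        for log2C in log2C_range:
            for log2g in log2g_range:
                C, gamma = 2.0 ** log2C, 2.0 ** log2g
                acc = cross_val_score(SVC(C=C, gamma=gamma, kernel="rbf"), X, y, cv=5).mean()
                err = 1.0 - acc
                if err < best[2]:
                    best = (C, gamma, err)
        return best

The nested loops make the cost explicit: the number of trained models grows exponentially with the number of hyperparameters, which is the computational limitation noted above.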
Recently, gradient-based approaches have been explored for choosing the hyperparameters [2, 6, 8]. However, they have some drawbacks and limitations. First of all, the score function used to evaluate the quality of a set of hyperparameters must be differentiable, which excludes important measures such as the number of support vectors. Also, because the objective function is strongly multimodal, the performance of a gradient-based heuristic depends on the initialization, which means that the algorithm can easily get stuck in a sub-optimal local minimum.

Fig. 1. Parameter surface of the error (a) and the number of SVs (b) as a function of the two hyperparameters C and γ for the diabetes dataset using 5-fold cross-validation.

The main idea which is missing in this kind of approach is that the SVM model selection problem is inherently a multi-objective optimization problem. Designing supervised learning systems for classification requires finding a suitable trade-off between several objectives. Typically we want to reduce the complexity of the model and at the same time obtain a model with a high accuracy level (or low error rate). Sometimes the model with the best generalization may not be the best choice if the price we have to pay is working with a very complex model, both in terms of time and space. Usually this problem is tackled by aggregating the objectives into a scalar function (a linear weighting of the objectives) and applying standard methods to the resulting single-objective optimization problem. However, it has been shown that this approach is not a good solution, because it requires that the aggregate function correctly matches the problem, and this is not an easy task. The better solution is to apply the multi-objective approach directly in order to find the Pareto optimal set of solutions for the problem.

Among the many possible approaches to solving a multi-objective optimization problem, the last decade has seen Multi-Objective Evolutionary Algorithms (MOEAs) emerge as the leading method in this area. Successful applications have already been obtained in the machine learning area in the case of feature selection for SVMs [9, 10]. Experiments similar to the ones presented in this paper have been proposed in [7], where the split modified radius-margin bounds and the training error were used in conjunction with the number of SVs.

The experiments presented in this work differ from that approach in several ways: 1) the impact of different kernels is analyzed; 2) the simple, straightforward 2-objective formulation (number of SVs and CV error) is considered before any additional sophistication; 3) the standard NSGA-II algorithm is used instead of the NSES algorithm proposed in [7]; 4) the error is evaluated using the 5-fold cross-validation method.

There are many reasons for using a multi-objective evolutionary approach for SVM model selection: the ability to obtain in one run not just a single model but several models which are optimal (in the Pareto sense) with respect to the selected objectives or criteria; the best SVM model can be selected later from the Pareto front according to some higher-level information or preferences; multiple hyperparameters can be tuned at the same time, overcoming the limitation of the naive grid-search method; the objectives/criteria do not need to be differentiable (as required for the gradient-based methods); and the multimodal search space associated with the parameters is explored efficiently.

The goal of this research work is to show the effectiveness of this approach for SVM model selection using a very simple 2-objective formulation which takes into account the complexity and the accuracy of the model.

The paper is organized as follows. We first introduce SVMs and SVM model selection from the perspective of multi-objective optimization. Then we give the background on multi-objective optimization and introduce the class of multi-objective evolutionary algorithms. Section 5 reports the results obtained on a test bed of four datasets widely used in the literature. Finally, conclusions are presented and possible future lines of investigation are given.

2 Multi-objective view of SVM

The first evidence of the multi-objective nature of SVMs is directly related to their standard formulation in the non-separable (inconsistent) case, the so-called C-SVM formulation:

\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \qquad (1)
\text{subject to} \quad y_i [w \cdot x_i + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i \in [1, m]

where C is the regularization parameter which determines the trade-off between the margin and the sum of the slack variables \sum_{i=1}^{m} \xi_i. The constant C is usually determined using some heuristic approach. However, the more natural formulation of the problem is the following:

\min \; \frac{1}{2}\|w\|^2, \qquad \min \; \sum_{i=1}^{m} \xi_i \qquad (2)
\text{subject to} \quad y_i [w \cdot x_i + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i \in [1, m]

where the objective in (1) is split into two conflicting objectives, overcoming the problem of determining the parameter C.

Even if this formulation is more natural than (1), not much effort on this problem is present in the literature. It would be interesting to analyze this problem using the theoretical approach presented by Mihalis Yannakakis in [13], where he discusses the conditions under which an approximate trade-off curve can be constructed efficiently (in polynomial time).

The multi-objective nature of SVM training is also present at the level of model selection. The typical criterion of evaluation for a classifier is the accuracy of the model in classifying newly generated points, and this metric is often used alone in order to select/generate good classifiers. However, there are many other important factors that must be taken into account when selecting an SVM model. A possible (not exhaustive) list is the following: number of input features; bound on the generalization error (e.g., radius-margin bound); number of support vectors. In this paper we consider the last one, the number of SVs, as an additional selection criterion.

3 Multi-Objective Optimization

When an optimization problem involves more than a single-valued objective function, the task of finding one (or more) optimum solution(s) is known as the Multi-Objective Optimization Problem (MOOP) [4]. An optimum solution with respect to one objective may not be optimum with respect to another objective. As a consequence, one cannot choose a solution which is optimal with respect to only one objective. In problems characterized by more than one conflicting objective, there is no single optimum solution; instead there exists a set of solutions which are all optimal, whose image in objective space is called the Pareto optimal front. A general multi-objective optimization problem is defined as follows (minimization case):

\min \; F(x) = [f_1(x), f_2(x), \ldots, f_M(x)] \qquad (3)
\text{subject to} \quad E(x) = [e_1(x), e_2(x), \ldots, e_L(x)] \ge 0, \quad x_i^{(L)} \le x_i \le x_i^{(U)}, \; i = 1, \ldots, N

where x = (x_1, x_2, \ldots, x_N) is the vector of the N decision variables, M is the number of objectives f_i, L is the number of constraints e_j, and x_i^{(L)} and x_i^{(U)} are respectively the lower and upper bound for each decision variable x_i. Two different solutions are compared using the concept of dominance, which induces a strict partial order in the objective space F. A solution a is said to dominate a solution b if it is better or equal in all objectives and strictly better in at least one objective. For the minimization case we have:

F(a) \preceq F(b) \iff f_i(a) \le f_i(b) \;\; \forall i \in \{1, \ldots, M\} \;\; \text{and} \;\; \exists j \in \{1, \ldots, M\} : f_j(a) < f_j(b) \qquad (4)

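As a concrete reading of the dominance relation (4), the following is a minimal sketch; the language and function name are my own and are not part of the paper.

    # Pareto dominance for minimization, as in relation (4):
    # a dominates b if it is no worse in every objective and strictly better in at least one.
    def dominates(a, b):
        """a, b: sequences of objective values (minimization)."""
        no_worse_everywhere = all(ai <= bi for ai, bi in zip(a, b))
        strictly_better_somewhere = any(ai < bi for ai, bi in zip(a, b))
        return no_worse_everywhere and strictly_better_somewhere

    # Example: an SVM model with (error=0.22, n_sv=250) dominates one with (error=0.24, n_sv=300).
    assert dominates((0.22, 250), (0.24, 300))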
In the specific case of SVM model selection, the hyperparameters are the decision variables of the problem, the ranges of exploration for the parameters are the bounds of each decision variable, and the model selection criteria are the objectives (no constraints are used in this formulation).

4 Method

4.1 Model selection metrics

As discussed in Section 2, there are many criteria that can be used for SVM model selection. In this section we introduce the two objectives that have been used for the simulations.

Accuracy. The most direct way to evaluate the quality of an SVM model is to consider its classification performance (accuracy). In the simplest case the data is split into a training and a validation set. The first set is used to generate the SVM model, the second set is used to evaluate the performance of the classifier. In this work we use the more general approach called L-fold cross-validation (CV). The data is partitioned into L disjoint sets D_1, D_2, ..., D_L and the SVM is trained L times on all data but the D_i set, which is then used as validation data. The accuracy (or error) is computed as the mean over the L different experiments. For reasons of computational complexity we use 5-fold CV for each dataset.

Number of support vectors. We know that in the hard-margin case the number of SVs is an upper bound on the expected number of errors made by the leave-one-out procedure. Moreover, the space and time complexity of the SVM classifier scales with the number of SVs. It follows that it is important to have an SVM model with a small number of support vectors (SVs). Similarly to the 5-fold CV error, the number of SVs is computed as the mean over the 5 different experiments of the CV method.

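For a given hyperparameter setting, the two objectives above can be estimated roughly as follows. This is an illustrative sketch using scikit-learn's SVC (itself a LIBSVM wrapper) rather than the exact pipeline of the paper; the function name and the stratified 5-fold split strategy are assumptions made for the example.

    # Illustrative sketch: 5-fold CV error and mean number of SVs for one (C, gamma) setting.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold

    def svm_objectives(X, y, C, gamma, kernel="rbf", n_folds=5):
        """Return (mean CV error, mean number of support vectors) over the folds."""
        errors, n_svs = [], []
        for train_idx, val_idx in StratifiedKFold(n_splits=n_folds).split(X, y):
            clf = SVC(C=C, gamma=gamma, kernel=kernel)
            clf.fit(X[train_idx], y[train_idx])
            errors.append(1.0 - clf.score(X[val_idx], y[val_idx]))  # validation error on held-out fold
            n_svs.append(len(clf.support_))                         # support vectors of this fold's model
        return float(np.mean(errors)), float(np.mean(n_svs))

Each candidate hyperparameter vector is thus mapped to a point (error, mean number of SVs) in the two-dimensional objective space, which is exactly what the multi-objective search operates on.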
4.2 Multi-Objective Evolutionary Algorithms

Evolutionary algorithms (EAs) are search methods that take their inspiration from natural selection and survival of the fittest in the biological world. EAs differ from more traditional optimization techniques in that they search from a population of solutions, not from a single point. Each iteration of an EA involves a competitive selection that weeds out poor solutions. Solutions with high fitness are recombined with other solutions by swapping parts of one solution with another. Solutions are also mutated by making a small change to a single element of the solution. Recombination and mutation are used to generate new solutions that are biased towards regions of the space for which good solutions have already been seen. Multi-Objective Evolutionary Algorithms (MOEAs) are a special class of EAs with the goal of solving problems involving many conflicting objectives [4].

Fig. 2. NSGA-II and LIBSVM pipeline: NSGA-II evolves a population of hyperparameter settings, LIBSVM computes the error and mean number of SVs on 5-fold cross-validation for each setting, and the output Pareto fronts (trade-off curves) are passed to the decision-making phase of SVM model selection.

Over the last decade, a steady stream of MOEAs has continued to be proposed and studied [4, 3]. MOEAs have been successfully applied to several real-world problems (protein folding, circuit design, safety-related systems, etc.) even if no strong proof of convergence is available. Among the growing class of MOEAs, in this work we employ the well-known NSGA-II [5] (Nondominated Sorting Genetic Algorithm II). NSGA-II is based on a fast nondominated sorting approach to sort a population of solutions into different nondomination levels. It then uses elitism and a crowded-comparison operator for diversity preservation.

Table 1. Benchmark datasets.

Name         Size   Features  Repository
diabetes      768          8  UCI
australian    690         14  Statlog
german      1,000         24  Statlog
splice      1,000         60  Delve

5 Results

5.1 Experiments

In this research work we deal with the standard application of SVMs to binary classification. We used a common benchmark of four datasets (Table 1 shows their characteristics). We consider two different kernels and their parameters:

RBF (radial basis function): K(u, v) = \exp(-\gamma \|u - v\|^2)
Sigmoid: K(u, v) = \tanh(\gamma u^T v + coef_0)

It follows that the hyperparameters considered will be respectively (C, γ) for the RBF kernel and (C, γ, coef_0) for the sigmoid kernel. The parameter ranges are: log_2 C ∈ [−5, 15], log_2 γ ∈ [−10, 4], coef_0 ∈ [0, 1]. According to the values suggested in [5], the NSGA-II parameters are set as follows: p_c = 0.9, p_m = 0.1, η_c = 10, η_m = 20. No effort has been spent in this work to tune these parameters, which would likely improve the efficiency of the algorithm further. A population size of 60 individuals is used and each simulation is carried out for a total of 250 generations. Each plot shows the Pareto fronts (trade-off curves) of all the points (SVM models) sampled by the algorithm after the first 50 generations; as described later, 50 iterations are enough to converge towards the final approximated Pareto front. SVMs are constructed using the LIBSVM library [1], version 2.84 (http://www.csie.ntu.edu.tw/~cjlin/libsvm). Figure 2 shows the interaction between NSGA-II and the LIBSVM library.

Fig. 3. Diabetes dataset: Pareto front of the sampled points using RBF (a) and sigmoid (b) kernels; mean evolution of the population for the error and the number of SVs during the optimization of NSGA-II using the RBF (c) and sigmoid (d) kernels.

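For readers who want to reproduce a similar pipeline, the following is a rough sketch of how NSGA-II can be wired to the two SVM objectives using the pymoo library. Pymoo is not what the paper used (the authors ran their own NSGA-II implementation against LIBSVM 2.84), and svm_objectives refers to the illustrative helper sketched in Section 4; the class name and seed are also assumptions.

    # Rough sketch: NSGA-II over (log2 C, log2 gamma) with the two SVM objectives.
    # Assumes the pymoo library and the svm_objectives() helper sketched earlier.
    from pymoo.core.problem import ElementwiseProblem
    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.optimize import minimize

    class SVMModelSelection(ElementwiseProblem):
        def __init__(self, X, y):
            # Decision variables: log2(C) in [-5, 15], log2(gamma) in [-10, 4].
            super().__init__(n_var=2, n_obj=2, xl=[-5.0, -10.0], xu=[15.0, 4.0])
            self.X, self.y = X, y

        def _evaluate(self, x, out, *args, **kwargs):
            C, gamma = 2.0 ** x[0], 2.0 ** x[1]
            err, n_sv = svm_objectives(self.X, self.y, C, gamma)  # helper sketched in Sec. 4
            out["F"] = [err, n_sv]

    # Population of 60, 250 generations, mirroring the experimental protocol above:
    # res = minimize(SVMModelSelection(X, y), NSGA2(pop_size=60), ("n_gen", 250), seed=1)
    # res.F then holds the approximated Pareto front of (error, mean number of SVs) pairs.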
5.2 Discussion

Figures 3, 4, 5 and 6 show the results obtained using the experimental protocol previously defined. Inspecting the results we observe, first of all, that approximate Pareto fronts are effectively obtained for each of the datasets, showing that the two objectives used exhibit a conflicting behavior. This is also evident from the analysis of the evolution curves: an improvement in one objective is nearly always accompanied by a worsening in the other, but the interaction during the evolution produces a global minimization of both objectives.

The choice of the kernel clearly affects the final outcome of the optimization algorithm. In particular, the RBF kernel shows a better performance than the sigmoid kernel. Inspecting the Pareto fronts obtained, we note that the RBF kernel yields a better distribution of solutions along the two objectives. This is an important factor in multi-objective optimization: we want Pareto fronts with a wide range of values so that the selection of a final point in the second step (decision making) is facilitated.

Fig. 4. Australian dataset: Pareto front of the sampled points using RBF (a) and sigmoid (b) kernels; mean evolution of the population for the error and the number of SVs during the optimization of NSGA-II using the RBF (c) and sigmoid (d) kernels.

For each dataset we also plot the mean evolution curves of the error and the number of support vectors of the population of SVM models at each iteration. Inspecting the plots we observe that the algorithm generally converges very quickly to a set of good SVM models (within the first 50 iterations). It then uses the rest of the time to explore the space of solutions locally for an additional finer refinement.

If we compare the accuracy of the SVM models obtained using this method with other approaches in the literature we find comparable results. For example, the best error obtained for the diabetes dataset with this approach is 21.7, while the errors obtained by Keerthi in [8], Chapelle in [2] and Staelin in [11] are respectively 24.33, 23.19 and 20.3. Similarly, for the splice dataset we obtain an error of 12.4, while the errors obtained by Keerthi in [8] and Staelin in [11] are respectively 10.16 and 11.7.

Fig. 5. Splice dataset: Pareto front of the sampled points using RBF (a) and sigmoid (b) kernels; mean evolution of the population for the error and the number of SVs during the optimization of NSGA-II using the RBF (c) and sigmoid (d) kernels.

An important advantage of this approach is that, together with models that are good in terms of accuracy, the algorithm also generates many other models with different numbers of support vectors; these are relevant when the complexity of the final model is an important factor for model selection. For example, in the case of the splice dataset, we might be willing to lose some degree of accuracy, and select a solution with an error of 14% instead of 12%, in favor of a model that has a much lower complexity: 370 SVs instead of 570 (see Figure 5).

Fig. 6. German dataset: Pareto front of the sampled points using RBF (a) and sigmoid (b) kernels; mean evolution of the population for the error and the number of SVs during the optimization of NSGA-II using the RBF (c) and sigmoid (d) kernels.

6 Conclusions and possible future investigations

The SVM model selection problem clearly presents the characteristics of a multi-objective optimization problem. The results of this experimental work have shown that it is possible to effectively obtain approximated Pareto fronts of SVM models based on a simple 2-objective formulation where the accuracy and the complexity of the model are compared for Pareto dominance. This approach allows the characteristic trade-off curve for a specific dataset to be visualized, from which the user can select a specific model according to his or her own preferences and computational needs.

The proposed method also obtains results comparable to other approaches in the literature, but with the advantage that a set of Pareto optimal solutions (not a single one) is generated as output. Of course a deeper investigation is required, and many different lines of investigation can be considered: extending the formulation from 2 objectives to possibly k objectives (k > 2), including many other important model selection criteria (such as the number of input features); studying the performance of the proposed approach in the regression case; adapting the approach to the multi-class case, where it is harder to choose appropriate values for the base binary models of a decomposition scheme.

References

1. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
2. Olivier Chapelle, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131-159, 2002.
3. Carlos A. Coello Coello and Gary B. Lamont. Applications of Multi-Objective Evolutionary Algorithms. World Scientific, 2004.
4. Kalyanmoy Deb. Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Inc., New York, NY, USA, 2001.
5. Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, 2002.
6. Tobias Glasmachers and Christian Igel. Gradient-based adaptation of general Gaussian kernels. Neural Computation, 17(10):2099-2105, 2005.
7. Christian Igel. Multi-objective model selection for support vector machines. Evolutionary Multi-Criterion Optimization, pages 534-546, 2005.
8. S. S. Keerthi. Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms. IEEE Transactions on Neural Networks, 13:1225-1229, 2002.
9. S. Pang and N. Kasabov. Inductive vs. transductive inference, global vs. local models: SVM, TSVM, and SVMT for gene expression classification problems. International Joint Conference on Neural Networks (IJCNN), 2:1197-1202, 2004.
10. S. Y. M. Shi, P. N. Suganthan, and K. Deb. Multi-class protein fold recognition using multi-objective evolutionary algorithms. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 61-66, 2004.
11. Carl Staelin. Parameter selection for support vector machines. HP Labs Technical Reports, 2002.
12. Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA, 1995.
13. Mihalis Yannakakis. Approximation of multiobjective optimization problems. Algorithms and Data Structures: 7th International Workshop, 2001.