The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines


Xiaoxing Yang, Ke Tang, and Xin Yao

Nature Inspired Computation and Applications Laboratory (NICAL), School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
apricot@mail.ustc.edu.cn, ketang@ustc.edu.cn

The Center of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, B15 2TT Birmingham, U.K.
x.yao@cs.bham.ac.uk

Abstract. Recently, building sparse SVMs has become an active research topic due to its potential applications in large-scale data mining tasks. One of the most popular approaches to building sparse SVMs is to select a small subset of training samples and employ them as the support vectors. In this paper, we explain that selecting the support vectors is equivalent to selecting a number of columns from the kernel matrix, which in turn is equivalent to selecting a subset of features in the feature selection domain. Hence, we propose to use an effective feature selection algorithm, namely the Minimum Redundancy Maximum Relevance (MRMR) algorithm, to solve the support vector selection problem. The MRMR algorithm was then compared to two existing methods, namely the back-fitting (BF) and pre-fitting (PF) algorithms. Preliminary results showed that MRMR generally outperformed the BF algorithm but was inferior to the PF algorithm in terms of generalization performance. However, the MRMR approach was extremely efficient, running significantly faster than both compared algorithms.

Keywords: Relevance, Redundancy, Sparse design, SVMs, Machine learning.

1 Introduction

As a relatively new class of learning algorithms, kernel methods have been studied extensively in recent years [1]. The underlying concept of kernel methods can be interpreted as solving learning tasks in a reproducing kernel Hilbert space (RKHS) induced by a kernel function. A kernel method can then usually be obtained by extending some traditional learning algorithm to the RKHS. Typical kernel methods include the well-known support vector machines (SVM) [2], least squares support vector machines (LS-SVM) [3], proximal support vector machines (PSVM) [4], and kernel Fisher discriminant analysis (KFDA) [5]. Among the existing kernel methods, SVM was the first to be invented and is probably the best known as well. Suppose we have n training samples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i is a d-dimensional sample vector and y_i ∈ {+1, -1} is the corresponding class label.

The training algorithm of SVM seeks a decision hyperplane in the RKHS which is expected to separate the samples of different classes. Given a sample x, the SVM adopts the decision function in Eq. (1) to classify it into either the +1 or the -1 class:

    f(x) = Σ_{i=1}^{N_sv} α_i y_i k(x, x_i) + b    (1)

where k(x, x_i) is a predefined function, usually known as the kernel function, that defines the inner product of x and x_i in the RKHS; α_i and b are parameters of the classifier that are computed during the training phase; and N_sv is the number of support vectors. In the SVM literature, a support vector is a training sample that corresponds to a non-zero α_i (the α_i's always take non-negative values).

Although SVM has been proven to be an effective learning approach for real-world tasks, its computational cost is relatively high, and hence its application to many data mining tasks is often prohibitive. The reasons are two-fold. First, the training phase of SVM involves solving a quadratic programming (QP) problem. Briefly speaking, this procedure is carried out on a kernel matrix K:

    K = [ k(x_1, x_1) ... k(x_1, x_n)
          ...              ...
          k(x_n, x_1) ... k(x_n, x_n) ]    (2)

where K_ij is calculated by k(x_i, x_j). That is, K is an n-by-n matrix, and thereby the computational complexity of solving the QP problem increases quadratically with the number of samples. Second, Eq. (1) shows that the computational complexity of classifying a sample increases linearly with the number of support vectors. Since the number of support vectors typically grows linearly with the number of samples, the time required to classify a new sample (i.e., the testing phase) also increases linearly with the number of samples.

Due to the above drawbacks of SVM, a lot of work has been conducted to reduce its computational cost. In particular, most approaches aim at reducing the number of support vectors of the final SVM; such an SVM classifier is referred to as a sparse SVM in the literature. The existing approaches for building sparse SVMs can be categorized into three groups.

The first group of methods solves the SVM first to obtain a decision function in the form of Eq. (1), and then tries to approximate this decision function using fewer support vectors. For example, the reduced set (RS) method proposed by Burges [6][7] utilizes a gradient descent algorithm to seek a set of virtual samples for the approximation. This type of method only makes the testing phase of SVM more efficient, while it requires a higher computational cost for training.

The second type of methods aims at directly finding a small set of virtual support vectors that provides good generalization performance. A representative of this group is the sparse kernel learning algorithm (SKLA) proposed by Wu et al. [8]. Given a predefined number of support vectors, SKLA directly uses a gradient descent algorithm to search for the optimal vectors that minimize the objective function of SVM. As with the first type of methods, the final support vectors are not necessarily training samples but can be any virtual data points. The difference is that SKLA does not require solving the QP problem beforehand, and thus reduces the training cost. Although finding virtual support vectors has been shown to be effective, it costs too much time to obtain a good solution.
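To make Eqs. (1) and (2) concrete, the following is a minimal NumPy sketch (hypothetical names and toy parameters, not the authors' code) that builds the kernel function and evaluates the decision function; the RBF kernel of Eq. (8), used later in the experiments, is assumed here. Note how the cost of one prediction grows with the number of support vectors:

import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # k(x, z) = exp(-gamma * ||x - z||^2) for every pair of rows of A and B
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, alpha, y_sv, b, gamma=0.5):
    # Eq. (1): f(x) = sum_i alpha_i * y_i * k(x, x_i) + b;
    # evaluating it requires one kernel computation per support vector
    k = rbf_kernel(x[None, :], support_vectors, gamma)[0]
    return float(np.dot(alpha * y_sv, k)) + b

Building the full kernel matrix of Eq. (2) is simply K = rbf_kernel(X, X, gamma) for an n-by-d sample matrix X, which makes the quadratic growth of the training problem with n equally visible.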

Hence, quite a few researchers have tried to accelerate SVM without seeking virtual support vectors [9][10][11][12][13][14]. All methods of the third type follow this principle. Typically, these methods seek a small set of training samples whose size is smaller than the actual number of support vectors obtained by solving the whole QP problem, yet is still sufficient for achieving good generalization performance. With this purpose in mind, the general methodology is to select a subset of training samples without directly solving the QP problem. After the selection process, a new QP problem of much smaller size (typically determined by the size of the sample subset) can be formulated, and the optimal α_i's are then computed by solving this new QP problem. Since this type of method is usually easier to implement and may suffer from fewer numerical problems, it appears to be more popular than the gradient descent-based methods. For example, Keerthi et al. [13] proposed two such algorithms, named the back-fitting (BF) and pre-fitting (PF) approaches, respectively.

In this paper, we propose an approach for selecting a training sample subset as the support vectors of an SVM. Specifically, we suggest that the selection of support vectors for an SVM is equivalent to the selection of features in the feature selection domain. Since feature selection problems have been intensively investigated in the pattern recognition literature, we believe that a good feature selection algorithm can be readily applied to select support vectors. Following this idea, we employed the well-known Minimum Redundancy Maximum Relevance (MRMR) feature selection algorithm [15] to address the support vector selection problem, and compared it with two state-of-the-art support vector selection algorithms, the BF and PF algorithms. Preliminary results showed that MRMR generally outperformed the BF algorithm while being inferior to the PF algorithm in terms of generalization performance. However, the MRMR approach was significantly faster than both compared algorithms.

The rest of this paper is organized as follows. In Section 2, we elaborate the equivalence between support vector selection and feature selection, and introduce the MRMR algorithm in detail. Section 3 presents our preliminary experimental study. Section 4 concludes the paper and discusses potential directions for future work.

2 Select Support Vectors Using the MRMR Algorithm

From Eq. (1), we find that the decision function of SVM can be written in the form

    f(x) = β^T K̃ + b    (3)

where β = [α_1 y_1, α_2 y_2, ..., α_{N_sv} y_{N_sv}]^T and K̃ = [k(x, x_1), ..., k(x, x_{N_sv})]^T. In the feature selection domain, we choose a subset of features so as to better describe the relationship between the features and the label. In the linear case, we seek an expression of the form

    g(x̃) = w^T x̃ + c    (4)

where x̃ = [x_1, ..., x_sub] contains the selected feature values, w = [w_1, ..., w_sub] contains the corresponding weights, and c is the bias term. Clearly, if this expression describes the data well, then the selected features are good representatives.
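The similarity between these two expressions is the core observation of this section; the toy sketch below (NumPy only, with hypothetical data and indices) previews it by treating K as a data matrix whose columns can simply be dropped:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 toy training samples
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq)                  # the n-by-n kernel matrix of Eq. (2)

# Row i of K is the image of x_i in an n-dimensional space, and
# column j plays the role of the j-th feature, i.e., of candidate
# support vector x_j.
subset = [3, 17, 42, 80]               # a hypothetical selected subset
K_sub = K[:, subset]                   # n-by-4 reduced data matrix
print(K.shape, K_sub.shape)            # (100, 100) (100, 4)

Fitting the sparse SVM then amounts to finding the linear decision function of Eq. (3) over the columns of K_sub.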

One can see the similarity between Eq. (3) and Eq. (4). That means the kernel matrix K can be viewed as a new data matrix in which each training sample x_i corresponds to a row of K. If we view the i-th row of K as the mapping of sample x_i into a new space, then its j-th feature is obtained by calculating k(x_i, x_j). From this viewpoint, when we try to reduce the number of support vectors, we only need to reduce the number of columns of the kernel matrix K while keeping the number of rows unchanged. After selecting N_sv appropriate columns of K, calculating the corresponding α_i's is equivalent to seeking the optimal linear decision function with respect to the data lying in the resulting N_sv-dimensional space.

Next, we introduce the MRMR approach for selecting the columns of K. Let K_i denote the i-th column of the n-by-n kernel matrix K. MRMR starts by calculating the relevance score of each column K_i:

    F_i = [ Σ_c n_c (K̄_{c,i} - K̄_i)² / (C - 1) ] / σ_i²    (5)

where C is the number of classes, n_c is the number of training samples of the c-th class, K̄_i is the mean value of the column K_i, and K̄_{c,i} is the mean value of K_i within the c-th class. σ_i² is the pooled variance, calculated as σ_i² = [Σ_c (n_c - 1) σ_{c,i}²] / (n - C), where σ_{c,i}² is the variance of K_i within the c-th class. F_i is the F-statistic between the i-th column and the labels, and it is equivalent to the t-statistic in the case of binary classification. Based upon Eq. (5), the relevance score of a subset G of columns is calculated as

    R_F = (1/|G|) Σ_{i∈G} F_i    (6)

Accordingly, MRMR measures the redundancy of G by

    R_off = (1/|G|²) Σ_{i,j∈G} off(i, j)    (7)

where off(i, j) is the Pearson correlation coefficient between the i-th and the j-th columns of K. With these definitions, MRMR aims at finding the subset of columns with the maximum value of R_F and the minimum value of R_off. In the original literature [15], this is typically done by seeking the subset with the largest R_F / R_off or the largest R_F - R_off. In this paper, we introduce an additional parameter into MRMR: we seek the subset with the largest R_F - λ R_off, where λ is a predefined parameter that controls the trade-off between relevance and redundancy. MRMR employs a sequential forward selection scheme to search for the optimal subset of columns. First, the column corresponding to the largest F_i is selected according to Eq. (5). After that, the remaining columns are selected one by one: at each iteration, every previously unselected column is evaluated according to Eqs. (6) and (7), and the one yielding the largest R_F - λ R_off is selected.
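The whole selection procedure can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the incremental score F_j - λ times the mean correlation with the already-selected columns is the standard greedy realization of the R_F - λ R_off criterion, and taking the absolute correlation follows the original MRMR formulation [15]):

import numpy as np

def f_statistic(K, y):
    # Eq. (5): F-statistic of each column of K against the class labels
    classes = np.unique(y)
    n, C = len(y), len(classes)
    overall_mean = K.mean(axis=0)
    between = np.zeros(K.shape[1])
    pooled = np.zeros(K.shape[1])
    for c in classes:
        Kc = K[y == c]
        n_c = len(Kc)
        between += n_c * (Kc.mean(axis=0) - overall_mean) ** 2
        pooled += (n_c - 1) * Kc.var(axis=0, ddof=1)
    return (between / (C - 1)) / (pooled / (n - C))

def mrmr_select(K, y, n_select, lam=1.0):
    # Greedy forward selection maximizing F_j - lam * (mean |corr| with
    # the columns chosen so far), cf. Eqs. (6) and (7)
    F = f_statistic(K, y)
    corr = np.abs(np.corrcoef(K, rowvar=False))
    selected = [int(np.argmax(F))]          # start from the largest F_i
    while len(selected) < n_select:
        rest = [j for j in range(K.shape[1]) if j not in selected]
        scores = [F[j] - lam * corr[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

With a selection rate of 4%, as used in the experiments below, the call would be mrmr_select(K, y, n_select=max(1, int(0.04 * len(y)))). Note that the procedure never touches the QP problem: it needs only the kernel matrix and the labels, which is where its speed advantage comes from.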

3 Experimental Study

To evaluate the efficacy of the MRMR algorithm for building sparse SVMs, we carried out experimental studies on seven data sets from the UCI Repository [16], namely Australian, Monks-1, Heart, Mammographic, Wdbc, Hill-valley, and Promoters. Since MRMR is by nature a selection-based algorithm, we compared it to two state-of-the-art selection-based algorithms, namely the BF and PF algorithms. The radial basis function was used as the kernel function:

    k(x_i, x_j) = exp(-γ ||x_i - x_j||²)    (8)

To compare the three methods, we implemented all of them ourselves. The Newton method used in [13] was employed in our implementation to obtain the solution (i.e., the coefficients β) of the SVM. All parameters were tuned using 5-fold cross-validation: we first tuned the kernel parameter γ and the regularization parameter C with the selection rate set to 1 (i.e., using all the data), and then tuned the parameter λ with the selection rate set to 10% using our selection method.

For each data set, we conducted 5-fold cross-validation 10 times. In each run, the three algorithms were applied separately to select 4% of the training samples to build sparse SVMs, and the classification accuracy of the resulting sparse SVMs was then evaluated. The average accuracies and standard deviations are presented in Table 1, together with the results of a Wilcoxon signed-rank test conducted at the 0.05 significance level.

From Table 1, we find that MRMR outperformed the BF algorithm on 2 data sets, while no significant difference was observed on the other 5. In comparison with the PF algorithm, MRMR was inferior on 3 data sets and achieved comparable performance on the other 4. In summary, MRMR generally outperformed the BF algorithm in terms of classification accuracy, but was generally inferior to the PF algorithm.

Since the major significance of constructing sparse SVMs is to extend their application to large data sets, the computational cost required to obtain the sparse SVM is also of great importance. Table 2 summarizes the average CPU time required by the three compared algorithms to select support vectors on the 7 data sets. MRMR is clearly the most efficient of the three: the runtimes of the BF and PF algorithms are at least 10 times and 100 times that of MRMR, respectively.

To summarize, the experimental studies demonstrated that MRMR is definitely better than the BF algorithm. In comparison with the PF algorithm, MRMR may lead to sparse SVMs with inferior generalization performance, but it provides a significant computational advantage. Hence, MRMR can be viewed as a potential alternative to the PF algorithm when mining large-scale data sets.
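For completeness, the step that follows the selection can also be sketched. Once the columns are fixed, only the corresponding coefficients of Eq. (3) remain to be determined; the rough stand-in below uses a ridge-regularized least-squares solve in place of the Newton method of [13] employed in the experiments reported in this section, so it illustrates the pipeline rather than the exact solver:

import numpy as np

def fit_sparse_svm(K, y, selected, C=1.0):
    # Only the coefficients of the selected columns are non-zero
    Ks = K[:, selected]                        # n-by-N_sv reduced kernel matrix
    A = Ks.T @ Ks + np.eye(len(selected)) / C  # regularized normal equations
    beta = np.linalg.solve(A, Ks.T @ y)        # coefficients of Eq. (3)
    b = float(np.mean(y - Ks @ beta))          # crude bias estimate
    return beta, b

The reduced problem involves only an N_sv-by-N_sv linear system, so both the selection and the subsequent fitting stay cheap compared with solving the full QP.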

Table 1. Average classification accuracy (%) over 10 runs of the 5-fold cross-validation procedure, with standard deviations given in parentheses. The last two columns present the results of the Wilcoxon signed-rank test: 1 means MRMR outperformed the compared algorithm, 0 indicates no significant difference between the two compared algorithms, and - means MRMR was outperformed by the compared algorithm.

Datasets       | MRMR        | PF          | BF          | MRMR vs. PF | MRMR vs. BF
Australian     | 86.5(.57)   | 86.6(.5)    | 86.63(.50)  | 0           | 0
Monks-1        | 76.0(7.76)  | 9.09(5.4)   | 77.08(4.77) | -           | 0
Heart          | 83.37(4.4)  | 8.83(4.53)  | 8.6(3.80)   | 0           | 1
Mammographic   | 80.64(3.75) | 80.93(3.78) | 80.83(3.94) | 0           | 0
Wdbc           | 97.06(.59)  | 96.94(.44)  | 97.07(.44)  | 0           | 0
Hill-valley    | 57.89(5.68) | 65.47(4.7)  | 5.3(4.)     | -           | 1
Promoters      | 6.88(4.9)   | 67.88(0.8)  | 66.3(0.8)   | -           | 0

Table 2. Runtime (seconds) of the three compared algorithms on the seven data sets

Datasets       | MRMR   | PF      | BF
Australian     | 0.45   | 00.83   | .865
Monks-1        | 0.0548 | .644    | 3.8000
Heart          | 0.0387 | .8770   | 0.54
Mammographic   | 0.0978 | 57.046  | 6.6690
Wdbc           | 0.480  | 35.90   | 5.6759
Hill-valley    | 0.0093 | 58.63   | 5.76
Promoters      | 0.00   | 0.395   | 0.0464

4 Conclusion and Discussion

Sparse SVMs are usually obtained by selecting support vectors from the training samples. In this paper, we explained that support vector selection is equivalent to selecting a number of columns of the kernel matrix, and proposed to employ the MRMR algorithm to solve this problem. Experimental results indicated that the computational cost of MRMR is extremely low in comparison to two existing approaches, i.e., the back-fitting (BF) and pre-fitting (PF) algorithms. Furthermore, MRMR also outperformed the BF algorithm in terms of classification accuracy.

Our current work can be extended in the future along two main directions. First, since MRMR is very efficient and the PF algorithm can lead to better generalization performance, it would be interesting to investigate the possibility of combining MRMR with the PF algorithm. Such a combination might lead to novel algorithms with both good generalization performance and satisfactory computational efficiency. Second, the relevance and redundancy defined in the original MRMR algorithm might not suit the specific scenario of building sparse SVMs. Hence, it is necessary to seek alternative definitions to enhance the performance of MRMR in the case of building sparse SVMs.

Acknowledgement. This work was partially supported by the Fund for International Joint Research Program of Anhui Science and Technology Department (No. 0808070306) and a National Natural Science Foundation of China grant (No. 6080036).

References

1. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
2. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121-167 (1998)
3. Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9, 293-300 (1999)
4. Fung, G., Mangasarian, O.L.: Proximal support vector machine classifiers. In: Proceedings of Knowledge Discovery and Data Mining, San Francisco, CA, pp. 77-86 (2001)
5. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A.J., Mueller, K.-R.: Constructing descriptive and discriminative non-linear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(5), 623-628 (2003)
6. Burges, C.J.C.: Simplified support vector decision rules. In: Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, pp. 71-77 (1996)
7. Burges, C.J.C., Schölkopf, B.: Improving speed and accuracy of support vector learning machines. In: Advances in Neural Information Processing Systems, vol. 9, pp. 375-381. MIT Press, Cambridge (1997)
8. Wu, M., Schölkopf, B., Bakir, G.: A direct method for building sparse kernel learning algorithms. Journal of Machine Learning Research 7, 603-624 (2006)
9. Lee, Y., Mangasarian, O.L.: RSVM: reduced support vector machines. In: CD Proceedings of the First SIAM International Conference on Data Mining, Chicago (2001)
10. Lee, Y., Mangasarian, O.L.: SSVM: A smooth support vector machine. Computational Optimization and Applications 20, 5-22 (2001)
11. Lin, K., Lin, C.: A study on reduced support vector machines. IEEE Transactions on Neural Networks 14, 1449-1459 (2003)
12. Downs, T., Gates, K.E., Masters, A.: Exact simplification of support vector solutions. Journal of Machine Learning Research 2, 293-297 (2001)
13. Keerthi, S.S., Chapelle, O., DeCoste, D.: Building support vector machines with reduced classifier complexity. Journal of Machine Learning Research 8 (2006)
14. Sun, P., Yao, X.: Greedy forward selection algorithms to sparse Gaussian process regression. In: Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006), Vancouver, Canada, pp. 59-65 (2006)
15. Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the Computational Systems Bioinformatics Conference, pp. 523-528 (2003)
16. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html