Learning the Kernel Parameters in Kernel Minimum Distance Classifier


Daoqiang Zhang 1,2, Songcan Chen 2 and Zhi-Hua Zhou 1*

1 National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
2 Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

* Corresponding author. Email: zhouzh@nju.edu.cn, Tel.: +86-5-8368-668, Fax: +86-5-8368-668

Abstract

Choosing appropriate values for kernel parameters is one of the key problems in many kernel-based methods, because the values of these parameters have a significant impact on the performance of these methods. In this paper, a novel approach is proposed to learn the kernel parameters in the kernel minimum distance (KMD) classifier, where the values of the kernel parameters are computed by optimizing an objective function designed to measure the classification reliability of KMD. Experiments on both artificial and real-world datasets show that the proposed approach works well for learning the kernel parameters of KMD.

Keywords: kernel minimum distance; kernel parameter optimization; kernel selection

1. Introduction

Minimum distance (MD) and nearest neighbor (NN) are simple but popular techniques in pattern recognition. Recently, both methods have been extended to kernel versions, i.e. the kernel minimum distance (KMD) and the kernel nearest neighbor (KNN), for classifying complex and nonlinear patterns such as faces [1], [2]. However, like other kernel-based methods, the performance of KMD and KNN is greatly affected by the selected kernel parameter values. In this paper, we focus on optimizing the kernel parameters for KMD.

In the literature, there are two widely used approaches to choosing the values of kernel parameters in kernel-based methods [1], [3], [4]. The first approach empirically chooses a series of candidate values for the kernel parameter, executes the concerned method under these values again and again, and selects the one corresponding to the best performance as the final kernel parameter value. However, this approach suffers from the fact that only a very limited set of candidate values is considered, and therefore the performance of the kernel-based method may not be optimized. The second approach is the well-known cross-validation, which is also widely used in model selection. Compared with the first approach, cross-validation often yields better performance because it searches for the optimal kernel parameter value over a much wider range. However, performing cross-validation is often time-consuming and hence it cannot be used to adjust the kernel parameters in real time [3]. Furthermore, when there are only a limited number of training examples, the cross-validation approach can hardly ensure a robust estimate.

In this paper, a novel approach is proposed to learn the kernel parameters in KMD. First, an objective function is defined to measure the classification reliability of KMD with different kernel parameters. Then, the optimal values of the kernel parameters are chosen by optimizing this objective function. Experiments on both artificial and real-world datasets show the effect of the proposed approach on learning kernel parameters in KMD.

2. Kernel minimum distance classifier

One of the key ingredients of KMD is the definition of kernel-induced distance measures. Given a data set $S = \{x_1, \ldots, x_l\}$ sampled from the input space $X$, a kernel $K(x, y)$ and a function $\Phi$ into a feature space satisfy $K(x, y) = \Phi(x)^T \Phi(y)$. An important property of the kernel is that it can be computed directly in the original input space without knowing the concrete form of $\Phi$. That is, a kernel implicitly defines a nonlinear mapping function. There are several typical kernels, e.g. the Gaussian kernel $K(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$, the polynomial kernel $K(x, y) = (x^T y + 1)^d$, etc. The kernel-induced distance between two points defined by a kernel $K$ is shown in Eq. (1):

$$d^2(x, y) = \|\Phi(x) - \Phi(y)\|^2 = K(x, x) - 2K(x, y) + K(y, y). \quad (1)$$
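
For illustration, these two kernels and the kernel-induced distance of Eq. (1) can be sketched in a few lines of NumPy. The function names and the $\sigma^2$ scaling in the Gaussian kernel are assumptions of this sketch, not notation taken from the text:

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / sigma ** 2)

def polynomial_kernel(x, y, d):
    """Polynomial kernel K(x, y) = (x^T y + 1)^d."""
    return (np.dot(x, y) + 1.0) ** d

def kernel_distance_sq(x, y, kernel):
    """Squared kernel-induced distance of Eq. (1):
    d^2(x, y) = K(x, x) - 2 K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

# Two points that are far apart in input space end up close to the maximum
# feature-space distance (d^2 close to 2) under a Gaussian kernel.
x1, x2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
rbf = lambda a, b: gaussian_kernel(a, b, sigma=1.0)
print(kernel_distance_sq(x1, x2, rbf))  # close to 2.0
```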

Suppose the training data set $S$ contains $c$ different classes, i.e. $S_1, S_2, \ldots, S_c$, and each class $S_i$ has $l_i$ samples, satisfying $\sum_{i=1}^{c} l_i = l$. Let $\Phi(S_i) = \{\Phi(x_j) \mid x_j \in S_i\}$ be the image of class $S_i$ under the map $\Phi$, and denote the centre of $\Phi(S_i)$ as

$$\bar{\Phi}_{S_i} = \frac{1}{l_i} \sum_{x_j \in S_i} \Phi(x_j). \quad (2)$$

Then, the distance between the image of a new point $x$ and the centre of class $\Phi(S_i)$ can be computed as

$$d^2(\Phi(x), \bar{\Phi}_{S_i}) = \|\Phi(x) - \bar{\Phi}_{S_i}\|^2 = \Phi(x)^T \Phi(x) + \bar{\Phi}_{S_i}^T \bar{\Phi}_{S_i} - 2\,\bar{\Phi}_{S_i}^T \Phi(x) = K(x, x) + \frac{1}{l_i^2} \sum_{x_j, x_k \in S_i} K(x_j, x_k) - \frac{2}{l_i} \sum_{x_j \in S_i} K(x_j, x). \quad (3)$$

According to Eq. (3), the classification rule in KMD is to assign the new point $x$ to the class with the smallest distance:

$$h(x) = \arg\min_{1 \le i \le c} d^2(\Phi(x), \bar{\Phi}_{S_i}). \quad (4)$$

3. The proposed method

The following objective function is defined to measure the classification reliability of KMD with different kernel parameters:

$$J(\theta) = \sum_{i=1}^{l} \exp\left( \frac{d^2(\Phi(x_i), \bar{\Phi}_{S_{\pi(i)}})}{\min_{1 \le j \le c,\, j \ne \pi(i)} d^2(\Phi(x_i), \bar{\Phi}_{S_j})} \right). \quad (5)$$

Here $\theta$ denotes the kernel parameters, and $\pi(i)$ denotes the class label of $x_i$. The intuition behind Eq. (5) is to make the distance between the image of a sample and the centre of its own class as small as possible, while making the distance between the image of the sample and the other class centres as large as possible. The smaller the value of the objective function, the higher the classification reliability. The exponential function is used to speed up the convergence of the optimization. Note that when $d^2(\Phi(x_i), \bar{\Phi}_{S_{\pi(i)}}) < \min_{1 \le j \le c,\, j \ne \pi(i)} d^2(\Phi(x_i), \bar{\Phi}_{S_j})$, the sample $x_i$ is correctly classified.

Equation (5) specifies that the optimal value of a kernel parameter should not only correctly classify the training data, but also make the classification reliability as high as possible. In the extreme case where $d^2(\Phi(x_i), \bar{\Phi}_{S_{\pi(i)}}) = 0$ and $\min_{1 \le j \le c,\, j \ne \pi(i)} d^2(\Phi(x_i), \bar{\Phi}_{S_j}) = \infty$ for each $x_i$, the highest classification reliability is obtained.
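
To make Eqs. (2)-(5) concrete, a small NumPy sketch of the class-centre distances, the KMD decision rule and the reliability objective is given below. It assumes the Gaussian kernel of Section 2 and numeric class labels; the helper names and the vectorized layout are implementation choices of this sketch, not prescriptions of the method:

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / sigma^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma ** 2)

def kmd_distances(X_train, y_train, X_test, sigma):
    """Squared distances of Eq. (3) from each test image Phi(x) to every
    class centre; rows are test points, columns are classes."""
    X_train, y_train, X_test = map(np.asarray, (X_train, y_train, X_test))
    classes = np.unique(y_train)
    D = np.empty((len(X_test), len(classes)))
    k_xx = np.ones(len(X_test))          # K(x, x) = 1 for the Gaussian kernel
    for i, c in enumerate(classes):
        S = X_train[y_train == c]
        K_SS = gaussian_gram(S, S, sigma)       # within-class Gram matrix
        K_xS = gaussian_gram(X_test, S, sigma)  # test-to-class Gram matrix
        D[:, i] = k_xx + K_SS.mean() - 2.0 * K_xS.mean(axis=1)
    return D, classes

def kmd_predict(X_train, y_train, X_test, sigma):
    """Classification rule of Eq. (4): assign x to the nearest class centre."""
    D, classes = kmd_distances(X_train, y_train, X_test, sigma)
    return classes[np.argmin(D, axis=1)]

def reliability_objective(X, y, sigma):
    """Objective of Eq. (5): sum over training samples of
    exp(d^2 to the true class centre / min d^2 to any other class centre)."""
    y = np.asarray(y)
    D, classes = kmd_distances(X, y, X, sigma)
    own = D[np.arange(len(D)), np.searchsorted(classes, y)]
    others = np.where(classes[None, :] == y[:, None], np.inf, D).min(axis=1)
    return np.exp(own / others).sum()
```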

The optimal values of the kernel parameters can be obtained by minimizing Eq. (5), i.e.

$$\theta^* = \arg\min_{\theta} J(\theta). \quad (6)$$

In this paper, an iterative algorithm is employed to generate $\theta^*$. According to the general gradient method, the updating equation for minimizing the objective function $J$ is given by

$$\theta^{(n+1)} = \theta^{(n)} - \eta \frac{\partial J}{\partial \theta}, \quad (7)$$

where $\eta$ is the learning rate and $n$ is the iteration step.

The proposed method, KMD-opt, is summarized as follows:

Step 1. Set the learning rate $\eta$ and the maximum iteration number $N$, and set $\varepsilon$ to a very small positive number.
Step 2. Initialize the kernel parameters $\theta^{(0)} = \theta_0$ and set the iteration step $n = 0$.
Step 3. Update the kernel parameters $\theta^{(n)}$ using Eq. (7).
Step 4. If $\|\theta^{(n+1)} - \theta^{(n)}\| < \varepsilon$ or $n \ge N$, stop. Otherwise, set $n = n + 1$ and go to Step 3.

4. Experiments

This section evaluates the effectiveness of the proposed KMD-opt method. For comparison, MD and KMD are also tested. An artificial data set, Circles, as shown in Fig. 1, and two real-world data sets, Bupa and Pid, from the UCI Machine Learning Repository [5] are used. For each data set, half of the data are used as the training set, while the remaining data are used as the test set. The kernel used in the experiments is the Gaussian kernel $K(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$, where $\sigma$ is the kernel parameter to be optimized. In this paper, unless explicitly stated otherwise, the initial value of the kernel parameter is set to

$$\sigma_0 = \frac{1}{c\,l} \sum_{j=1}^{l} \|x_j - \bar{x}\|,$$

where $\bar{x}$ is the centroid of the total $l$ training data. Specifically, the $\sigma_0$ values for Circles, Bupa and Pid are .33, 17.6 and 43.48, respectively. Unless otherwise stated, the learning rate $\eta$ is set to .5 and $\varepsilon$ is set to .1.
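
Combining Steps 1-4 with these settings, the KMD-opt loop might be sketched as follows. The sketch reuses reliability_objective and kmd_predict from the earlier sketch, replaces the analytic derivative $\partial J / \partial \sigma$ of Eq. (7) with a central finite difference, and uses illustrative defaults rather than values taken from the paper:

```python
import numpy as np

def kmd_opt(X, y, sigma0=None, eta=0.5, N=100, eps=1e-3, h=1e-4):
    """Gradient-descent loop of Steps 1-4 for a single Gaussian kernel width.
    A central finite difference stands in for the analytic dJ/dsigma of Eq. (7);
    reliability_objective() is the function from the earlier sketch."""
    X = np.asarray(X, dtype=float)
    if sigma0 is None:
        # Initialization in the spirit of Section 4: a distance scale taken
        # from the spread of the training data around its centroid.
        centroid = X.mean(axis=0)
        sigma0 = np.linalg.norm(X - centroid, axis=1).mean()
    sigma = float(sigma0)
    for n in range(N):
        grad = (reliability_objective(X, y, sigma + h)
                - reliability_objective(X, y, sigma - h)) / (2.0 * h)
        new_sigma = sigma - eta * grad      # update rule of Eq. (7)
        if abs(new_sigma - sigma) < eps:    # convergence test of Step 4
            return new_sigma
        sigma = new_sigma
    return sigma

# Usage: sigma_star = kmd_opt(X_train, y_train)
#        y_pred = kmd_predict(X_train, y_train, X_test, sigma_star)
```

Since Eq. (5) sums over all training samples, the gradient magnitude grows with the training set size, so in practice the learning rate may need to be scaled down for larger data sets.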

Table 1 shows the test accuracies of MD, KMD and KMD-opt; the corresponding $\sigma$ values are also presented. Table 1 shows that in most cases KMD obtains better test accuracy than MD, but when the kernel parameter is not chosen appropriately its performance deteriorates greatly. In all cases, KMD-opt achieves the best test accuracy. What is more, it can be found from Table 1 that the KMD-opt method is quite robust: on every data set, the final $\sigma$ values it produces are almost identical even though the method starts from different initializations.

As an example, the left part of Fig. 2 plots the test accuracy of KMD under a series of $\sigma$ values on Bupa. It verifies the claim that a good performance of KMD greatly depends on the selection of kernel parameters. The right part of Fig. 2 plots the objective function of Eq. (5) under the same series of $\sigma$ values on Bupa. It can be seen from Fig. 2 that the objective function reaches its minimum at $\sigma$ values similar to those at which KMD achieves its highest accuracy.

5. Conclusions

In this paper, a novel approach for learning kernel parameters is proposed and successfully applied to the kernel minimum distance (KMD) classifier. An objective function is defined to measure the classification reliability of KMD with different kernel parameters, and the optimal values of the kernel parameters are then obtained by optimizing this objective function. Experiments show the effect of the proposed approach on learning kernel parameters in KMD. In future work, the proposed approach will be extended to other kernel-based learning methods such as the support vector machine (SVM) and kernel Fisher discriminant (KFD).

References

[1] J. Peng, D.R. Heisterkamp, H.K. Dai, Adaptive quasiconformal kernel nearest neighbor classification, IEEE Trans. PAMI 26(5) (2004) 656-661.
[2] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[3] L. Wang, K.L. Chan, Learning kernel parameters by using class separability measure, NIPS Workshop on Kernel Machines, Canada, 2002.
[4] D.Q. Zhang, S.C. Chen, Clustering incomplete data using kernel-based fuzzy c-means algorithm, Neural Processing Letters 18(3) (2003) 155-162.
[5] C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html], Department of Information and Computer Science, University of California, Irvine, CA, 1998.

Fig. 1. The Circles data set.

Fig. 2. Test accuracy (left) and objective function values (right) under a series of $\sigma$ values on Bupa.

Table 1. Comparison of test accuracy (%) of MD, KMD and KMD-opt (the values in brackets denote the $\sigma$ values at convergence). For KMD and KMD-opt, the five entries per data set correspond to kernel parameter values of 3σ0, 2σ0, σ0, σ0/2 and σ0/3, from left to right (used directly by KMD and as initializations by KMD-opt).

Data set | MD    | KMD                            | KMD-opt
Circles  | 5     | 1, 98, 5, 1, 1                 | 1(3.43), 1(3.43), 1(3.43), 1(3.43), 1(3.43)
Bupa     | 59.43 | 68, 61.14, 57.14, 66.9, 65.14  | 69.14(16.55), 69.14(16.55), 69.14(16.55), 69.14(16.54), 69.14(16.54)
Pid      | 6.5   | 65.1, 58.7, 5.78, 64.3, 64.84  | 65.63(41.45), 65.63(41.45), 65.63(41.45), 65.63(41.45), 65.63(41.45)