Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013

Maximal Margin Learning Vector Quantisation

Trung Le, Dat Tran, Van Nguyen, and Wanli Ma

Abstract — Kernel Generalised Learning Vector Quantisation (KGLVQ) was proposed to extend Generalised Learning Vector Quantisation into the kernel feature space to deal with complex class boundaries, and it has yielded promising performance on complex classification tasks in pattern recognition. However, KGLVQ does not follow the maximal margin principle, which is crucial for kernel-based learning methods. In this paper we propose a maximal margin approach (MLVQ) to the KGLVQ algorithm. MLVQ inherits the merits of KGLVQ and also follows the maximal margin principle to improve the generalisation capability. Experiments performed on well-known data sets from the UCI repository show promising classification results for the proposed method.

I. INTRODUCTION

Self-organizing methods such as the Self-Organizing Map (SOM) and Learning Vector Quantisation (LVQ) introduced by Kohonen [8] provide a successful and intuitive way of processing data for easy access [6]. LVQ aims at generating prototypes, or reference vectors, that represent the data of each class [7]. Although LVQ is a fast and simple learning algorithm, its prototypes sometimes diverge and, as a result, degrade its recognition ability [12]. To address this problem, Generalised Learning Vector Quantisation (GLVQ) [12] was proposed. It is a generalisation of Kohonen's original model in which the prototypes are updated by the steepest descent method to minimise a cost function. GLVQ has been widely applied and has shown good performance in many applications [9], [11], [12]. However, its performance deteriorates on complex data sets, since pattern classes with nonlinear class boundaries usually require a large number of prototypes.
It thus becomes very difficult to determine a reasonable number of prototypes, and their positions, that achieve good generalisation performance [10]. To overcome this drawback, Kernel Generalised Learning Vector Quantisation (KGLVQ) was proposed in [10]; it learns the prototypes of the data in the feature space. Like LVQ and GLVQ, KGLVQ can be used for two-class and multi-class classification problems. In the case of two-class classification problems, the entire feature space is divided into subspaces induced by two core prototypes, and in each subspace the mid-perpendicular hyperplane of the two core prototypes is employed to classify the data. However, the hyperplanes of KGLVQ do not guarantee maximal margins, which are crucial for kernel methods [13], [14], [15].

In this paper, we propose a maximal margin approach to KGLVQ, which we name MLVQ. It takes advantage of margin maximisation to improve the generalisation capability, as in the Support Vector Machine [3], [1]. MLVQ differs from the approach in [4], which maximises the hypothesis margin rather than the real margin. In our approach, finite numbers of prototypes, m and n, are used to represent the positive and negative classes of a binary data set, respectively. The entire feature space is divided into m × n subspaces, induced by the pairs of prototypes, one from each class. In each subspace, the mid-perpendicular hyperplane of the two corresponding prototypes is employed to classify the data. The cost function in our approach takes the margins of these hyperplanes into account in order to boost the generalisation capability.

(Trung Le and Van Nguyen are with the Faculty of Information Technology, HCMc University of Pedagogy, Hochiminh city, Vietnam ({trunglm, vannk}@hcmup.edu.vn). Dat Tran and Wanli Ma are with the Faculty of Education, Science, Technology and Mathematics, University of Canberra, Australia ({dat.tran, wanli.ma}@canberra.edu.au).)
Experiments performed on 9 data sets from the UCI repository show promising performance of the proposed method.

II. MAXIMAL MARGIN KERNEL GENERALISED LEARNING VECTOR QUANTISATION

A. Introduction

Consider a binary training set X = {(x₁, y₁), (x₂, y₂), ..., (x_l, y_l)}, where x₁, x₂, ..., x_l ∈ R^d are data points and y₁, y₂, ..., y_l ∈ {−1, 1} are labels. This training set is mapped into a high-dimensional space, namely the feature space, through a function φ(·). Based on the idea of Vector Quantisation (VQ), m prototypes A₁, A₂, ..., A_m of the positive class and n prototypes B₁, B₂, ..., B_n of the negative class are discovered in the feature space. Classification is based on the minimum distance to the prototypes of each class. More precisely, given a new vector x, the decision function is

f(x) = sign(‖φ(x) − b_{j₀}‖² − ‖φ(x) − a_{i₀}‖²)    (1)

where i₀ = argmin_{1≤i≤m} ‖φ(x) − a_i‖², j₀ = argmin_{1≤j≤n} ‖φ(x) − b_j‖², and a_i, b_j are the coordinates of A_i, B_j, i = 1, ..., m; j = 1, ..., n, respectively.

B. Optimisation Problem

Given a labeled training vector (x, y), let a and b denote the prototypes of the positive and negative classes, respectively, that are closest to φ(x). Let µ(x, a, b) be a function satisfying the following criterion: if x is correctly classified then µ(x, a, b) < 0; otherwise µ(x, a, b) ≥ 0. Let g be a monotonically increasing function. To improve the error rate, µ(x, a, b) should decrease for all training vectors.
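To make the decision rule (1) concrete, the sketch below evaluates feature-space distances with the kernel trick, representing each prototype by its expansion coefficients over the training points (anticipating the linear expansions in Eq. (3)). The RBF kernel choice and all function names here are illustrative assumptions, not code from the paper.

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2)))

def feature_dist2(x, coeffs, X, kernel=rbf):
    # ||phi(x) - c||^2 for a prototype c = sum_k coeffs[k] * phi(X[k]):
    # K(x, x) - 2 * sum_k u_k K(x_k, x) + sum_{p,q} u_p u_q K(x_p, x_q)
    cross = sum(u * kernel(xk, x) for u, xk in zip(coeffs, X))
    norm2 = sum(up * uq * kernel(xp, xq)
                for up, xp in zip(coeffs, X)
                for uq, xq in zip(coeffs, X))
    return kernel(x, x) - 2.0 * cross + norm2

def decide(x, pos_coeffs, neg_coeffs, X, kernel=rbf):
    # Eq. (1): sign(||phi(x) - b_j0||^2 - ||phi(x) - a_i0||^2)
    d_pos = min(feature_dist2(x, u, X, kernel) for u in pos_coeffs)
    d_neg = min(feature_dist2(x, v, X, kernel) for v in neg_coeffs)
    return 1 if d_neg - d_pos > 0 else -1
```

With one prototype per class placed exactly at a training image φ(x_k) (a one-hot coefficient vector), points near x_k are assigned that class, which is the nearest-prototype behaviour the section describes.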
Therefore, the criterion is formulated as minimising the following function:

min_{{A},{B}} Σ_{i=1}^{l} g(µ(x_i, a^{(i)}, b^{(i)}))    (2)

where {A} and {B} are the sequences {A₁, A₂, ..., A_m} and {B₁, B₂, ..., B_n}, respectively, and a^{(i)} and b^{(i)} are the two prototypes of the two classes closest to φ(x_i).

C. Solution

Assuming that the prototypes are linear expansions of the vectors φ(x₁), φ(x₂), ..., φ(x_l), let a_i, i = 1, ..., m and b_j, j = 1, ..., n denote the coordinates of the prototypes:

a_i = Σ_{k=1}^{l} u_{ik} φ(x_k),  i = 1, ..., m
b_j = Σ_{k=1}^{l} v_{jk} φ(x_k),  j = 1, ..., n    (3)

For convenience, if c = Σ_{i=1}^{l} u_i φ(x_i), we rewrite c as c = [u₁, u₂, ..., u_l] = [u_i]_{i=1,...,l}. Given a labeled training vector (x, y), we first determine the two closest prototypes A and B of the two classes with respect to x, and then use the gradient descent method to update the coordinates a and b of A and B, respectively, as follows:

a = a − α ∂g/∂a
b = b − α ∂g/∂b    (4)

We now introduce the algorithm for Vector Quantisation Support Vector Machine.

ALGORITHM FOR VECTOR QUANTISATION SUPPORT VECTOR MACHINE
Initialise: use C-Means or Fuzzy C-Means clustering to find m prototypes for the positive class and n prototypes for the negative class in the input space; set t = 0 and i = 0
Repeat
    t = t + 1
    i = (i + 1) mod l
    A_t = A_{i₀} where i₀ = argmin_{1≤k≤m} ‖φ(x_i) − a_k‖²
    B_t = B_{j₀} where j₀ = argmin_{1≤k≤n} ‖φ(x_i) − b_k‖²
    Update a_{i₀} = a_{i₀} − α ∂g/∂a_{i₀}
    Update b_{j₀} = b_{j₀} − α ∂g/∂b_{j₀}
Until convergence is reached

Here the function g = g(µ, t) depends on the learning time t. The sigmoid function g(µ, t) = 1/(1 + e^{−µt}) is a good candidate for g. If this sigmoid function is applied, then ∂g/∂µ = t g(µ, t)(1 − g(µ, t)).

D. Selection of the µ-function

We introduce some candidates for the µ-function. Let (x, y) be a labeled training vector, and let a and b be the two prototypes, one from each class, closest to the vector.
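Before listing the candidates, the training loop of Section II-C can be sketched in a few lines. The sketch below works in the input space (φ = identity), uses the simple choice µ = y(d₁ − d₂) with the sigmoid gating g(µ, t) = 1/(1 + e^{−µt}), and initialises prototypes from random class samples instead of C-Means, purely to stay self-contained; none of these names come from the paper.

```python
import numpy as np

def train_vq_sketch(X, y, m=1, n=1, alpha=0.01, epochs=30, seed=0):
    """Input-space sketch of the VQ-SVM training loop with
    mu = y * (d1 - d2) and g(mu, t) = 1 / (1 + exp(-mu * t))."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Initialise prototypes from random class members (C-Means in the paper).
    A = X[y == 1][rng.choice((y == 1).sum(), m, replace=False)].copy()
    B = X[y == -1][rng.choice((y == -1).sum(), n, replace=False)].copy()
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            i0 = np.argmin(((A - xi) ** 2).sum(axis=1))  # closest positive prototype
            j0 = np.argmin(((B - xi) ** 2).sum(axis=1))  # closest negative prototype
            d1 = ((A[i0] - xi) ** 2).sum()
            d2 = ((B[j0] - xi) ** 2).sum()
            mu = yi * (d1 - d2)                  # Candidate 1 below, Eq. (5)
            z = np.clip(mu * t, -500.0, 500.0)   # clamp to avoid overflow in exp
            g = 1.0 / (1.0 + np.exp(-z))
            dg = t * g * (1.0 - g)               # dg/dmu for the sigmoid gating
            # Updates of Eq. (8): pull a toward x, push b away (for y = +1).
            A[i0] -= alpha * dg * 2.0 * yi * (A[i0] - xi)
            B[j0] += alpha * dg * 2.0 * yi * (B[j0] - xi)
    return A, B
```

On well-separated data the gating saturates quickly for correctly classified samples, so the updates become minor, which matches the behaviour of g discussed later in the parameter settings.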
CANDIDATE 1 FOR THE µ-FUNCTION [8] (LVQ)

µ(x, a, b) = y(‖φ(x) − a‖² − ‖φ(x) − b‖²) = y(d₁ − d₂) = η(d₁, d₂)    (5)

CANDIDATE 2 FOR THE µ-FUNCTION [12] (GLVQ)

µ(x, a, b) = y(‖φ(x) − a‖² − ‖φ(x) − b‖²) / (‖φ(x) − a‖² + ‖φ(x) − b‖²) = y(d₁ − d₂)/(d₁ + d₂) = η(d₁, d₂)    (6)

where d₁ and d₂ in (5) and (6) are the squared distances from φ(x) to the two prototypes a and b, respectively. These µ-functions depend on x only through d₁ and d₂, so the adaptation of the prototypes in (4) can be rewritten as

a = a − 2α (∂g/∂η)(∂η/∂d₁)(a − φ(x))
b = b − 2α (∂g/∂η)(∂η/∂d₂)(b − φ(x))    (7)

If µ(x, a, b) = η(d₁, d₂) = y(d₁ − d₂), the equations in (7) become

a = a − 2α (∂g/∂η) y (a − φ(x))
b = b + 2α (∂g/∂η) y (b − φ(x))    (8)

If µ(x, a, b) = η(d₁, d₂) = y(d₁ − d₂)/(d₁ + d₂), the equations in (7) become

a = a − α (∂g/∂η) (4y d₂ / (d₁ + d₂)²) (a − φ(x))
b = b + α (∂g/∂η) (4y d₁ / (d₁ + d₂)²) (b − φ(x))    (9)

CANDIDATE 3 FOR THE µ-FUNCTION [4] (HMLVQ)

µ(x, a, b) = (1/2) y (‖φ(x) − a‖ − ‖φ(x) − b‖)    (10)

This µ-function corresponds to the hypothesis margin in [4] and is used in AdaBoost [5]. The hypothesis margin measures how much the hypothesis can travel before it hits an instance, as shown in Figure 1. The partial derivatives of µ with respect to a and b are

∂µ/∂a = −(y/2) (φ(x) − a) / ‖φ(x) − a‖
∂µ/∂b = (y/2) (φ(x) − b) / ‖φ(x) − b‖    (11)

CANDIDATE 4 FOR THE µ-FUNCTION (MLVQ)

This is our proposed maximal margin approach MLVQ. The µ-function is of the form

µ(x, a, b) = y(‖φ(x) − a‖² − ‖φ(x) − b‖²) / ‖a − b‖    (12)

Note that, by Theorem 1 in Appendix A, the absolute value of the µ-function in Candidate 4 is, up to the constant factor 2, the sample margin at φ(x) shown in Figure 1, i.e., the distance from φ(x) to the mid-perpendicular hyperplane of the prototypes a and b. When x is correctly classified, µ(x, a, b) is the negative of this value, so minimising µ(x, a, b) motivates maximising the sample margin at x. The partial derivatives of µ with respect to a and b are

∂µ/∂a = −(2y/‖a − b‖)(φ(x) − a) − y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³
∂µ/∂b = (2y/‖a − b‖)(φ(x) − b) + y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³    (13)

Fig. 1. (a) Hypothesis margin; (b) sample margin.

E. Decision Function

When convergence is reached, we obtain the final prototypes a_i = [u_{ik}]_{k=1,...,l}, i = 1, ..., m and b_j = [v_{jk}]_{k=1,...,l}, j = 1, ..., n. For a new vector x, the distances from φ(x) to the prototypes can be calculated as

d(φ(x), a_i) = ‖φ(x) − a_i‖² = K(x, x) − 2 Σ_{p=1}^{l} u_{ip} K(x_p, x) + ‖a_i‖²,  i = 1, ..., m
d(φ(x), b_j) = ‖φ(x) − b_j‖² = K(x, x) − 2 Σ_{p=1}^{l} v_{jp} K(x_p, x) + ‖b_j‖²,  j = 1, ..., n    (14)

The two closest prototypes to φ(x) and the decision function are then determined as

i₀ = argmin_{1≤i≤m} d(φ(x), a_i)
j₀ = argmin_{1≤j≤n} d(φ(x), b_j)
f(x) = sign(d(φ(x), b_{j₀}) − d(φ(x), a_{i₀}))    (15)

F. The Rationale of the Proposed Margin Approach

In this section, we discuss the rationale and the advantages of our proposed method. First, consider the case where the number of prototypes for each of the positive and negative classes is set to 1, as shown in Figure 2. Assuming that g(µ, t) = µ is applied, by Theorem 1 in Appendix A the objective function in (2) becomes Σ_{i=1}^{l} y_i dis(φ(x_i), H) sign(d_i), where H is the hyperplane induced by the positive prototype a₁ and the negative prototype b₁, dis(φ(x_i), H) stands for the distance from φ(x_i) to H, i.e., the sample margin at φ(x_i), and d_i = ‖φ(x_i) − a₁‖² − ‖φ(x_i) − b₁‖². To keep it simple, consider the separable case, i.e., all vectors φ(x_i) are correctly classified by the hyperplane H. The objective function then becomes Σ_{i=1}^{l} (−dis(φ(x_i), H)), and the optimisation problem is transformed into

min_{a₁,b₁} Σ_{i=1}^{l} (−dis(φ(x_i), H))  or equivalently  max_{a₁,b₁} Σ_{i=1}^{l} dis(φ(x_i), H)    (16)

This objective function is the sum of the sample margins at all vectors, not the margin of the original Support Vector Machine (SVM); however, a similar objective was considered in a variant of SVM [2]. Therefore, when 2 prototypes are used, one for each of the positive and negative classes, the objective function of the proposed model is closely related to the margin of SVM. Furthermore, with m prototypes representing the positive class and n the negative class, the entire space is divided into m × n subspaces (receptive fields), and in each receptive field the hyperplane induced by the two corresponding prototypes is used to classify the data, as shown in Figure 3. Since the objective function (2) is minimised, the margins in the corresponding receptive fields tend to be maximised.

Fig. 2. One positive and one negative prototype are used to classify the data set.
Fig. 3. Two positive and two negative prototypes are used to classify the data set.

III. EXPERIMENTAL RESULTS

A. Data sets

We conducted experiments on 9 data sets from the UCI repository. Details of the data sets are shown in Table I. The LVQ algorithms with the different µ-functions mentioned above were run in both the input and feature spaces, to compare LVQ, GLVQ and HMLVQ with our proposed MLVQ in the input space, and kernel LVQ, kernel GLVQ and kernel HMLVQ with MLVQ in the kernel feature space. We also compare MLVQ with SVM.
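As a concrete companion to Sections II-D and II-E, the four candidate µ-functions and the MLVQ derivatives of Eq. (13) translate directly into code. The sketch below is our own NumPy illustration with φ taken as the identity map; all names are ours, and it lets Eq. (13) be checked against finite differences.

```python
import numpy as np

def mu_lvq(y, x, a, b):
    # Candidate 1, Eq. (5): y * (d1 - d2)
    return y * (np.sum((x - a) ** 2) - np.sum((x - b) ** 2))

def mu_glvq(y, x, a, b):
    # Candidate 2, Eq. (6): y * (d1 - d2) / (d1 + d2)
    d1, d2 = np.sum((x - a) ** 2), np.sum((x - b) ** 2)
    return y * (d1 - d2) / (d1 + d2)

def mu_hmlvq(y, x, a, b):
    # Candidate 3, Eq. (10): the hypothesis margin
    return 0.5 * y * (np.linalg.norm(x - a) - np.linalg.norm(x - b))

def mu_mlvq(y, x, a, b):
    # Candidate 4, Eq. (12): proportional to the signed sample margin
    num = np.sum((x - a) ** 2) - np.sum((x - b) ** 2)
    return y * num / np.linalg.norm(a - b)

def grad_mu_mlvq(y, x, a, b):
    # Closed-form partial derivatives of Candidate 4, Eq. (13)
    nab = np.linalg.norm(a - b)
    diff = np.sum((x - a) ** 2) - np.sum((x - b) ** 2)
    da = -2.0 * y / nab * (x - a) - y * diff * (a - b) / nab ** 3
    db = 2.0 * y / nab * (x - b) + y * diff * (a - b) / nab ** 3
    return da, db
```

For a correctly classified x all four µ-values are negative, and by Theorem 1 in Appendix A, |mu_mlvq| equals twice the distance from x to the mid-perpendicular hyperplane of a and b.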
TABLE I
NUMBER OF SAMPLES IN THE 9 DATA SETS. #POSITIVE: NUMBER OF POSITIVE SAMPLES, #NEGATIVE: NUMBER OF NEGATIVE SAMPLES, d: DIMENSION.

Data set         #positive  #negative  d
Astroparticle
Australian
Breast Cancer
Fourclass
Ionosphere
Liver Disorders
SvmGuide3
USPS
Wine

B. Parameter Settings

In our experiments, we did not use the sigmoid function g(µ, t) = 1/(1 + e^{−µt}), whose derivative is ∂g/∂µ = t g(1 − g). This derivative decreases rapidly to 0 as the time t approaches +∞; for example, when t = 100 the derivative is nearly equal to 0 once µ exceeds 0.1. Instead, we applied g(µ, t) = 1/(1 + e^{−µ√t}), whose derivative is ∂g/∂µ = √t g(1 − g). This function shows two good features, as seen in Figures 4 and 5: 1) its derivative approaches 0 more slowly than that of the sigmoid function; 2) given t, if µ of a vector exceeds a predefined threshold, then the derivative, i.e., the adaptation rate at this vector, is very small and the adaptation is minor.

To evaluate accuracies, 5-fold cross-validation was used. The learning rate α was set to a fixed value. Both the numbers of positive and negative prototypes were searched in the grid {1, 2, 3}. For the kernel LVQs and SVM, the popular RBF kernel function K(x, x′) = e^{−γ‖x − x′‖²} was used. The parameter γ was searched in the grid {2^k : k = 2l + 1, l = −8, −6, ..., 1}. For SVM, the trade-off parameter C was searched in the grid {2^k : k = 2l + 1, l = −8, −6, ..., 2}.

Experimental results are displayed in Tables II and III and Figures 6 and 7. They show that our MLVQ method performed very well in the input space. The kernel models also consistently outperform the corresponding models in the input space. This is reasonable, since the data tend to be more compact in the feature space, so a few prototypes are sufficient to represent each class.
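The contrast between the two gating functions can be checked numerically. The sketch below compares the derivative t·g(1 − g) of the standard time-scaled sigmoid with the √t-scaled variant described above; the √t form is our reading of the text, and all function names are ours.

```python
import math

def g_standard(mu, t):
    # g(mu, t) = 1 / (1 + exp(-mu * t)); dg/dmu = t * g * (1 - g)
    z = max(min(mu * t, 500.0), -500.0)   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def dg_standard(mu, t):
    g = g_standard(mu, t)
    return t * g * (1.0 - g)

def g_slow(mu, t):
    # assumed modified gating: g(mu, t) = 1 / (1 + exp(-mu * sqrt(t)))
    z = max(min(mu * math.sqrt(t), 500.0), -500.0)
    return 1.0 / (1.0 + math.exp(-z))

def dg_slow(mu, t):
    g = g_slow(mu, t)
    return math.sqrt(t) * g * (1.0 - g)
```

At a fixed µ away from zero, dg_standard collapses to essentially zero as t grows, while dg_slow decays far more gently, so correctly classified vectors keep contributing small but nonzero updates for longer.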
The experiments also show that MLVQ in the kernel feature space and SVM are comparable; however, MLVQ is preferable because it is simpler and does not require searching over as large a range of parameters as SVM.

TABLE II
CLASSIFICATION RESULTS (IN %) ON 9 DATA SETS FOR THE 4 INPUT SPACE MODELS LVQ, GLVQ, HMLVQ AND MLVQ.

Data set         LVQ  GLVQ  HMLVQ  MLVQ
Astroparticle    66   68    70     84
Australian       82   82    83     85
Breast Cancer    94   95    95     96
Fourclass        90   90    93     88
Ionosphere       70   69    71     84
Liver Disorders  60   61    62     64
SvmGuide3        74   76    74     65
USPS             74   74    73     95
Wine             91   94    90     93

TABLE III
CLASSIFICATION RESULTS (IN %) ON 9 DATA SETS FOR THE 4 KERNEL FEATURE SPACE MODELS (KERNEL LVQ, KERNEL GLVQ, KERNEL HMLVQ, KERNEL MLVQ) AND SVM.

Data set         LVQ  GLVQ  HMLVQ  MLVQ  SVM
Astroparticle    86   89    89     95    96
Australian       83   82    84     88    86
Breast Cancer    96   97    97     97    96
Fourclass        98   99    100    99    98
Ionosphere       92   93    92     95    93
Liver Disorders  62   62    64     66    60
SvmGuide3        75   77    76     79    76
USPS             82   83    82     98    98
Wine             96   98    95     99    99

Fig. 4. The graph of the derivative of the sigmoid function.
Fig. 5. The graph of the derivative of the new sigmoid function.

IV. CONCLUSION

In this paper, we have introduced MLVQ, a new maximal margin approach to Kernel Generalised Learning Vector Quantisation. MLVQ maximises the real margin, which is crucial for kernel methods, and can be applied in both the input space and the feature space. The experiments conducted on 9 data sets from the UCI repository demonstrate good performance of MLVQ in both the input space and the feature space.
Fig. 6. Classification results (in %) on 9 data sets for the 4 input space models LVQ, GLVQ, HMLVQ (HM) and MLVQ (SM).
Fig. 7. Classification results (in %) on 9 data sets for the kernel feature space models kernel LVQ (KLVQ), kernel GLVQ (KGLVQ), kernel HMLVQ (KHM), kernel MLVQ (KSM) and SVM.

APPENDIX

Theorem 1. Let M, A and B be points in the affine space R^d, and let (H): wᵀx + b = 0 be the mid-perpendicular hyperplane of the segment AB. The following equality holds:

Margin(M, H) = |MA² − MB²| / (2 AB)

where the sample margin Margin(M, H) is the distance from the point M to the hyperplane (H).

PROOF.

MA² − MB² = vec(MA)² − vec(MB)² = (vec(MA) − vec(MB)) · (vec(MA) + vec(MB)) = 2 vec(BA) · vec(MI) = 2 vec(BA) · (vec(MH) + vec(HI))    (17)

where I is the midpoint of the segment AB and H is the projection of M onto the hyperplane (H), as in Figure 8. Since vec(HI) is orthogonal to vec(BA) while vec(MH) is parallel to vec(BA), we have

|MA² − MB²| = 2 |vec(BA) · vec(MH)| = 2 AB · MH = 2 AB · Margin(M, H).

Fig. 8. The formula to evaluate the margin.

Corollary 1. If

µ(x, a, b) = y(‖φ(x) − a‖² − ‖φ(x) − b‖²) / ‖a − b‖    (18)

then the partial derivatives of µ with respect to a and b are

∂µ/∂a = −(2y/‖a − b‖)(φ(x) − a) − y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³
∂µ/∂b = (2y/‖a − b‖)(φ(x) − b) + y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³    (19)

PROOF. By the quotient rule,

∂µ/∂a = 2y(a − φ(x))/‖a − b‖ − y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/(‖a − b‖ · ‖a − b‖²)
      = −(2y/‖a − b‖)(φ(x) − a) − y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³    (20)

∂µ/∂b = 2y(φ(x) − b)/‖a − b‖ + y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/(‖a − b‖ · ‖a − b‖²)
      = (2y/‖a − b‖)(φ(x) − b) + y(‖φ(x) − a‖² − ‖φ(x) − b‖²)(a − b)/‖a − b‖³    (21)

REFERENCES

[1] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2.
[2] C. Campbell and K. P. Bennett. A linear programming approach to novelty detection.
[3] C. Cortes and V. Vapnik. Support-vector networks. In Machine Learning.
[4] K. Crammer, R. Gilad-Bachrach, A. Navot, and N. Tishby. Margin analysis of the LVQ algorithm. In Advances in Neural Information Processing Systems, 2002.
[5] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting.
Journal of Computer and System Sciences, 55(1).
[6] B. Hammer and T. Villmann. Generalized relevance learning vector quantization. Neural Networks, 15.
[7] T. Kohonen. Self-Organization and Associative Memory, 3rd edition. Springer-Verlag.
[8] T. Kohonen. Learning vector quantization. In The Handbook of Brain Theory and Neural Networks.
[9] C.-L. Liu and M. Nakagawa. Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition. Pattern Recognition, 34(3).
[10] A. K. Qin and P. N. Suganthan. A novel kernel prototype-based learning algorithm. In ICPR.
[11] A. Sato. Discriminative dimensionality reduction based on generalized LVQ. In ICANN, pages 65-72.
[12] A. Sato and K. Yamada. Generalized learning vector quantization. In NIPS.
[13] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, 2nd edition.
[14] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
[15] V. Vapnik. The Nature of Statistical Learning Theory. Springer, 2nd edition.
More informationClustering with Reinforcement Learning
Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of
More informationUsing Analytic QP and Sparseness to Speed Training of Support Vector Machines
Using Analytic QP and Sparseness to Speed Training of Support Vector Machines John C. Platt Microsoft Research 1 Microsoft Way Redmond, WA 9805 jplatt@microsoft.com Abstract Training a Support Vector Machine
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationFuzzy-Kernel Learning Vector Quantization
Fuzzy-Kernel Learning Vector Quantization Daoqiang Zhang 1, Songcan Chen 1 and Zhi-Hua Zhou 2 1 Department of Computer Science and Engineering Nanjing University of Aeronautics and Astronautics Nanjing
More information12 Classification using Support Vector Machines
160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.
More informationSupport Vector Machines
Support Vector Machines SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions 6. Dealing
More informationSupport vector machines
Support vector machines When the data is linearly separable, which of the many possible solutions should we prefer? SVM criterion: maximize the margin, or distance between the hyperplane and the closest
More informationFace Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN
2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine
More informationUnivariate Margin Tree
Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationSecond Order SMO Improves SVM Online and Active Learning
Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms
More informationHierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs
Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs Kaan Ataman and W. Nick Street Department of Management Sciences The University of Iowa Abstract Many real-world
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationIntroduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition
More informationDynamic Ensemble Construction via Heuristic Optimization
Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple
More informationRecursive Similarity-Based Algorithm for Deep Learning
Recursive Similarity-Based Algorithm for Deep Learning Tomasz Maszczyk 1 and Włodzisław Duch 1,2 1 Department of Informatics, Nicolaus Copernicus University Grudzia dzka 5, 87-100 Toruń, Poland 2 School
More informationData Mining in Bioinformatics Day 1: Classification
Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls
More informationCS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes
1 CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview
More informationNEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE)
NEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE) University of Florida Department of Electrical and
More informationSupport Vector Machines
Support Vector Machines VL Algorithmisches Lernen, Teil 3a Norman Hendrich & Jianwei Zhang University of Hamburg, Dept. of Informatics Vogt-Kölln-Str. 30, D-22527 Hamburg hendrich@informatik.uni-hamburg.de
More information.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar..
.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar.. Machine Learning: Support Vector Machines: Linear Kernel Support Vector Machines Extending Perceptron Classifiers. There are two ways to
More informationThe Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem
Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions
More informationUsing Decision Boundary to Analyze Classifiers
Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision
More informationComparison of different preprocessing techniques and feature selection algorithms in cancer datasets
Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets Konstantinos Sechidis School of Computer Science University of Manchester sechidik@cs.man.ac.uk Abstract
More informationEfficient Pruning Method for Ensemble Self-Generating Neural Networks
Efficient Pruning Method for Ensemble Self-Generating Neural Networks Hirotaka INOUE Department of Electrical Engineering & Information Science, Kure National College of Technology -- Agaminami, Kure-shi,
More informationSVM Classification in Multiclass Letter Recognition System
Global Journal of Computer Science and Technology Software & Data Engineering Volume 13 Issue 9 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
More informationRobot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning
Robot Learning 1 General Pipeline 1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation construction 3. Robot learning: e.g., classification (recognition) or clustering (knowledge
More informationAn Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm
Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationA Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 3 March 2017, Page No. 20765-20769 Index Copernicus value (2015): 58.10 DOI: 18535/ijecs/v6i3.65 A Comparative
More informationSupervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)
Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel
More informationMore Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA
More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector
More informationClient Dependent GMM-SVM Models for Speaker Verification
Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)
More informationRobust 1-Norm Soft Margin Smooth Support Vector Machine
Robust -Norm Soft Margin Smooth Support Vector Machine Li-Jen Chien, Yuh-Jye Lee, Zhi-Peng Kao, and Chih-Cheng Chang Department of Computer Science and Information Engineering National Taiwan University
More informationENSEMBLE RANDOM-SUBSET SVM
ENSEMBLE RANDOM-SUBSET SVM Anonymous for Review Keywords: Abstract: Ensemble Learning, Bagging, Boosting, Generalization Performance, Support Vector Machine In this paper, the Ensemble Random-Subset SVM
More informationJ. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report. February 5, 1998
Density Estimation using Support Vector Machines J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report CSD-TR-97-3 February 5, 998!()+, -./ 3456 Department of Computer Science
More informationFeature Selection in a Kernel Space
Bin Cao Peking University, Beijing, China Dou Shen Hong Kong University of Science and Technology, Hong Kong Jian-Tao Sun Microsoft Research Asia, 49 Zhichun Road, Beijing, China Qiang Yang Hong Kong University
More informationSupport Vector Machines + Classification for IR
Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines
More informationSUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES Today Reading AIMA 8.9 (SVMs) Goals Finish Backpropagation Support vector machines Backpropagation. Begin with randomly initialized weights 2. Apply the neural network to each training
More informationKernel Combination Versus Classifier Combination
Kernel Combination Versus Classifier Combination Wan-Jui Lee 1, Sergey Verzakov 2, and Robert P.W. Duin 2 1 EE Department, National Sun Yat-Sen University, Kaohsiung, Taiwan wrlee@water.ee.nsysu.edu.tw
More informationKernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:
More information