Improving the Performance of Text Categorization using N-gram Kernels
Varsha K. V.*, Santhosh Kumar C., Reghu Raj P. C.*
* Department of Computer Science and Engineering, Govt. Engineering College, Palakkad, Kerala, India
{varshavenugopal9, pcreghu}@gmail.com
Machine Intelligence Research Lab, Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India
cskumar@cb.amrita.edu

ABSTRACT: Kernel methods are known for their robustness in handling large feature spaces and are widely used as an alternative to external feature-extraction-based methods in tasks such as classification and regression. This work applies string kernels, namely n-gram kernels and gappy-n-gram kernels, to text classification. It studies how kernel concatenation and feature combination affect the classification accuracy of the system, and explores how kernel combination algorithms behave on the system. The kernels are implemented as rational kernels, which satisfy Mercer's theorem, ensuring that the kernel matrices are positive definite symmetric. The rational kernels are computed with a general algorithm based on the composition of weighted transducers, which helps in dealing with variable-length sequences. These kernels are then used with an SVM to formulate an efficient classifier for text categorization. Both one-stage and two-stage algorithms are applied for kernel combination, and both achieved better system performance than the individual kernels.

Keywords: Gappy-n-gram kernels, Text Classification, Kernel Methods

Received: 28 September 2014, Revised: 2 November 2014, Accepted: 8 November. © DLINE. All Rights Reserved.
International Journal of Computational Linguistics Research, Volume 6, Number 1, March 2015

1. Introduction

The areas of Natural Language Processing (NLP) and Bioinformatics frequently need to analyze the similarity between strings. Kernel Methods (KM) are powerful machine learning tools that can alleviate the data representation problem.
They substitute feature-based similarities with similarity functions, i.e., kernels, defined directly between training/test instances [4]. Hence they are considered good alternatives to classification systems based on external feature extraction. Additionally, the composition or adaptation of several kernels facilitates the design of effective similarities for new tasks, which also makes them worth exploring. A standard approach (Joachims, 1998) to text categorization makes use of the classical text representation technique (Salton et al., 1975), and was successful with Support Vector Machines. String kernels have been found successful in the area of text classification [1], treating a document simply as a long sequence. In kernel-based methods the choice of the kernel has traditionally been left entirely to the user. This paper uses kernel learning algorithms [2], which require the user only to specify
a family of kernels. This family of kernels can then be used by a learning algorithm to form a combined kernel and derive an accurate predictor. Rational kernels are a family of kernels, including string kernels, that are constructed in terms of transducers [3]. Kernel combination is also an area that can enhance individual kernel performance [4], [5]. Many algorithms exist that can help in achieving a good embedding of candidate kernels to get better accuracy [4], [6].

This paper is built on a string-kernel-based classification system, which classifies documents in terms of the continuous or discontinuous n-grams they share. Different kernel combination algorithms are applied to the system in order to get better performance. The behavior of the system with feature combination and kernel concatenation is also analyzed.

2. Kernel Methods

Obtaining similarity measures between documents is the fundamental task of text classification. Kernel Methods (KMs) naturally induce the similarity between two documents in terms of their dot product in the feature space. Given an input space X, a kernel can be defined [6] as a function k : X × X → R that returns the inner product over the feature space. For every x, y in X it satisfies k(x, y) = k(y, x) and

Σ_{i=1}^{n} Σ_{j=1}^{n} c_i c_j k(x_i, x_j) ≥ 0    (1)

for any n ∈ N, {c_i}_{i=1}^{n} ∈ R^n and {x_i}_{i=1}^{n} ∈ X^n. The matrix formed by all the values k_ij = k(x_i, x_j) is called the kernel matrix. Since the kernel values are computed as inner products, the kernel matrix is positive semidefinite. In terms of the feature space, the kernel function returns the dot product of the feature vectors: there exists a mapping function Φ which maps an input document x ∈ X to a feature space F, and applying the kernel function returns the inner product of the feature vectors,

k(x, y) = ⟨Φ(x), Φ(y)⟩    (2)

This inner product serves as the similarity measure in Kernel Methods.
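As a minimal sketch of equations (1) and (2), the snippet below computes a kernel as an inner product of feature vectors and builds the symmetric kernel matrix. The bag-of-words feature map and the toy documents are hypothetical stand-ins for the n-gram feature spaces used later in the paper.

```python
from collections import Counter

def phi(doc):
    """Map a document to a sparse feature vector (word counts)."""
    return Counter(doc.split())

def k(x, y):
    """k(x, y) = <phi(x), phi(y)>, the inner product in feature space."""
    fx, fy = phi(x), phi(y)
    return sum(fx[w] * fy[w] for w in fx if w in fy)

def kernel_matrix(docs):
    """The n x n kernel (Gram) matrix with K[i][j] = k(docs[i], docs[j])."""
    return [[k(a, b) for b in docs] for a in docs]

docs = ["the cat sat", "the cat ran", "a dog ran"]
K = kernel_matrix(docs)

# Symmetry, as required by equation (1): k(x, y) = k(y, x).
assert all(K[i][j] == K[j][i] for i in range(3) for j in range(3))
```

Here K[0][1] = 2 because the first two documents share the two words "the" and "cat", each with count one.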
The larger this measure, the greater the similarity; thus the kernel matrix, which contains all these n × n similarity measures, serves as the reference for document similarity. Kernel methods can readily be used with an SVM classifier. SVMs are a class of algorithms that combine the principles of statistical learning theory with optimization techniques and the idea of a kernel mapping [6]. Given a sample of N independent and identically distributed training instances {(x_i, y_i)}_{i=1}^{N}, where x_i is the D-dimensional input vector and y_i ∈ {−1, +1} is its class label, the SVM finds the linear discriminant with the maximum margin in the feature space induced by the mapping function Φ : R^D → R^S. The resulting discriminant function is

f(x) = ⟨w, Φ(x)⟩ + b.    (3)

The classifier can be trained by solving a quadratic optimization problem [7].

3. String Kernels

The representation and computation of rational kernels is based on weighted finite-state transducers.

3.1 Weighted Transducers
A weighted transducer can be considered a finite automaton with augmented output labels and real-valued weights that may represent a cost or a probability [7]. Input (output) labels are concatenated along a path to form an input (output) sequence. The weights of the transducers considered here are non-negative real values.

Definition 1 [7]: A weighted finite-state transducer T over a semiring K is an 8-tuple T = (Σ, Δ, Q, I, F, E, λ, ρ) where: Σ is the finite input alphabet of the transducer; Δ is the finite output alphabet; Q is a finite set of states; I ⊆ Q is the set of initial states; F ⊆ Q is the set of final states; E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × K × Q is a finite set of transitions; λ : I → K is the initial weight function; and ρ : F → K is the final weight function mapping F to K.
Any path from an initial state to a final state is called an accepting path. The weight of an accepting path is the product of its constituent transition weights. For input and output strings a common alphabet Σ is chosen. The weight assigned by a weighted transducer T to a pair of strings (x, y) ∈ Σ* × Σ* is denoted by T(x, y) and is obtained by summing the weights of all accepting paths with input label x and output label y.

There are two main operations on transducers used for kernel computation: inversion and composition. The inverse of a transducer T is obtained by swapping the input and output symbols of the transducer, so that T^{-1}(y, x) = T(x, y) for any x, y in Σ*. The composition T_1 ∘ T_2 is defined as [3], [8]

(T_1 ∘ T_2)(x, y) = Σ_{z ∈ Σ*} T_1(x, z) T_2(z, y)    (4)

where x and y are the input and output sequences. Composing the transducers over x and y yields the count of the common sequences z ∈ Σ* they share; if a sequence z is absent from one of the input strings, the counting term corresponding to that z is zero. This concept is used to obtain the similarity of two input strings.

3.2 Rational Kernels
The computation of rational kernels is done with the help of weighted transducers. The definitions follow [3], [8]. Rational kernels are the family of kernels that can be defined through weighted transducers; most of the kernels widely used in classification belong to this family. They can be defined as a kernel K such that

K(x, y) = U(x, y)    (5)

for every x, y ∈ X, where U is a weighted transducer. The following theorem [3] is the main result that guarantees Positive Definite Symmetric (PDS) kernels for kernel learning.

Theorem 1 [3]: Let T be an arbitrary weighted transducer. Then the function defined by the transducer U = T ∘ T^{-1} is a PDS rational kernel.
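Equation (4) and the U = T ∘ T^{-1} construction of Theorem 1 can be sketched with plain dictionaries mapping (input, output) string pairs to weights. This only illustrates the weighted-sum semantics of composition, not the state-based composition algorithm of [3], [8]; the example transducer is a hypothetical bigram counter for the single string "abab".

```python
def compose(t1, t2):
    """(T1 ∘ T2)(x, y) = sum over z of T1(x, z) * T2(z, y),
    for weighted relations given as {(input, output): weight} dicts."""
    result = {}
    for (x, z1), w1 in t1.items():
        for (z2, y), w2 in t2.items():
            if z1 == z2:  # sum over the shared intermediate string z
                result[(x, y)] = result.get((x, y), 0.0) + w1 * w2
    return result

# T relates "abab" to each of its bigrams with weight = occurrence count:
# "ab" occurs twice, "ba" once. t_inv is its inverse (swapped pairs).
t = {("abab", "ab"): 2.0, ("abab", "ba"): 1.0}
t_inv = {(z, x): w for (x, z), w in t.items()}

u = compose(t, t_inv)  # U = T ∘ T^{-1}, a self-similarity as in Theorem 1
```

U("abab", "abab") = 2·2 + 1·1 = 5, the sum over shared bigrams of squared counts.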
Thus, we refer to the rational kernels K defined by a transducer U = T ∘ T^{-1} as PDS rational kernels. To ensure the finiteness of the kernel values, we also assume that T does not admit any cycle with input ε. This implies that for any x ∈ Σ* there are finitely many sequences z ∈ Σ* for which T(x, z) ≠ 0.

1) Algorithm for constructing rational kernels: Let K be a rational kernel and let T be the associated weighted transducer. Let A and B be two acyclic weighted automata that represent just two strings x, y ∈ Σ*, or possibly more complex weighted automata. By the definition of rational kernels (Theorem 1) and the shortest-distance algorithm [3], K(A, B) can be computed by:
Constructing the composed transducer N = A ∘ T ∘ B.
Computing w[N], the shortest distance from the initial states of N to its final states, using the shortest-distance algorithm [3].
Computing ψ(w[N]), where ψ : K → R is a function such that K(x, y) = ψ(U(x, y)) [3].

3.3 N-gram Kernel
The n-gram kernels measure similarity by counting the common n-grams shared by the documents: the similarity is the sum over shared contiguous n-grams of the products of their counts. The n-gram kernels can be efficiently built from their corresponding n-gram count transducers. To construct the n-gram kernel, the algorithm described above suffices; the only modification needed is that the transducer T should be an n-gram counting transducer. The count-based similarity with an n-gram count transducer T_n can be given as:
A ∘ T_n: expected counts of the n-grams in A
T_n^{-1} ∘ B: expected counts of the n-grams in B
A ∘ T_n ∘ B: expected counts of the matching n-grams in A and B
Thus the similarity based on shared n-grams can be efficiently computed.

3.4 Gappy-n-gram Kernel
The gappy-n-gram kernel works similarly to the n-gram kernel but in a wider context: it takes the discontinuous n-grams shared between the documents as the measure of similarity, so n-grams with internal gaps are also taken into account. For this kernel, in addition to the n-gram length there is another parameter, the decay factor λ. The value of λ lies between zero and one, and for each gap the count is multiplied by this decay factor. Thus the larger the gap incorporated in an n-gram, the less important it is considered to be. The gappy-n-gram kernels can also be created with transducers, provided there are extra self-loops at each state with weight equal to the decay factor; this is done in order to include the gaps in the n-gram kernel. The rest of the kernel construction and similarity measure is the same as for n-gram kernels. The computational cost is much higher for gappy-n-grams, since they induce a much larger feature space.

Consider the three strings cat, car, and cast. The feature spaces generated by the two string kernels are:

Bigram features
        ca   at   ar   as   st
car      1    0    1    0    0
cat      1    1    0    0    0
cast     1    0    0    1    1

Gappy bigram features
        ca   at   ar   as   st   cr   ct   cs
car      1    0    1    0    0    λ    0    0
cat      1    1    0    0    0    0    λ    0
cast     1    λ    0    1    1    0    λ²   λ

Thus cat and cast share only a single bigram under the n-gram kernel, but the gappy-n-gram kernel gives a wider elaboration: it also considers the influence of discontinuous bigrams in the similarity measure, with the decay factor penalizing each gap.

4. String Kernel Classification System

The string-kernel-based classification system is a supervised system. It processes the documents with the help of string kernels and classifies them with an SVM. The different steps in constructing the system are given below.

4.1 Preparing the Data
Every document is converted to its finite-state transducer (FST) representation. This conversion is necessary since the kernel computations are done in terms of transducer composition.
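The quantities these transducer compositions compute can be sketched directly on strings. The plain-Python code below is an illustrative stand-in for the FST implementation: it reproduces the n-gram similarity of Section 3.3, the gappy bigram table above, and the kernel matrix over the toy strings cat, car, cast.

```python
from collections import Counter

def ngram_kernel(x, y, n):
    """Section 3.3: sum over shared contiguous n-grams of count products."""
    cx = Counter(x[i:i + n] for i in range(len(x) - n + 1))
    cy = Counter(y[i:i + n] for i in range(len(y) - n + 1))
    return sum(c * cy[g] for g, c in cx.items() if g in cy)

def gappy_bigram_features(s, lam):
    """Section 3.4: every (possibly non-contiguous) character pair
    contributes lam ** gaps, where gaps counts skipped characters."""
    feats = {}
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            gram = s[i] + s[j]
            feats[gram] = feats.get(gram, 0.0) + lam ** (j - i - 1)
    return feats

def gappy_bigram_kernel(x, y, lam):
    """Inner product of two gappy bigram feature vectors."""
    fx, fy = gappy_bigram_features(x, lam), gappy_bigram_features(y, lam)
    return sum(w * fy[g] for g, w in fx.items() if g in fy)

docs = ["cat", "car", "cast"]
# The N x N kernel matrix of Section 4.3, here via the bigram kernel.
K = [[ngram_kernel(a, b, 2) for b in docs] for a in docs]

f_cast = gappy_bigram_features("cast", 0.5)  # row "cast" of the table
```

For cat and cast, the contiguous bigram kernel sees only the shared bigram ca, while the gappy kernel also credits the discontinuous pairs ct and at, each discounted by λ per gap.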
Every transition in each FST represents a transition from one ASCII character to another. The weight of each transition is calculated in terms of negative log probabilities. The alphabet is taken as the entire character set.

4.2 Creating the String Kernels
Both n-gram and gappy-n-gram kernels are created from the entire dataset. The n-gram kernel is formed with the help of n-gram transducers, which count every accepted n-gram; using transducer composition the corresponding n-gram kernels can be generated. The text documents, which have been converted to FSTs, are then composed with these transducers in order to get the kernel values. Thus each document gets mapped to both the n-gram and the gappy-n-gram feature space.

4.3 Evaluating the Kernels for the Dataset
Evaluating the n-gram kernel amounts to creating the kernel matrices. By applying the kernel function to N input strings, represented as automata, we generate the N × N kernel matrix. The matrix is generated by simply taking the dot product of the feature vectors corresponding to the input strings.

4.4 Training and Testing using SVM
For the training of the system an SVM can readily be used. Classification takes place according to the structural risk minimization principle and the maximum margin criterion [10]. The training and testing are done in a transductive way [4]. In this setting, optimizing the kernel K corresponds to choosing a kernel matrix formed using the entire dataset. This matrix consists of a training-data block, a mixed training/test-data block, and a test-data block, as in [2]. In the transductive setting, the training- and test-data blocks are entangled: tuning the training-data entries of K (to optimize their embedding) implies that the test-data entries are automatically tuned in some way as well [2]. Overfitting is prevented, and good generalization on test data achieved, by constraining the capacity of the search space of possible kernel matrices.

4.5 Evaluation Measures
After obtaining the predicted labels for the test documents, the test accuracy is used as an evaluation measure. In addition, the F1 measure is taken into account. The F1 measure is a trade-off between the precision and recall of the entire system:

F1 = (2 × Precision × Recall) / (Precision + Recall).

A good classification system has both high precision and high recall, and thus a high F1 value.

5. Multiple Kernel Learning

Multiple Kernel Learning (MKL) learns a (linear or nonlinear) combination of kernels with the aim of achieving better results than learning with a single kernel. All kernel-based methods can potentially be extended to the MKL framework. Given a training set S = {(x_1, y_1), ..., (x_n, y_n)} and a set of base kernels {K_1, ..., K_M}, with each K_k ∈ R^{n×n} positive semidefinite, the objective of MKL is to optimize a cost function Q(K, S), where K is a combination of the base kernels, for example K = Σ_{k=1}^{M} μ_k K_k with μ_k ≥ 0 [2].
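A nonnegative combination of PSD base kernels is again a valid (PSD) kernel, which is why the constraint μ_k ≥ 0 appears above. A two-matrix sketch with hypothetical toy matrices, checking the quadratic-form condition of equation (1):

```python
def combine(kernels, mus):
    """K = sum_k mu_k * K_k with mu_k >= 0 (the MKL combined kernel)."""
    assert all(mu >= 0 for mu in mus), "mixture weights must be nonnegative"
    n = len(kernels[0])
    return [[sum(mu * Kk[i][j] for mu, Kk in zip(mus, kernels))
             for j in range(n)] for i in range(n)]

def quad_form(K, c):
    """c^T K c, which equation (1) requires to be >= 0 for a PSD kernel."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

K1 = [[2.0, 1.0], [1.0, 2.0]]  # PSD (eigenvalues 1 and 3)
K2 = [[1.0, 0.0], [0.0, 1.0]]  # PSD (identity)
K = combine([K1, K2], [0.3, 0.7])
```

Here K = 0.3·K1 + 0.7·K2 = [[1.3, 0.3], [0.3, 1.3]], and c^T K c stays nonnegative for any c.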
In MKL, the combined kernel matrix corresponding to the entire dataset is learned by optimizing a cost function that depends on the available labels. The available labels are used to learn a good embedding, which is applied to both the labeled and the unlabeled data; the resulting kernel matrix can then be used in combination with a support vector machine (SVM). Both one-stage and two-stage algorithms are used in MKL. A one-stage method minimizes an objective function with respect to both the kernel combination parameters and the hypothesis chosen [2]. The two-stage algorithms [7] learn kernels in the form of linear combinations of p base kernels K_k, k ∈ [1, p]. In all cases, the final hypothesis learned belongs to the reproducing kernel Hilbert space associated with a kernel K_μ = Σ_{k=1}^{p} μ_k K_k, where the mixture weights are selected subject to the condition μ_k ≥ 0, which guarantees that K_μ is a PDS kernel, and a condition on the norm of μ, ||μ|| = Λ ≥ 0, where Λ is a regularization parameter [7]. In the first stage, these algorithms determine the mixture weights; in the second stage, they train a kernel-based algorithm. The three MKL algorithms used for kernel combination are described below.

5.1 Uniform Combination (unif)
The kernels are combined with uniform weights. In this most straightforward method, equal mixture weights are chosen, so the combined kernel matrix is K = (Λ/p) Σ_{k=1}^{p} K_k [7].

5.2 Alignment-based Combination (align)
This method uses the training sample to independently compute the alignment between each kernel matrix K_k and the target kernel matrix K_Y = y y^T, based on the labels y, and chooses each mixture weight μ_k proportional to that alignment. Thus the resulting kernel matrix is K ∝ Σ_{k=1}^{p} ρ(K_k, K_Y) K_k [7].

5.3 Linear Combination (lin1)
In this algorithm a positive linear combination of kernels [4] is taken, and the regularization restricts the trace of the kernel matrix.
Let {K_1, ..., K_m} be the kernels to be combined. The combination is given as K = Σ_{i=1}^{m} μ_i K_i, with K ⪰ 0 and trace(K) ≤ c. The set {K_1, ..., K_m} could be a set of initial guesses of the kernel matrix with different kernel parameter values. Instead of fine-tuning the kernel parameter for a given kernel using cross-validation, we can evaluate the given kernel for a range of kernel parameters and then optimize the weights in the linear combination of the obtained kernel matrices.
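The align weights and the trace constraint of lin1 can be sketched together. Note the hedges: alignment is taken here in its common form as a normalized Frobenius inner product (the exact normalization and centering in [7] may differ), and lin1 in [2], [4] optimizes the weights by semidefinite programming, whereas this sketch simply plugs the align weights into a trace-normalized positive combination to show the form of the constraint. The label vector and base matrices are toy examples.

```python
import math

def frobenius(A, B):
    """Frobenius inner product <A, B>_F of two square matrices."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

def alignment(K, Ky):
    """Alignment rho(K, K_Y) = <K, K_Y>_F / (|K|_F * |K_Y|_F)."""
    return frobenius(K, Ky) / math.sqrt(frobenius(K, K) * frobenius(Ky, Ky))

def align_weights(kernels, y):
    """Section 5.2: mu_k proportional to alignment with K_Y = y y^T."""
    Ky = [[yi * yj for yj in y] for yi in y]
    return [alignment(K, Ky) for K in kernels]

def trace(K):
    return sum(K[i][i] for i in range(len(K)))

def combine_trace_normalized(kernels, mus):
    """Positive combination of trace-normalized kernels, so that
    trace(K) = sum(mus) stays bounded, mirroring trace(K) <= c in lin1."""
    assert all(mu >= 0 for mu in mus)
    n = len(kernels[0])
    normed = [[[v / trace(K) for v in row] for row in K] for K in kernels]
    return [[sum(mu * Kn[i][j] for mu, Kn in zip(mus, normed))
             for j in range(n)] for i in range(n)]

y = [1, 1, -1]
K1 = [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]  # identical to y y^T: alignment 1
K2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # identity: weaker alignment
mus = align_weights([K1, K2], y)
K = combine_trace_normalized([K1, K2], mus)
```

K1, being exactly y y^T, gets the maximal weight 1, while the uninformative identity kernel gets a smaller weight of 1/√3.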
6. Experiments and Results

For the experiments on string kernels, a subset of the Reuters-21578 dataset with the ModApte split is used. The dataset contains a total of 466 documents over four categories: acquisition (acq), corn, crude, and earn. From the 466 documents, 377 were selected for training (154 earn, 114 acq, 76 crude, 38 corn) and the remaining 89 documents (42 earn, 26 acq, 15 crude, 10 corn) constitute the test set. The string kernels constructed on the dataset are the gappy-n-gram kernel and the n-gram kernel, with the n-gram length varying over 3, 4, 5, 6, 7, 8. The decay parameter for the gappy-n-gram kernel was set to 0.5, and the classifier parameter to 1. The results for n-gram and gappy-n-gram classification are given in Table 1; only the best n-gram performance is reported.

Table 1. Results (F1, precision, recall, accuracy %) per category on the subset of the Reuters-21578 dataset for the n-gram and gappy-n-gram kernels

The classification accuracy was found to decrease as the string length of the kernel increased, and increasing the decay parameter also decreased the accuracy. Feature combination and the weighted combination of kernels did not give significant improvements in classification accuracy, but kernel concatenation did. The results with individual kernels are given in Table 2, and the improvement in accuracy obtained by concatenating the n-gram and gappy-n-gram kernels is given in Table 3; through concatenation, all categories show a significant change in accuracy. The kernel combination algorithms used belong to both the one-stage (lin1) and two-stage (unif, align) families of learning algorithms.

Table 2. Classification accuracy (%) with individual kernels
Before the algorithms are applied, each base kernel is centered and normalized to have trace equal to one. The results are reported in Table 4. The one-stage algorithm does not bring any improvement in accuracy, but the remaining algorithms showed improvement over the individual kernels. For this set of experiments only the 3-, 4-, and 5-gram kernels are used, since the remaining kernels seemed to contribute little when combined.

Table 3. Classification accuracy (%) with combined kernels
Table 4. Results obtained by applying the kernel combination algorithms (unif, lin1, align)

7. Conclusion

The n-gram kernel and gappy-n-gram kernel based classification system delivered good performance. The performance of the two string kernels is comparable; thus the gappy-n-gram kernel is found worthwhile for analyzing text documents in a wider context. The results achieved on the Reuters subset were comparable to those reported in [1]. A few differences exist, however, since the exact documents used in [1] are not used in this work, and the preprocessing applied there is not used here. The classification accuracy of the system was found to increase with kernel concatenation and with the algorithmic combination of string kernels. The experiments conducted using kernel combination algorithms show the two-stage algorithms to be more efficient than the one-stage algorithm on this dataset.

References

[1] Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C. (2002). Text classification using string kernels, J. Mach. Learn. Res., 2, Mar. 2002.
[2] Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res., 5, Dec. 2004.
[3] Cortes, C., Haffner, P., Mohri, M. (2004). Rational kernels: Theory and algorithms, Journal of Machine Learning Research, 5.
[4] Martins, A. (2006). String kernels and similarity measures for information retrieval, Tech. Rep.
[5] Cortes, C., Mohri, M., Rostamizadeh, A. (2008). Learning sequence kernels, Oct. 2008.
[6] Ben-Hur, A., Weston, J. (2010). A user's guide to support vector machines, Methods in Molecular Biology, 609.
[7] Cortes, C., Mohri, M., Rostamizadeh, A. Two-stage learning kernel algorithms.
[8] Cortes, C., Mohri, M. (2009). Learning with weighted transducers, In: Proceedings of the 2009 Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP, Amsterdam, The Netherlands: IOS Press.
More informationOpinion Mining by Transformation-Based Domain Adaptation
Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose
More informationEfficient Iterative Semi-supervised Classification on Manifold
. Efficient Iterative Semi-supervised Classification on Manifold... M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani Sharif University of Technology, Tehran, Iran. Presented by Pooria Joulani
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationArithmetic in Quaternion Algebras
Arithmetic in Quaternion Algebras Graduate Algebra Symposium Jordan Wiebe University of Oklahoma November 5, 2016 Jordan Wiebe (University of Oklahoma) Arithmetic in Quaternion Algebras November 5, 2016
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationModule 4. Non-linear machine learning econometrics: Support Vector Machine
Module 4. Non-linear machine learning econometrics: Support Vector Machine THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction When the assumption of linearity
More informationSupport Vector Machines and their Applications
Purushottam Kar Department of Computer Science and Engineering, Indian Institute of Technology Kanpur. Summer School on Expert Systems And Their Applications, Indian Institute of Information Technology
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationFraud Detection using Machine Learning
Fraud Detection using Machine Learning Aditya Oza - aditya19@stanford.edu Abstract Recent research has shown that machine learning techniques have been applied very effectively to the problem of payments
More informationExponentiated Gradient Algorithms for Large-margin Structured Classification
Exponentiated Gradient Algorithms for Large-margin Structured Classification Peter L. Bartlett U.C.Berkeley bartlett@stat.berkeley.edu Ben Taskar Stanford University btaskar@cs.stanford.edu Michael Collins
More informationRule extraction from support vector machines
Rule extraction from support vector machines Haydemar Núñez 1,3 Cecilio Angulo 1,2 Andreu Català 1,2 1 Dept. of Systems Engineering, Polytechnical University of Catalonia Avda. Victor Balaguer s/n E-08800
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions
More informationSketchable Histograms of Oriented Gradients for Object Detection
Sketchable Histograms of Oriented Gradients for Object Detection No Author Given No Institute Given Abstract. In this paper we investigate a new representation approach for visual object recognition. The
More informationArithmetic in Quaternion Algebras
Arithmetic in Quaternion Algebras 31st Automorphic Forms Workshop Jordan Wiebe University of Oklahoma March 6, 2017 Jordan Wiebe (University of Oklahoma) Arithmetic in Quaternion Algebras March 6, 2017
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationTransductive Learning: Motivation, Model, Algorithms
Transductive Learning: Motivation, Model, Algorithms Olivier Bousquet Centre de Mathématiques Appliquées Ecole Polytechnique, FRANCE olivier.bousquet@m4x.org University of New Mexico, January 2002 Goal
More informationPositive Definite Kernel Functions on Fuzzy Sets
Positive Definite Kernel Functions on Fuzzy Sets FUZZ 2014 Jorge Guevara Díaz 1 Roberto Hirata Jr. 1 Stéphane Canu 2 1 Institute of Mathematics and Statistics University of Sao Paulo-Brazil 2 Institut
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationDivide and Conquer Kernel Ridge Regression
Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem
More informationProfile-based String Kernels for Remote Homology Detection and Motif Extraction
Profile-based String Kernels for Remote Homology Detection and Motif Extraction Ray Kuang, Eugene Ie, Ke Wang, Kai Wang, Mahira Siddiqi, Yoav Freund and Christina Leslie. Department of Computer Science
More informationSupport Vector Machines (SVM)
Support Vector Machines a new classifier Attractive because (SVM) Has sound mathematical foundations Performs very well in diverse and difficult applications See paper placed on the class website Review
More informationSupport Vector Machines for Face Recognition
Chapter 8 Support Vector Machines for Face Recognition 8.1 Introduction In chapter 7 we have investigated the credibility of different parameters introduced in the present work, viz., SSPD and ALR Feature
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationIncorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data
Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Ryan Atallah, John Ryan, David Aeschlimann December 14, 2013 Abstract In this project, we study the problem of classifying
More informationKernels and representation
Kernels and representation Corso di AA, anno 2017/18, Padova Fabio Aiolli 20 Dicembre 2017 Fabio Aiolli Kernels and representation 20 Dicembre 2017 1 / 19 (Hierarchical) Representation Learning Hierarchical
More informationModifying Kernels Using Label Information Improves SVM Classification Performance
Modifying Kernels Using Label Information Improves SVM Classification Performance Renqiang Min and Anthony Bonner Department of Computer Science University of Toronto Toronto, ON M5S3G4, Canada minrq@cs.toronto.edu
More informationData mining with Support Vector Machine
Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine
More informationImproving Image Segmentation Quality Via Graph Theory
International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,
More informationKernel SVM. Course: Machine Learning MAHDI YAZDIAN-DEHKORDI FALL 2017
Kernel SVM Course: MAHDI YAZDIAN-DEHKORDI FALL 2017 1 Outlines SVM Lagrangian Primal & Dual Problem Non-linear SVM & Kernel SVM SVM Advantages Toolboxes 2 SVM Lagrangian Primal/DualProblem 3 SVM LagrangianPrimalProblem
More informationKernel Principal Component Analysis: Applications and Implementation
Kernel Principal Component Analysis: Applications and Daniel Olsson Royal Institute of Technology Stockholm, Sweden Examiner: Prof. Ulf Jönsson Supervisor: Prof. Pando Georgiev Master s Thesis Presentation
More informationBagging for One-Class Learning
Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationClassification by Support Vector Machines
Classification by Support Vector Machines Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Practical DNA Microarray Analysis 2003 1 Overview I II III
More informationSecond Order SMO Improves SVM Online and Active Learning
Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationSEAFLOOR SEDIMENT CLASSIFICATION OF SONAR IMAGES
SEAFLOOR SEDIMENT CLASSIFICATION OF SONAR IMAGES Mrs K.S.Jeen Marseline 1, Dr.C.Meena 2 1 Assistant Professor, Sri Krishna Arts & Science College, Coimbatore 2 Center Head Avinashilingam Institute For
More informationA Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 3 March 2017, Page No. 20765-20769 Index Copernicus value (2015): 58.10 DOI: 18535/ijecs/v6i3.65 A Comparative
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationUnsupervised Feature Selection for Sparse Data
Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-
More informationThe Effects of Outliers on Support Vector Machines
The Effects of Outliers on Support Vector Machines Josh Hoak jrhoak@gmail.com Portland State University Abstract. Many techniques have been developed for mitigating the effects of outliers on the results
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationFacial expression recognition using shape and texture information
1 Facial expression recognition using shape and texture information I. Kotsia 1 and I. Pitas 1 Aristotle University of Thessaloniki pitas@aiia.csd.auth.gr Department of Informatics Box 451 54124 Thessaloniki,
More informationApprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang
Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control
More informationAn introduction to random forests
An introduction to random forests Eric Debreuve / Team Morpheme Institutions: University Nice Sophia Antipolis / CNRS / Inria Labs: I3S / Inria CRI SA-M / ibv Outline Machine learning Decision tree Random
More informationThe Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem
Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran
More informationSupport Vector Machines + Classification for IR
Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines
More informationData Mining in Bioinformatics Day 1: Classification
Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls
More information9. Support Vector Machines. The linearly separable case: hard-margin SVMs. The linearly separable case: hard-margin SVMs. Learning objectives
Foundations of Machine Learning École Centrale Paris Fall 25 9. Support Vector Machines Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech Learning objectives chloe agathe.azencott@mines
More informationText Classification using String Kernels
Text Classification using String Kernels HUlna Lodhi John Shawe-Taylor N ello Cristianini Chris Watkins Department of Computer Science Royal Holloway, University of London Egham, Surrey TW20 OEX, UK {huma,
More informationA Review on Plant Disease Detection using Image Processing
A Review on Plant Disease Detection using Image Processing Tejashri jadhav 1, Neha Chavan 2, Shital jadhav 3, Vishakha Dubhele 4 1,2,3,4BE Student, Dept. of Electronic & Telecommunication Engineering,
More informationSemi supervised clustering for Text Clustering
Semi supervised clustering for Text Clustering N.Saranya 1 Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore 1 ABSTRACT: Based on clustering
More informationLinear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines
Linear Models Lecture Outline: Numeric Prediction: Linear Regression Linear Classification The Perceptron Support Vector Machines Reading: Chapter 4.6 Witten and Frank, 2nd ed. Chapter 4 of Mitchell Solving
More informationSupport Vector Machine Learning for Interdependent and Structured Output Spaces
Support Vector Machine Learning for Interdependent and Structured Output Spaces I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, ICML, 2004. And also I. Tsochantaridis, T. Joachims, T. Hofmann,
More informationMultiple cosegmentation
Armand Joulin, Francis Bach and Jean Ponce. INRIA -Ecole Normale Supérieure April 25, 2012 Segmentation Introduction Segmentation Supervised and weakly-supervised segmentation Cosegmentation Segmentation
More informationCHAPTER 3 FUZZY RELATION and COMPOSITION
CHAPTER 3 FUZZY RELATION and COMPOSITION The concept of fuzzy set as a generalization of crisp set has been introduced in the previous chapter. Relations between elements of crisp sets can be extended
More informationChoosing the kernel parameters for SVMs by the inter-cluster distance in the feature space Authors: Kuo-Ping Wu, Sheng-De Wang Published 2008
Choosing the kernel parameters for SVMs by the inter-cluster distance in the feature space Authors: Kuo-Ping Wu, Sheng-De Wang Published 2008 Presented by: Nandini Deka UH Mathematics Spring 2014 Workshop
More informationDECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY
DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY Ramadevi Yellasiri, C.R.Rao 2,Vivekchan Reddy Dept. of CSE, Chaitanya Bharathi Institute of Technology, Hyderabad, INDIA. 2 DCIS, School
More information