Support vector machines

Cavan Reilly
October 24, 2018

Table of contents
1. K-nearest neighbor classification
2. Support vector machines

K-nearest neighbor classification. Suppose we have a collection of measurements x_i and class labels y_i for i = 1, ..., n, and our goal is to predict the class label for a new collection of measurements x_new. A conceptually simple way to achieve this is to measure the distance between x_new and x_i for all i and then predict the class of the new observation to be the most common class label among its K nearest neighbors (where K = 1, 3, 5, 7, ...). The value of K can be determined by cross-validation; however, the method for computing distances now becomes critical.
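As a concrete sketch (not part of the notes), K-nearest neighbor classification with K chosen by leave-one-out cross-validation can be done in R with the class package; the iris data and the candidate values of K below are purely illustrative.

# Illustrative KNN sketch using the 'class' package.
library(class)

# Example data: two of the iris species as the 2 classes.
x <- iris[iris$Species != "setosa", 1:4]
y <- droplevels(iris$Species[iris$Species != "setosa"])

# Choose K by leave-one-out cross-validation over a few odd values.
ks <- c(1, 3, 5, 7, 9)
cv.err <- sapply(ks, function(k) mean(knn.cv(x, y, k = k) != y))
best.k <- ks[which.min(cv.err)]

# Predict the class of a hypothetical new observation with the selected K.
x.new <- data.frame(Sepal.Length = 6.0, Sepal.Width = 2.9,
                    Petal.Length = 4.5, Petal.Width = 1.5)
knn(train = x, test = x.new, cl = y, k = best.k)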

K-nearest neighbor classification. Moreover, a scientific understanding of the problem can rarely contribute to the decision about which distance to use (unlike the use of a likelihood). Nonetheless, given a method for computing distances, the data x_i and y_i, and a value of K, the space in which the data reside (the data space) gets partitioned into regions where new observations are predicted to belong to one class or the other. Unless the data really do separate well, this partition is likely to be rather complicated, which makes the predictions sensitive to the value of x_new, so they will frequently be incorrect.

Support vector machines. Support vector machines are a class of algorithms that were designed for classification into 2 groups, although these methods have been extended to other settings (e.g. multigroup classification and regression). The starting point for these methods is that if the data from the 2 groups can be separated, then one should pick a boundary in the data space and use this boundary to classify future observations. If we restrict attention to linear boundaries, there is still an infinite number of possible boundaries.

Support vector machines. In support vector machines the linear boundary is selected to maximize the margin between the 2 groups. Only a subset of the observations is involved in determining exactly where this boundary lies: these observations are called support vectors. In most applications the data from the different groups do not separate well, so simply finding a linear boundary in the data space will not result in a good classification scheme. In this case we could try to map our data from the data space to another space, called feature space, in such a way that a linear boundary separates the 2 groups in this other space.
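For reference (the notes do not write it out), the standard way to state this margin-maximization problem for separable data with labels y_i in {-1, +1} is the hard-margin optimization problem:

% Hard-margin SVM: w is the normal vector of the linear boundary, b its offset.
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n.
% The margin between the 2 groups is 2/\lVert w \rVert, so minimizing
% \lVert w \rVert is the same as maximizing the margin.

The support vectors are exactly the observations for which the constraint holds with equality.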

Support vector machines. Recall multidimensional scaling: there we map to a lower-dimensional space to see if the data separate in that lower-dimensional space. In the support vector machine literature we pursue the opposite: map to a higher-dimensional space. Of course, in practice one can always map to a higher-dimensional space in which the data from the 2 groups separate perfectly. In fact one can always do this by mapping to a space with just 1 more dimension (for example, by adding a coordinate that equals +1 for observations in one group and -1 for those in the other).

Support vector machines. So given a new observation, we would map it to the higher-dimensional space and see which side of our boundary it falls on. We can do this by looking at the angle between where our point gets mapped and the boundary (or rather, the vector that is normal to the boundary). It turns out that, if I define my boundary by maximizing the margin, the angle between where the new observation gets mapped and the vector that is normal to the boundary depends only on an inner product in the space where I map the data. Moreover, I can represent the boundary itself using just such inner products.
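Written out in the usual SVM notation (not spelled out in the notes), the resulting decision rule for a new observation x involves the mapped data only through inner products:

% Decision function of the maximum-margin classifier in feature space.
% \phi is the map into feature space, the \alpha_i are nonnegative weights found
% during training (nonzero only for the support vectors), and b is the offset.
f(x) = \operatorname{sign}\Big( \sum_{i=1}^{n} \alpha_i\, y_i\,
       \langle \phi(x_i), \phi(x) \rangle + b \Big)
% Every appearance of \phi sits inside an inner product, which is what allows
% \langle \phi(x_i), \phi(x) \rangle to be replaced by a kernel k(x_i, x).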

Support vector machines. So rather than specifying feature space and a mapping, I can just specify an inner product in feature space. There is a well-established characterization of which functions of 2 vectors in arbitrary spaces are valid as inner products (this characterization is known as Mercer's theorem). Such functions are called kernels in this context, hence if we specify a kernel we are ready to do classification.
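As a small worked example (not from the notes), the degree-2 polynomial kernel on R^2 is exactly an inner product after an explicit map into R^3:

% For x = (x_1, x_2) and z = (z_1, z_2), define \phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2).
k(x, z) = (x^\top z)^2
        = x_1^2 z_1^2 + 2\,x_1 x_2\, z_1 z_2 + x_2^2 z_2^2
        = \langle \phi(x), \phi(z) \rangle
% Evaluating k gives the inner product in the 3-dimensional feature space
% without ever forming \phi(x) explicitly.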

Support vector machines: slack variables and cost parameters. Although we can always separate the data in a higher-dimensional space, if we try to do so by specifying a kernel rather than a mapping, we may not achieve perfect separation in feature space. In that event there is no margin to maximize, hence we need to allow for errors in the classification process. Formally we do this by introducing slack variables, one for each observation in our data set, and we pay a certain cost depending on how much slack is used. When we train our classifier we must select a value for this cost, and sometimes other parameters too.
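In the usual notation (again, not written out in the notes), the soft-margin problem with slack variables xi_i and cost parameter C is:

% Soft-margin SVM: \xi_i measures how far observation i may violate the margin,
% and C is the price paid per unit of slack.
\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
% A large C penalizes slack heavily (close to the hard-margin problem); a small C
% tolerates more misclassification in exchange for a wider margin.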

Support vector machines: kernels. Users of SVMs spend a considerable amount of effort on training the classifier by trying to optimize its performance on the training data set, typically in terms of cross-validated errors. This may involve trying different kernels, different costs and perhaps parameters that enter the kernel. Some common kernels are
1. the Gaussian radial basis kernel: k(x_1, x_2) = exp{-c ‖x_1 - x_2‖^2}
2. the simple (linear) kernel: k(x_1, x_2) = ∑_i x_{1i} x_{2i}
3. the polynomial kernel: k(x_1, x_2) = (c_1 ∑_i x_{1i} x_{2i} + c_2)^{c_3}.
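These three kernels correspond to built-in kernel objects in kernlab (rbfdot, vanilladot and polydot); the vectors and parameter values in this sketch are made up purely for illustration.

# Evaluating the three kernels above with kernlab's kernel objects.
library(kernlab)

x1 <- c(1.0, 2.0, 3.0)
x2 <- c(2.0, 0.5, 1.0)

rbf  <- rbfdot(sigma = 0.5)                         # exp{-sigma ||x1 - x2||^2}
lin  <- vanilladot()                                # sum_i x_1i x_2i
poly <- polydot(degree = 3, scale = 1, offset = 1)  # (scale sum_i x_1i x_2i + offset)^degree

rbf(x1, x2)
lin(x1, x2)
poly(x1, x2)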

Support vector machines: kernels. The first is useful when little is known about what might work well, the second for text processing and the third for classifying images. There are a number of packages that implement support vector machines in R, and there are publications that compare them in terms of generality, speed and quality. We will use the kernlab package: it has good speed, a numerically stable algorithm and is more extensible than some of the others. In particular, you can make up your own kernels in this package.
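As a preview of the kind of call we will make (the data set and parameter values here are placeholders, not from the notes), fitting and using a 2-group classifier with kernlab looks roughly like this:

# Sketch of fitting a 2-group SVM classifier with kernlab's ksvm().
library(kernlab)

# Placeholder data: two of the iris species as the 2 groups.
dat <- droplevels(iris[iris$Species != "setosa", ])

# Gaussian RBF kernel; C is the cost on the slack variables, sigma is the kernel
# parameter, and cross = 5 reports a 5-fold cross-validated error for these choices.
fit <- ksvm(Species ~ ., data = dat,
            kernel = "rbfdot", kpar = list(sigma = 0.5),
            C = 1, cross = 5)

fit                       # training error, number of support vectors, etc.
cross(fit)                # 5-fold cross-validated error rate
predict(fit, dat[1:5, ])  # predicted class labels for a few observations

Trying different kernels, costs and kernel parameters then amounts to re-running this fit with different arguments and comparing the cross-validated errors.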