SUPPORT VECTOR MACHINES

SUPPORT VECTOR MACHINES. Today's reading: AIMA 8.9 (SVMs). Goals: finish backpropagation; support vector machines.

Backpropagation
1. Begin with randomly initialized weights.
2. Apply the neural network to each training example (each pass through the examples is called an epoch).
3. If it misclassifies an example, modify the weights.
4. Continue until the neural network classifies all training examples correctly.
(Derive the gradient-descent update rule.)
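The loop above can be made concrete with a small sketch. This is not the course's code: the one-hidden-layer network, squared-error loss, XOR data, and learning rate are all assumptions chosen only to illustrate epochs and the gradient-descent update, and convergence is not guaranteed for every random initialization.

```python
# Minimal backpropagation sketch (assumed example, not from the lecture):
# a one-hidden-layer network trained by gradient descent, one epoch at a time.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, labels in {0, 1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 1. Begin with randomly initialized weights.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0  # assumed learning rate

for epoch in range(10000):          # 2. one pass over the examples = one epoch
    h = sigmoid(X @ W1 + b1)        # forward pass
    p = sigmoid(h @ W2 + b2)
    # 3. gradient-descent update (squared-error loss here, an assumption)
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ dp;  b2 -= lr * dp.sum(axis=0)
    W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(axis=0)
    # 4. stop once every training example is classified correctly
    # (may not happen for every initialization)
    if np.all((p > 0.5) == (y > 0.5)):
        break

print("epochs:", epoch + 1, "predictions:", (p > 0.5).astype(int).ravel())
```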

Support Vector Machines (SVMs). SVMs are probably the most popular off-the-shelf classifier! Software packages: LIBSVM (LIBLINEAR), on the Resources page; SVM-Light. Which is the best decision boundary? (Figure: points for x1 AND x2 and x1 OR x2 plotted on the x1–x2 plane, with candidate decision boundaries.)
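As a quick illustration of the "off-the-shelf" point, here is a minimal sketch using scikit-learn's SVC as a stand-in for the packages named above; the toy AND data and the large C value (to approximate a hard margin) are assumptions.

```python
# Sketch: using an off-the-shelf SVM (scikit-learn's SVC here, as a stand-in
# for packages like LIBSVM / SVM-Light mentioned on the slide).
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: x1 AND x2 (label +1 only when both inputs are 1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
print("predictions:", clf.predict(X))
```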

Support Vector Machines. Key ideas: support vectors; the maximum-margin decision hyperplane; maximize the margin. What defines a hyperplane?

What defines a hyperplane? A hyperplane is defined by a vector w, perpendicular to the hyperplane and often called the weight vector, and a scalar b, which selects one hyperplane from among all those perpendicular to w (its distance from the origin is |b| / ||w||).

Classify a new instance (prediction). Given training data D = {(x_i, y_i)}, i = 1,...,N, with y_i ∈ {-1, +1}:

w·x + b = 0  for x on the decision boundary,
w·x + b > 0  for x above the decision boundary,
w·x + b < 0  for x below the decision boundary,

and the classifier is h(x) = sign(w·x + b).
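A tiny sketch of this prediction rule, with a made-up w and b purely for illustration:

```python
# Sketch of the prediction rule h(x) = sign(w·x + b); w and b are assumed values.
import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector (perpendicular to the hyperplane)
b = -0.5                    # assumed offset

def h(x):
    """Classify a point: +1 above the hyperplane, -1 below it."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(h(np.array([1.0, 0.0])))   # w·x + b = 1.5 > 0  -> +1
print(h(np.array([0.0, 1.0])))   # w·x + b = -1.5 < 0 -> -1
```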

Learning (Training). How to calculate the margin? Let r_0 be a point on the hyperplane. The margin γ_i of a point x_i is the length of the projection of the vector (x_i - r_0) onto the weight vector, and the length of the projection of one vector onto another is just the dot product (with the unit vector w / ||w||):

γ_i = (x_i - r_0)·w / ||w|| = (x_i·w - r_0·w) / ||w|| = (x_i·w + b) / ||w||,

using r_0·w = -b since r_0 lies on the hyperplane.

Learning (Training). Maximize the smallest margin:

max_{γ,w,b} γ   s.t.   y_j (x_j·w + b) / ||w|| ≥ γ  for all j,   where γ = min_i γ_i

(multiply by y_j to ensure the margin is positive). Scale w so that γ = 1 / ||w||:

max_{w,b} 1 / ||w||   s.t.   y_j (x_j·w + b) ≥ 1  for all j.

Simplify the constraints, and note that maximizing 1/x is the same as minimizing x:

min_{w,b} ||w||   s.t.   y_j (x_j·w + b) ≥ 1  for all j.
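To make the margin formula concrete, here is a small NumPy sketch (the weights and data are assumptions, purely illustrative) that computes γ_i = y_i (x_i·w + b) / ||w|| for each training point and takes the minimum:

```python
# Sketch: geometric margin of each training point, gamma_i = y_i (x_i·w + b) / ||w||.
import numpy as np

w = np.array([1.0, 1.0])   # assumed weights
b = -1.5                   # assumed offset
X = np.array([[2.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([+1, -1, +1, -1])

margins = y * (X @ w + b) / np.linalg.norm(w)
print("per-point margins:", margins)
print("margin of the classifier (smallest):", margins.min())
```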

Solving the Optimization Problem.

min_{w,b} ½ ||w||²   s.t.   y_j (x_j·w + b) ≥ 1  for all j

(minimizing ||w|| from the previous slide is equivalent to minimizing ½ ||w||², which is more convenient). We need to optimize a quadratic function subject to linear constraints. The solution involves constructing a dual problem in which a Lagrange multiplier α_i (a scalar) is associated with every constraint in the primal problem:

max_α min_{w,b}  ½ ||w||² - Σ_{i=1}^N α_i [ y_i (w·x_i + b) - 1 ].

Dual (in the Lagrange multipliers):

max_α  Σ_i α_i - ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j)
subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
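The dual can be handed to any quadratic-programming routine. As a sketch only (real SVM packages use specialized solvers such as SMO in LIBSVM), here is the dual set up for a toy data set and solved with scipy's SLSQP; the data and solver choice are assumptions:

```python
# Sketch: solving the dual QP numerically with scipy (SLSQP), an assumption --
# dedicated SVM packages use specialized QP/SMO solvers instead.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data.
X = np.array([[2.0, 2.0], [1.5, 2.5], [0.0, 0.0], [0.5, -0.5]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

# Gram matrix of dot products x_i . x_j, scaled by the labels.
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(a):
    # Negative dual objective: -( sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j (x_i.x_j) )
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(N),
    method="SLSQP",
    bounds=[(0.0, None)] * N,                              # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x
print("alphas:", np.round(alpha, 4))
```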

Solving the Optimization Problem. The solution has the form:

w = Σ_{i=1}^N α_i y_i x_i   and   b = y_i - w·x_i   for any x_i such that α_i ≠ 0.

Each non-zero α_i indicates that the corresponding x_i is a support vector. The classifying function has the form:

h(x) = sign( Σ_i α_i y_i (x_i·x) + b ).

It relies on a dot product between the test point x and the support vectors x_i.

Soft-margin Classification. Slack variables ξ_i can be added to allow misclassification of difficult or noisy examples. We still try to minimize training-set errors and to place the hyperplane far from each class (large margin). (Figure: points violating the margin, labelled with slacks ξ_i and ξ_j.)
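A worked miniature of these formulas, using a two-point data set whose dual solution can be computed by hand (the data and α = (1/4, 1/4) are assumptions, not taken from the lecture):

```python
# Sketch: recovering w and b from the dual solution and classifying.
# Tiny worked example: x1=(1,1) with y=+1, x2=(-1,-1) with y=-1.
# For this pair the dual solution is alpha = (1/4, 1/4), giving w=(0.5, 0.5), b=0.
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([+1.0, -1.0])
alpha = np.array([0.25, 0.25])           # both points are support vectors here

w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x_i  -> [0.5, 0.5]
i = 0                                    # any i with alpha_i != 0
b = y[i] - w @ X[i]                      # b = y_i - w . x_i          -> 0.0

def h(x):
    # h(x) = sign( sum_i alpha_i y_i (x_i . x) + b )
    return np.sign(np.sum(alpha * y * (X @ x)) + b)

print("w =", w, "b =", b)
print(h(np.array([2.0, 0.5])), h(np.array([-0.3, -0.8])))  # +1.0, -1.0
```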

How many support vectors? Determined by the alphas in the optimization; typically only a small proportion of the training data. The number of support vectors determines the run time for prediction.

How fast are SVMs? Training: the time for training is dominated by the time for solving the underlying quadratic programming problem; slower than Naïve Bayes, and non-linear SVMs are slower still. Testing (prediction): fast, as long as we don't have too many support vectors.
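A short sketch of how few support vectors typically remain, again using scikit-learn's SVC on assumed synthetic data:

```python
# Sketch: counting the support vectors kept by an off-the-shelf SVM
# (scikit-learn's SVC; the dataset and parameters are assumptions).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors are needed at prediction time.
print("training points:", len(X))
print("support vectors per class:", clf.n_support_)
print("total support vectors:", len(clf.support_vectors_))
```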

Non-linear SVMs. General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x).

The Kernel trick. The linear classifier relies on an inner product between vectors x_i·x_j:

g(x) = sign( Σ_i α_i y_i (x_i·x) + b ).

If every example is mapped into a high-dimensional space via some transformation Φ: x → φ(x), then the inner product becomes:

g(x) = sign( Σ_i α_i y_i φ(x_i)·φ(x) + b ).

A kernel function is a function that corresponds to a dot product in some transformed feature space:

K(x_i, x_j) = φ(x_i)ᵀ φ(x_j).
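The substitution can be seen directly in code: wherever the linear machinery uses the Gram matrix of dot products x_i·x_j, a kernel machine uses K(x_i, x_j) instead. A sketch (the polynomial kernel and toy data are assumptions, anticipating the next slide):

```python
# Sketch: the kernel trick replaces every dot product x_i . x_j with K(x_i, x_j),
# e.g. in the Gram matrix used by the dual problem.
import numpy as np

def poly_kernel(A, B, d=2):
    # Polynomial kernel K(x, z) = (1 + x^T z)^d, applied row-wise.
    return (1.0 + A @ B.T) ** d

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

G_linear = X @ X.T            # plain dot products: linear SVM
G_kernel = poly_kernel(X, X)  # kernelized: implicit higher-dimensional features

print(G_linear)
print(G_kernel)
```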

The Kernel trick (Sec. 5.2.3). The kernel K may be cheaper to compute than doing the actual transformation φ; it performs the transformation implicitly. For example, for x = (x_1, x_2, x_3), take the vector of all pairwise products

φ(x) = (x_1x_1, x_1x_2, x_1x_3, x_2x_1, x_2x_2, x_2x_3, x_3x_1, x_3x_2, x_3x_3).

Then

K(x, z) = (xᵀz)² = (Σ_{i=1}^n x_i z_i)(Σ_{j=1}^n x_j z_j) = Σ_{i=1}^n Σ_{j=1}^n x_i x_j z_i z_j = Σ_{i,j=1}^n (x_i x_j)(z_i z_j) = φ(x)ᵀ φ(z).

Kernels. Why use kernels? They make a non-separable problem separable and map the data into a better representational space. Common kernels: linear; polynomial, K(x, z) = (1 + xᵀz)^d; radial basis function (an infinite-dimensional feature space).
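A quick numerical check of this identity, with n = 3 and arbitrary vectors (an illustration, not from the slides):

```python
# Sketch: checking that the quadratic kernel equals the dot product of the
# explicit pairwise-product features phi(x) from the slide (n = 3 here).
import numpy as np

def phi(x):
    # All pairwise products x_i * x_j, i.e. the flattened outer product.
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

lhs = (x @ z) ** 2            # K(x, z) = (x^T z)^2, cost O(n)
rhs = phi(x) @ phi(z)         # explicit transformation, cost O(n^2)

print(lhs, rhs)               # both 20.25 for these vectors
```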

Summary. Support Vector Machines (SVMs): find the maximum-margin hyperplane; only the support vectors are needed to determine the hyperplane; use slack variables to allow some error; use a kernel function to make non-separable data separable. Often among the best-performing classifiers.