Support vector machines. Dominik Wisniewski, Wojciech Wawrzyniak


Outline
1. A brief history of SVM.
2. What is SVM and how does it work?
3. How would you classify this data?
4. Are all the separating lines equally good?
5. An example of small and large margins.
6. Transforming the Data.
7. Learning.
8. Support vectors.
9. Kernel functions.
10. Predicting the classification.
11. References.

A brief history of SVM. SVMs were introduced by Boser, Guyon, and Vapnik in 1992. They became popular because of their success in handwritten digit recognition. SVMs are now an important and active field of machine learning research and are regarded as a main example of kernel methods.

What is SVM and how does it work? SVMs are a family of machine-learning algorithms used for mathematical and engineering problems, for example handwritten digit recognition, object recognition, speaker identification, face detection in images, and target detection. Task: assume we are given a set S of points x_i ∈ R^n with i = 1, 2, ..., N. Each point x_i belongs to one of two classes and is therefore given a label y_i ∈ {-1, 1}. The goal is to establish the equation of a hyperplane that divides S, leaving all the points of the same class on the same side. SVM performs classification by constructing a separating hyperplane (an (n-1)-dimensional affine subspace of R^n) that optimally separates the data into the two categories.
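As an illustration of this task (not part of the original slides), the following minimal Python sketch trains a linear SVM on a toy two-class data set; the use of scikit-learn, the toy points, and all variable names are assumptions made for the example.

    # Minimal sketch: binary classification with a linear SVM (assumes scikit-learn is installed).
    import numpy as np
    from sklearn.svm import SVC

    # Toy data set S: points x_i in R^2 with labels y_i in {-1, +1}.
    X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],        # class +1
                  [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
    y = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin
    clf.fit(X, y)

    # The learned hyperplane w . x + b = 0:
    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b)
    print("prediction for (0.5, 0.5):", clf.predict([[0.5, 0.5]]))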

How would you classify this data? Let's consider the objects in the illustration on the left. We can see that the objects belong to two different classes. The separating line (a hyperplane in two-dimensional space) in the second picture is a decision boundary which divides the objects into two subsets such that in each subset all elements are similar. Note: there are many possible separating lines for a given set of objects. Are all the separating lines (decision boundaries) equally good?

Are all the separating lines equally good? Among the possible hyperplanes, we select the one where the distance from the hyperplane to the closest data points (the "margin") is as large as possible. An intuitive justification for this criterion: suppose the training data are good, in the sense that every possible test vector is within some radius r of a training vector. Then, if the chosen hyperplane is at least r away from any training vector, it will correctly classify all the test data. By making the hyperplane as far as possible from any data point, r is allowed to be correspondingly large. The desired hyperplane (the one that maximizes the margin) is also the bisector of the line segment between the closest points on the convex hulls of the two data sets.
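To make the notion of margin precise, here is the standard definition of the geometric margin in LaTeX (added for reference; the slide states the idea only in words):

    % Geometric margin of the hyperplane w . x + b = 0 with respect to the labelled points (x_i, y_i):
    \[
      \gamma(w, b) \;=\; \min_{i = 1, \dots, N} \frac{y_i \,(w \cdot x_i + b)}{\lVert w \rVert}
    \]
    % The maximum-margin hyperplane is the separating hyperplane that maximizes \gamma(w, b).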

An example of small and large margins.

Transforming the Data. The mathematical equation which describes the separating boundary between the two classes should be simple. This is why we map the data from the input space into a feature space. The mapping typically increases the dimension of the space: a point x = (x_1, ..., x_n) is mapped to φ(x) = (φ_1(x), ..., φ_N(x)). The data points are mapped from the input space to the new feature space before they are used for training or for classification. After transforming the data and after learning, we look for an answer by examining the simpler feature space. [Figure: points in the input space are mapped by φ(·) into the feature space.]
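As a hedged illustration (not from the slides), the sketch below shows one possible mapping φ for 1-D inputs, x ↦ (x, x²); the toy data, the separating weights, and the function name are assumptions chosen so that a set that no single threshold can separate becomes linearly separable in the feature space.

    import numpy as np

    # Toy 1-D data: class +1 sits between the two groups of class -1,
    # so no single threshold on x separates the classes in the input space.
    x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
    y = np.array([-1, -1, 1, 1, 1, -1, -1])

    def phi(x):
        """Map a scalar input into a 2-D feature space: x -> (x, x^2)."""
        return np.stack([x, x ** 2], axis=-1)

    features = phi(x)
    # In the feature space the classes are separated by the hyperplane
    # w . phi(x) + b = 0 with w = (0, -1) and b = 1.5 (i.e. the line x^2 = 1.5).
    w, b = np.array([0.0, -1.0]), 1.5
    print(np.sign(features @ w + b))  # -> [-1. -1.  1.  1.  1. -1. -1.], matching y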

Learning. Learning can be regarded as finding the maximum-margin separating hyperplane between two classes of points. Suppose that a pair (w, b) defines a hyperplane with the equation w · x + b = 0. Let {x_1, ..., x_m} be our data set and let y_i ∈ {1, -1} be the class label of x_i. The decision boundary should classify all points correctly, i.e. the constraints y_i (w · x_i + b) ≥ 1 have to be satisfied for all i = 1, ..., m. Among all hyperplanes separating the data, there exists a unique one yielding the maximum margin of separation between the classes; it can be determined by solving the optimization problem given below.
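The equations on this slide did not survive transcription, so the following LaTeX block states the standard hard-margin formulation of the problem (a reconstruction, not a verbatim copy of the slide):

    % Hard-margin SVM, primal form: maximizing the margin 2 / ||w||
    % is equivalent to minimizing ||w||^2 / 2 subject to correct classification.
    \[
      \begin{aligned}
        \min_{w,\,b} \quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
        \text{subject to} \quad & y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, m.
      \end{aligned}
    \]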

Support vectors (1). Let's notice that not all of the training points are important when choosing the hyperplane. To make this clear, let's consider the following example: construct the convex hulls of the two separate classes of points. It is clear that the rear points (those far from the other class) are not important for choosing the decision boundary. In the pictures above, the relevant points are marked in yellow. These points are called Support Vectors.

Support vectors (2). Every training point has an associated coefficient α_i. The coefficient expresses the strength with which that point is embedded in the final decision function for any given test point. For the Support Vectors, which are the points that lie closest to the separating hyperplane, the coefficients are greater than 0; for the rest of the points the corresponding coefficients are equal to zero. The following equation describes the dependency between the training points and the decision boundary: w = Σ_i α_i y_i x_i, where the α_i ≥ 0 are the coefficients. The coefficients also need to satisfy the condition Σ_i α_i y_i = 0, and they are chosen to maximize the dual objective given below.
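Again the slide's own equations were lost in transcription; the standard dual problem that determines the coefficients α_i is (stated here as the usual formulation, not a verbatim copy of the slide):

    % Dual form of the hard-margin SVM: the coefficients alpha_i maximize
    \[
      \begin{aligned}
        \max_{\alpha} \quad & \sum_{i=1}^{m} \alpha_i \;-\; \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \\
        \text{subject to} \quad & \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.
      \end{aligned}
    \]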

Support vectors (3). [Figure: two classes of points labelled with their coefficients; most points have α = 0 (α_2 = α_3 = α_4 = α_5 = α_7 = α_9 = α_10 = 0), while the support vectors have α_1 = 0.8, α_6 = 1.4 and α_8 = 0.6.]

Kernel functions. A kernel is a function K such that for all x_1, x_2 ∈ X, K(x_1, x_2) = φ(x_1) · φ(x_2). It computes the similarity of two data points in the feature space using a dot product. The selection of an appropriate kernel function is important, since the kernel function defines the feature space in which the training set examples will be classified. The kernel expresses prior knowledge about the phenomenon being modeled, encoded as a similarity measure between two vectors. A support vector machine can locate a separating hyperplane in the feature space and classify points in that space without even representing the space explicitly, simply by defining a kernel function that plays the role of the dot product in the feature space.
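As a hedged sketch (not part of the original slides), the code below defines two common kernel functions and checks numerically that the degree-2 polynomial kernel computes the same value as an explicit dot product in the corresponding feature space; the helper names and toy points are assumptions made for the example.

    import numpy as np

    def polynomial_kernel(x1, x2, degree=2):
        """K(x1, x2) = (1 + x1 . x2)^degree, a common polynomial kernel."""
        return (1.0 + np.dot(x1, x2)) ** degree

    def rbf_kernel(x1, x2, gamma=1.0):
        """K(x1, x2) = exp(-gamma * ||x1 - x2||^2), the Gaussian (RBF) kernel."""
        return np.exp(-gamma * np.sum((x1 - x2) ** 2))

    def phi_degree2(x):
        """Explicit feature map (2-D input) whose dot product reproduces the degree-2 polynomial kernel."""
        x1, x2 = x
        return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

    a, b = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
    # The two numbers below agree: the kernel evaluates the feature-space dot product
    # without ever constructing phi(a) and phi(b) explicitly.
    print(polynomial_kernel(a, b), phi_degree2(a) @ phi_degree2(b))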

Predicting the classification. Let X be a test point. The Support Vector Machine predicts the classification of the test point X using the formula f(X) = sign(w · φ(X) + b). The function returns 1 or -1, depending on which class the point X belongs to. Here w · φ(X) is the dot product of the vector w and the vector from the origin to the point φ(X), and b is the shift of the hyperplane from the origin of the coordinate system.
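The sketch below (an illustration under the assumption that scikit-learn is used; not from the slides) recovers this prediction formula from a trained linear SVM: the decision value w · x + b is rebuilt from the support vectors and their coefficients and compared to the library's own prediction.

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
                  [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])
    y = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    x_test = np.array([0.5, 0.25])
    # dual_coef_[0] holds alpha_i * y_i for each support vector, so
    # w = sum_i alpha_i y_i x_i and the decision value is w . x_test + b.
    w = clf.dual_coef_[0] @ clf.support_vectors_
    b = clf.intercept_[0]
    print(np.sign(w @ x_test + b))   # manual prediction: +1 or -1
    print(clf.predict([x_test])[0])  # should agree with the library's prediction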

References
http://www.idi.ntnu.no/emner/it3704/lectures/papers/bennett_2000_Support.pdf
http://aya.technion.ac.il/karniel/cmcc/svm-tutorial.pdf
http://www.ecs.soton.ac.uk/~srg/publications/pdf/svm.pdf
http://en.wikipedia.org/wiki/support_vector_machine

Exercises. Explain the difference between a soft margin and a regular (hard) margin. Give two examples of kernel functions.