COMS 4771 Support Vector Machines. Nakul Verma


Last time: decision boundaries for classification; the linear decision boundary (linear classification); the Perceptron algorithm; the mistake bound for the Perceptron; generalizing to non-linear boundaries (via kernel space); problems become linear in kernel space; the kernel trick to speed up computation.

Perceptron and Linear Separability. Say there is a linear decision boundary which can perfectly separate the training data. Which linear separator will the Perceptron algorithm return? The separator with a large margin γ is better for generalization. How can we incorporate the margin in finding the linear boundary?

Solution: Support Vector Machines (SVMs). Motivation: it returns a stable linear classifier by giving a maximum-margin solution; a slight modification to the problem provides a way to deal with non-separable cases; and it is kernelizable, so it gives an implicit way of yielding non-linear classification.

SVM Formulation. Say the training data S is linearly separable by some margin (but the linear separator does not necessarily pass through the origin). Then the decision boundary is the hyperplane w·x + b = 0 and the linear classifier is h(x) = sign(w·x + b). Idea: we can try finding two parallel hyperplanes that correctly classify all the points, and maximize the distance between them!

SVM Formulation (contd. 1). Decision boundaries for the two hyperplanes: w·x + b = +1 and w·x + b = -1. Distance between the two hyperplanes: 2/‖w‖ (why?). Training data is correctly classified if w·x_i + b ≥ +1 when y_i = +1 and w·x_i + b ≤ -1 when y_i = -1. Together: y_i(w·x_i + b) ≥ 1 for all i.
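
To answer the "why?": a short derivation (in standard notation) of the distance, using the point-to-hyperplane distance formula. Take any point $x_0$ on the hyperplane $w \cdot x + b = -1$; its distance to the hyperplane $w \cdot x + b = +1$ is

$$\frac{|w \cdot x_0 + b - 1|}{\|w\|} = \frac{|-1 - 1|}{\|w\|} = \frac{2}{\|w\|}.$$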

SVM Formulation (contd. 2). Distance between the hyperplanes: 2/‖w‖. Training data is correctly classified if y_i(w·x_i + b) ≥ 1 for all i. Therefore, we want to maximize the distance 2/‖w‖ subject to these constraints. Let's put it in the standard form.

SVM Formulation (finally!). Maximizing 2/‖w‖ is equivalent to minimizing ‖w‖²/2, which gives the SVM standard (primal) form below. What can we do if the problem is not linearly separable?
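
Written out, the hard-margin primal is the following quadratic program (a standard formulation; the slides' notation may differ in minor details):

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \qquad \text{subject to} \qquad y_i\,(w \cdot x_i + b) \ \ge\ 1 \quad \text{for all } i.$$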

SVM Formulation (non-separable case). Idea: introduce a slack variable ξ_i for the misclassified points, and minimize the slack! This gives the SVM standard (primal) form with slack, shown below.
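
One standard way to write the slack version, with a trade-off constant $C > 0$ (the symbol $C$ is an assumption here; the lecture may use a different constant):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \qquad \text{subject to} \qquad y_i\,(w \cdot x_i + b) \ \ge\ 1 - \xi_i, \quad \xi_i \ \ge\ 0 \quad \text{for all } i.$$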

SVM: Questions. Given the SVM standard (primal) form with slack, two questions: 1. How do we find the optimal w, b, and ξ? 2. Why is it called a Support Vector Machine?

How to Find the Solution? We cannot simply take the derivative of the SVM primal objective (w.r.t. w, b, and ξ) and examine the stationary points. Why not? Consider minimizing x² subject to x ≥ 5: the region x < 5 is infeasible, and at the constrained minimum x = 5 the gradient is not zero (while respecting the constraints)! We need a way to do optimization with constraints.

Detour: Constrained Optimization. Constrained optimization (standard form): minimize an objective subject to n constraints (written out below). What to do? Projection methods: start with a feasible solution x_0, find x_1 that has a slightly lower objective value; if x_1 violates the constraints, project it back onto the constraint set; iterate. Penalty methods: use a penalty function to incorporate the constraints into the objective. (We'll assume that the problem is feasible.)
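
In symbols, one standard form (the $g_i(x) \le 0$ convention is assumed here, consistent with the duality observations that follow):

$$\min_{x}\ f(x) \ \ \text{(objective)} \qquad \text{subject to} \qquad g_i(x) \le 0 \ \text{ for } 1 \le i \le n \ \ \text{(constraints)}.$$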

The Lagrange (Penalty) Method. Consider the augmented function L(x, λ), the Lagrange function, with Lagrange variables (or dual variables) λ_i. Observation: for any feasible x and all λ_i ≥ 0, L(x, λ) is at most f(x). So the optimal value of the constrained optimization can be recovered from L, and the problem becomes unconstrained in x (see the formulas below)!
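
In symbols, for the standard-form problem above, the Lagrange function is

$$L(x, \lambda) = f(x) + \sum_{i=1}^{n} \lambda_i\, g_i(x), \qquad \lambda_i \ge 0,$$

and for any feasible $x$ (all $g_i(x) \le 0$) and any $\lambda \ge 0$ we have $L(x, \lambda) \le f(x)$, so the constrained optimal value equals

$$p^* = \min_{x}\ \max_{\lambda \ge 0}\ L(x, \lambda).$$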

The Dual Problem. The optimal value of the min-max above is also called the primal value. Now consider swapping the order: first minimize the Lagrange function L(x, λ) over x, then maximize over λ ≥ 0. Observation: since, for any feasible x and all λ_i ≥ 0, L(x, λ) ≤ f(x), this max-min value (also called the dual) is at most the primal value (see below).
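
In the same notation, the dual swaps the order of min and max:

$$d^* = \max_{\lambda \ge 0}\ \min_{x}\ L(x, \lambda),$$

and since $\min_x L(x, \lambda) \le L(x', \lambda) \le f(x')$ for every feasible $x'$ and every $\lambda \ge 0$, the dual value never exceeds the primal value.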

(Weak) Duality Theorem. Theorem (weak Lagrangian duality, also called the minimax inequality): the dual value is at most the primal value; the difference between the two is called the duality gap. Under what conditions can we achieve equality? (See the statement in symbols below.)
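
The minimax inequality in symbols:

$$d^* \ =\ \max_{\lambda \ge 0}\ \min_{x}\ L(x, \lambda) \ \ \le\ \ \min_{x}\ \max_{\lambda \ge 0}\ L(x, \lambda) \ =\ p^*,$$

and the difference $p^* - d^*$ is the duality gap.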

Convexity. A function f: R^d → R is called convex iff for any two points x, x′ and any β ∈ [0,1] the condition below holds.
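
The condition, in symbols:

$$f\big(\beta x + (1 - \beta) x'\big) \ \le\ \beta f(x) + (1 - \beta) f(x') \qquad \text{for all } x, x' \in \mathbb{R}^d,\ \beta \in [0, 1].$$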

Convexity. A set S ⊆ R^d is called convex iff for any two points x, x′ ∈ S and any β ∈ [0,1] the convex combination below stays in S. Examples: half-spaces, balls, and polytopes are all convex sets.
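
That is:

$$\beta x + (1 - \beta) x' \in S \qquad \text{for all } x, x' \in S,\ \beta \in [0, 1].$$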

Convex Optimization. A constrained optimization problem, minimize f(x) (objective) subject to g_i(x) ≤ 0 for 1 ≤ i ≤ n (constraints), is called a convex optimization problem if the objective function f is a convex function and the feasible set induced by the constraints g_i is a convex set. Why do we care? We can find the optimal solution for convex problems efficiently!

Convex Optimization: Niceties. Every local optimum is a global optimum in a convex optimization problem. Example convex problems: linear programs, quadratic programs, conic programs, semi-definite programs. Several solvers exist to find the optimum: CVX, SeDuMi, C-SALSA, etc. We can also use a simple descent-type algorithm for finding the minimum!

Gradient Descent (for finding local minima). Theorem (Gradient Descent): given a smooth function f, for any x and any sufficiently small step size η > 0, we have f(x − η∇f(x)) ≤ f(x). From this we can derive a simple algorithm, the projected Gradient Descent: initialize x_0; for t = 1, 2, ..., take a step in the negative gradient direction, then project back onto the constraints; terminate when no progress can be made, i.e., when the iterates stop changing.
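
As a concrete illustration, here is a minimal Python/NumPy sketch of projected gradient descent applied to the toy problem from earlier (minimize x² subject to x ≥ 5); the step size, tolerance, and function names are illustrative choices, not taken from the lecture.

    import numpy as np

    def projected_gradient_descent(grad, project, x0, eta=0.1, tol=1e-6, max_iter=1000):
        # Minimize a smooth function given its gradient, projecting each iterate onto the feasible set.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x_new = project(x - eta * grad(x))   # step in the negative gradient direction, then project
            if np.linalg.norm(x_new - x) < tol:  # terminate when no progress can be made
                return x_new
            x = x_new
        return x

    # Toy problem: minimize x^2 subject to x >= 5.
    grad = lambda x: 2 * x                   # gradient of x^2
    project = lambda x: np.maximum(x, 5.0)   # projection onto the feasible set {x >= 5}
    print(projected_gradient_descent(grad, project, x0=[0.0]))  # prints [5.]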

Back to Constrained Opt.: Duality Theorems. Theorem (weak Lagrangian duality): the dual value is at most the primal value. Theorem (strong Lagrangian duality): if f is convex and there is a strictly feasible point x*, or when the constraints g are affine, then the primal and dual values are equal.
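
The standard sufficient condition here is Slater's condition: if $f$ and the $g_i$ are convex and there is a point $x^*$ with $g_i(x^*) < 0$ for all $i$ (or the $g_i$ are affine), then the duality gap is zero:

$$\max_{\lambda \ge 0}\ \min_{x}\ L(x, \lambda) \ =\ \min_{x}\ \max_{\lambda \ge 0}\ L(x, \lambda).$$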

OK, Back to SVMs. Observations: the objective function is convex, and the constraints are affine, inducing a polytope constraint set. So the SVM is a convex optimization problem (in fact a quadratic program) in (w, b). Moreover, strong duality holds. Let's examine the dual; the Lagrangian is given below.
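
For the hard-margin primal, the Lagrangian, with one dual variable $\alpha_i \ge 0$ per constraint, takes the standard form:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 \ -\ \sum_i \alpha_i \Big[ y_i\,(w \cdot x_i + b) - 1 \Big].$$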

SVM Dual. For the SVM primal in (w, b), the inner minimization of the Lagrangian over (w, b) is unconstrained, so let's calculate it by setting derivatives to zero (see below). When α_i > 0, the corresponding x_i is a support vector, and w is only a function of the support vectors!
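
Setting the gradients of this Lagrangian with respect to $w$ and $b$ to zero gives the stationarity conditions:

$$\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_i \alpha_i\, y_i\, x_i, \qquad \frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_i \alpha_i\, y_i = 0.$$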

SVM Dual (contd.). Substituting the optimality conditions for (w, b) back into the Lagrangian leaves a problem in the α_i alone, maximized subject to α_i ≥ 0 and Σ_i α_i y_i = 0 (see below).
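
The resulting standard (hard-margin) SVM dual is:

$$\max_{\alpha}\ \sum_i \alpha_i \ -\ \frac{1}{2} \sum_{i,j} \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i \cdot x_j) \qquad \text{subject to} \qquad \alpha_i \ge 0, \quad \sum_i \alpha_i\, y_i = 0.$$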

SVM Optimization Interpretation. The SVM standard (primal) form minimizes over (w, b) and thereby maximizes the margin γ = 2/‖w‖. The SVM standard (dual) form maximizes over the α_i; it has a kernelized version (replace the inner products x_i · x_j with kernel evaluations K(x_i, x_j)) and is only a function of the support vectors.
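
To make "only a function of support vectors" and "kernelized version" concrete, here is a small Python/NumPy sketch of the kernelized decision function sign(Σ_i α_i y_i K(x_i, x) + b); the RBF kernel choice and the support vectors, labels, α_i, and b values are made up for illustration, not taken from the lecture.

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        # Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def svm_predict(x, support_vectors, support_labels, alphas, b, kernel=rbf_kernel):
        # Kernelized SVM prediction: only the support vectors (alpha_i > 0) enter the sum.
        s = sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, support_labels, support_vectors))
        return np.sign(s + b)

    # Illustrative (made-up) support vectors, labels, dual variables, and bias:
    support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])
    support_labels = np.array([+1, -1])
    alphas = np.array([0.5, 0.5])
    b = 0.0
    print(svm_predict(np.array([0.9, 1.2]), support_vectors, support_labels, alphas, b))  # prints 1.0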

What We Learned: Support Vector Machines; the maximum-margin formulation; constrained optimization; Lagrange duality theory; convex optimization; the SVM dual and its interpretation; how to get the optimal solution.

Questions?

Next time: parametric and non-parametric regression.