CAP5415-Computer Vision Lecture 13-Support Vector Machines for Computer Vision Applica=ons

Size: px

Start display at page:

Download "CAP5415-Computer Vision Lecture 13-Support Vector Machines for Computer Vision Applica=ons"

Hester Matthews
5 years ago
Views:

1 CAP5415-Computer Vision Lecture 13-Support Vector Machines for Computer Vision Applica=ons Guest Lecturer: Dr. Boqing Gong Dr. Ulas Bagci 1

2 October 14 Reminders Choose your mini-projects (both). Send with a short proposal/explana=on. October 8 Due for Programming Assignment #3 2

3 PaWern Classifica=on Problem Suppose we are given two classes of objects, we are then faced with a new object and we have to assign it to one of the two classes. 3

4 Mo=va=on denotes +1 denotes -1 How would you classify this data? 4

5 Mo=va=on denotes +1 denotes -1 How would you classify this data? 5

6 Mo=va=on denotes +1 denotes -1 How would you classify this data? 6

7 Mo=va=on denotes +1 denotes -1 How would you classify this data? 7

8 Mo=va=on denotes +1 denotes -1 Any of these would be fine....but which is best? 8

9 Classifier Design denotes +1 denotes -1 Define the margin of a linear classifier as the width that the boundary could be increased by before hi`ng a datapoint. 9

10 Maximum Margin denotes +1 denotes -1 The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) 10

11 Maximum Margin denotes +1 denotes -1 Support Vectors are those datapoints that the margin pushes up against The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) Linear SVM 11

12 Maximum Margin 1. Intuitively this feels safest. denotes +1 denotes -1 Support Vectors are those datapoints that the margin pushes up against 2. If we ve made a small error in the location of the boundary (it s been jolted in its perpendicular direction) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) this gives us least chance of causing a misclassification. 3. LOOCV is easy since the model is immune to removal of any nonsupport-vector datapoints. 4. There s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing. 5. Empirically it works very very well. 12

13 SVM A supervised approach for classifica=on and regression 13

14 SVM A supervised approach for classifica=on and regression Developed in the computer science society at 1990s, has grown in popularity since then. 14

15 SVM A supervised approach for classifica=on and regression Developed in the computer science society at 1990s, has grown in popularity since then. Shown to perform well in variety of se`ngs, and ocen considered as one of the best out-of-the box classifiers. 15

SVM An approach for classifica=on Developed

16 SVM An approach for classifica=on Developed in the computer science society at 1990s, has grown in popularity since then. Shown to perform well in variety of se`ngs, and ocen considered as one of the best out-of-the box classifiers. Max-margin Classifier Support Vector Classifier Support Vector Machines (historical appearances in the literature) 16

17 Maximal-Margin Classifier In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1. 17

18 Maximal-Margin Classifier In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1. Ex. In 2D, a hyperplane is 1D line, Ex. In 3D, a hyperplane is 2D plane. 18

19 Maximal-Margin Classifier In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1. Ex. In 2D, a hyperplane is 1D line, Ex. In 3D, a hyperplane is 2D plane. General hyperplane defini=on X X px p =0 19

20 Maximal-Margin Classifier In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1. Ex. In 2D, a hyperplane is 1D line, Ex. In 3D, a hyperplane is 2D plane. General hyperplane defini=on X X px p =0 Hyperplane for 2D data X X 2 =0 20

21 Maximal-Margin Classifier In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1. Ex. In 2D, a hyperplane is 1D line, Ex. In 3D, a hyperplane is 2D plane. 1+2X 1 +3X 2 > 0 General hyperplane defini=on X X px p =0 Hyperplane for 2D data X X 2 =0 1+2X 1 +3X 2 < 0 1+2X 1 +3X 2 =0 21

22 Max-margin classifier Lec: There are two classes of observa=ons: blue and in purple (each of which has measurements on two variables). Three separa=ng hyperplanes, out of many possible, are shown in black. Right: A separa=ng hyperplane is shown in black: a test observa=on that falls in the blue por=on of the grid will be assigned to the blue class, and a test observa=on that falls into the purple por=on of the grid will be assigned to the purple class. 22

23 Max-margin classifier There are two classes of observa=ons, shown in blue and in purple. The maximal margin hyperplane is shown as a solid line. The margin is the distance from the solid line to either of the dashed lines. The two blue points and the purple point that lie on the dashed lines are the support vectors, and the distance from those points to the margin is indicated by arrows. The purple and blue grid indicates the decision rule made by a classifier based on this separa=ng hyperplane. 23

24 Construc=on of Max-Margin Classifier Consider n training observa=ons x 1,...,x n 2 R p 24

25 Construc=on of Max-Margin Classifier Consider n training observa=ons and associated class labels x 1,...,x n 2 R p y 1,...,y n 2 { 1, +1} 25

26 Construc=on of Max-Margin Classifier Consider n training observa=ons and associated class labels x 1,...,x n 2 R p y 1,...,y n 2 { 1, +1} Briefly, max-margin hyperplane is the solu=on to the op=miza=on problem: 26

27 Construc=on of Max-Margin Classifier Consider n training observa=ons and associated class labels x 1,...,x n 2 R p y 1,...,y n 2 { 1, +1} Briefly, max-margin hyperplane is the solu=on to the op=miza=on problem: This equa=on ensures that each observa=on is on the correct side of the hyperplane and at least a distance M from the hyperplane. Hence, M represents the margin of 27 our hyperplane.

28 Non-Separable Case The maximal margin classifier is a very natural way to perform Classifica=on, if a separa=ng hyperplane exists. In many cases there is no separa=ng hyperplane exists! In this case, we cannot exactly separate the two classes.(however, No=ce soc margin in the following slides!) The generaliza=on of the maximal margin classifier to the non-separable case is known as the support vector classifier. 28

29 Support Vector Classifier-Separable Separable plane is shown. Decision boundary is the solid line. Broken lines bound the Shaded maximal margin of 2M. 29

30 Support Vector Classifier-Overlap Overlap case is shown. The points labeled ξi are on the wrong side of the Margin. The margin is maximized subject to a total budget X i apple constant 30

31 Is Max-Margin Robust? Lec: Two classes of observa=ons are shown in blue and in purple, along with the maximal margin hyperplane. Right: An addi=onal blue observa=on has been added, leading to a drama=c shic in the maximal margin hyperplane shown as a solid line. The dashed line indicates the maximal Margin hyperplane that was obtained in the absence of this addi=onal point. 31 Max-margin classifier is sensi=ve to individual observa=ons!

32 Soc-Margin ~ Support Vector Classifier Rather than seeking the largest possible margin so that every observa=on is not only on the correct side of the hyperplane but also o the correct side of the margin, we instead allow some observa=ons to on the incorrect side of the margin, or even the incorrect side of the hyperplane. 32

33 Soc-Margin ~ Support Vector Classifier Rather than seeking the largest possible margin so that every observa=on is not only on the correct side of the hyperplane but also o the correct side of the margin, we instead allow some observa=ons to on the incorrect side of the margin, or even the incorrect side of the hyperplane. C: nonnega=ve tuning params, eps: slack variables allowing obs. to be on the wrong Side of the margin. 33

34 SV Classifier Le#: A support vector classifier was fit to a small data set. The hyperplane is shown as a solid line and the margins are shown as dashed lines. Purple observa=ons: Observa=ons 3, 4, 5, and 6 are on the correct side of the margin, observa=on 2 is on the margin, and observa=on 1 is on the wrong side of the margin. Blue observa=ons: Observa=ons 7 and 10 are on the correct side of the margin, observa=on 9 is on the margin, and observa=on 8 is on the wrong side of the margin. No observa=ons are on the wrong side of the hyperplane. Right: Two addi=onal points, 11 and 12 are added. 34

35 SV Classifier Four different tuning param C Were used to fit SVM into Small number of data. The largest value of C was used in the top lec panel, and smaller values were used in the top right bowom lec, and bowom right panels. When C is large, then there is a high tolerance for observa=ons Being on the wrong side of the margin, and so the margin will be large. As C decreases, the tolerance for observa=ons being on the wrong side of the margin decreases, and the margin narrows. 35

36 SV Classifier In prac,ce we are some,mes faced with non-linear class boundaries. Support vector classifier or any linear classifier will perform poorly here Le#: The observa=ons fall into two classes, with a non-linear boundary between them. Right: The support vector classifier seeks a linear boundary, and consequently performs very 36 poorly.

37 Non-Linear Support Vector Machines n General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x φ(x) 37

38 The Kernel Trick n n n n The linear classifier relies on dot product between vectors K(x i,x j )=x it x j If every data point is mapped into high-dimensional space via some transformation Φ: x φ(x), the dot product becomes: K(x i,x j )= φ(x i ) T φ(x j ) A kernel function is some function that corresponds to an inner product in some expanded feature space. Example: 2-dimensional vectors x=[x 1 x 2 ]; let K(x i,x j )=(1 + x it x j ) 2, Need to show that K(x i,x j )= φ(x i ) T φ(x j ): K(x i,x j )=(1 + x it x j ) 2, = 1+ x i12 x j x i1 x j1 x i2 x j2 + x i22 x j x i1 x j1 + 2x i2 x j2 = [1 x i1 2 2 x i1 x i2 x i2 2 2x i1 2x i2 ] T [1 x j1 2 2 x j1 x j2 x j2 2 2x j1 2x j2 ] = φ(x i ) T φ(x j ), where φ(x) = [1 x x 1 x 2 x 2 2 2x 1 2x 2 ] 38

39 Example of Kernel Func=ons n Linear: K(x i,x j )= x i T x j n Polynomial of power p: K(x i,x j )= (1+ x i T x j ) p n Gaussian (radial-basis func=on network): xi x K( xi, xj) = exp( 2 2σ n Sigmoid: K(x i,x j )= tanh(β 0 x i T x j + β 1 ) j 2 ) 39

40 Non-linear SVM Mathema=cally n Dual problem formulation: Find α 1 α N such that Q(α) =Σα i - ½ΣΣα i α j y i y j K(x i, x j ) is maximized and (1) Σα i y i = 0 (2) α i 0 for all α i n The solution is: f(x) = Σα i y i K(x i, x j )+ b n Optimization techniques for finding α i s remain the same! 40

41 Formula=ng the op=miza=on problem Constraint becomes : Var 1 ξ i y ( w x + b) 1 ξ, x ξ 0 i i i i i w! ξ i r r w x+ b= w! x! r r w x+ b= 1 Var 2 + b = 0 Objec=ve func=on penalizes for misclassified instances and those within the margin 1 min 2 C trades-off margin width and misclassifica=ons 2 w + C ξi i 41 41

42 Non-Linear SVM Overview n SVM locates a separa=ng hyperplane in the feature space and classify points in that space n It does not need to represent the space explicitly, simply by defining a kernel func=on n The kernel func=on plays the role of the dot product in the feature space. 42

43 Disadvantages of Linear Decision Surfaces Var 1 Var

44 Advantages of non-linear Decision Surfaces Var 1 Var 2 44

45 SVM Classifier Le#: An SVM with a polynomial kernel of degree 3 is applied to the non-linear data from previous slides, resul=ng in a far more appropriate decision rule. Right: An SVM with a radial kernel is applied. In this example, either kernel is capable of capturing the decision boundary. 45

46 Mul=-class SVM SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2). What can be done? Answer: with output arity N, learn N SVM s SVM 1 learns Output==1 vs Output!= 1 SVM 2 learns Output==2 vs Output!= 2 : SVM N learns Output==N vs Output!= N Then to predict the output for a new input, just predict with each SVM and find out which one puts the predic=on the furthest into the posi=ve region. 46

47 Trade-off Between Flexibility and Interpretability In general, as the flexibility of a method increases, its interpretability decreases. 47

48 Why do SVM generalize? Even though they map to a very high-dimensional space They have a very strong bias in that space The solu=on has to be a linear combina=on of the training instances Large theory on Structural Risk Minimiza=on providing bounds on the error of an SVM Typically the error bounds too loose to be of prac=cal use 48

49 Prac=cal Issues Choice of kernel - Gaussian or polynomial kernel is default - if ineffec=ve, more elaborate kernels are needed - domain experts can give assistance in formula=ng appropriate similarity measures Choice of kernel parameters - e.g. σ in Gaussian kernel - σ is the distance between closest points with different classifica=ons - In the absence of reliable criteria, applica=ons rely on the use of a valida=on set or cross-valida=on to set such parameters. Op=miza=on criterion Hard margin v.s. Soc margin - a lengthy series of experiments in which various parameters are tested 49

50 CV Applica=on of SVM: Human Detec=on Final Feature Vectors Go to SVM 50

51 CV Applica=on of SVM: Pedestrian Detec=on Feature vectors: HOG: histogram of gradients 51

52 CV Applica=on of SVM: Pedestrian Detec=on 52

53 References and Slice Credits An excellent tutorial on VC-dimension and Support Vector Machines: C.J.C. Burges. A tutorial on support vector machines for pawern recogni=on. Data Mining and Knowledge Discovery, 2(2): , hwp://citeseer.nj.nec.com/ burges98tutorial.html The VC/SRM/SVM Bible: Sta=s=cal Learning Theory by Vladimir Vapnik, Wiley- Interscience; 1998 Andrew W. Moore, CMU James, WiWen, Has=e, Tibshirani: An Introduc=on to Sta=s=cal Learning 53

Support Vector Machines

Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to