Connections between the Lasso and Support Vector Machines

Connections between the Lasso and Support Vector Machines. Martin Jaggi, Ecole Polytechnique. 2013/07/08, ROKS 13 - International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines: Theory and Applications.

Outline
- An Equivalence between the Lasso and Support Vector Machines
  - Reduction from Lasso to SVM
  - Reduction from SVM to Lasso
  - Applications
- Greedy Algorithms (from optimization and signal processing)

SVM in $\mathbb{R}^d$: a large-margin linear classifier for the training data.

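SVM in $\mathbb{R}^d$: mirror the blue points at the origin. Each point $X_i$ of that class is replaced by $-X_i$, so that every point becomes $y_i X_i$.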

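Polytope distance: given $n$ points in $\mathbb{R}^d$, collected as the columns of $A \in \mathbb{R}^{d \times n}$, find the point $w$ of their convex hull that is closest to the origin:
$\min_{w \in \mathrm{conv}(A)} \|w\|^2 = \min_{x \in \Delta} \|Ax\|^2$, where $\Delta$ is the unit simplex in $\mathbb{R}^n$.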
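The following is a minimal sketch of the polytope distance problem over the simplex. It assumes NumPy and uses a plain Frank-Wolfe iteration with exact line search, which is one possible solver and not necessarily the one used in the talk. The helper name `polytope_distance_fw` and the toy data are hypothetical. The mirrored matrix is built as $A_i = y_i X_i$.

```python
import numpy as np

def polytope_distance_fw(A, iters=1000, tol=1e-12):
    """Hypothetical helper: Frank-Wolfe for min_{x in simplex} ||A x||^2.

    Returns the barycentric weights x and w = A x, the point of conv(A)
    closest to the origin.
    """
    n = A.shape[1]
    x = np.zeros(n)
    x[0] = 1.0                        # start at a vertex of the simplex
    for _ in range(iters):
        w = A @ x
        grad = 2.0 * (A.T @ w)        # gradient of ||A x||^2
        i = int(np.argmin(grad))      # linear minimization over the simplex: vertex e_i
        s = np.zeros(n)
        s[i] = 1.0
        if grad @ (x - s) <= tol:     # Frank-Wolfe duality gap is (near) zero: stop
            break
        d = s - x
        Ad = A @ d                    # nonzero here, since the gap is positive
        # exact line search for the quadratic ||w + gamma * Ad||^2, clipped to [0, 1]
        gamma = float(np.clip(-(w @ Ad) / (Ad @ Ad), 0.0, 1.0))
        x = x + gamma * d
    return x, A @ x

# Toy data (hypothetical): columns are the points X_i, labels y_i in {+1, -1}.
X = np.array([[1.0, -1.0],
              [1.0,  1.0]])
y = np.array([1.0, -1.0])
A = X * y                             # mirror: column i becomes y_i * X_i
x, w = polytope_distance_fw(A)
print(x, w)                           # weights [0.5 0.5], closest point [1. 0.]
```

For hard-margin SVMs without offset, the closest point $w = Ax$ gives the direction of the maximum-margin separator. The positive weights $x_i$ mark the points that determine it.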

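SVM variants whose dual problem is of the form $\min_{x \in \Delta} \|Ax\|^2$ for some $A \in \mathbb{R}^{d \times n}$, where $\|Ax\|^2 = x^T A^T A x$. The slide compares three formulations: hard margin, soft margin (L2-loss) and soft margin (L1-loss). These are considered for two classes without offset/bias, two classes with regularized offset/bias, and one class, all with or without using kernels. Example, the soft-margin SVM with L2-loss:
$\min_{w \in \mathbb{R}^d,\, \rho \in \mathbb{R},\, \xi \in \mathbb{R}^n} \ \tfrac{1}{2}\|w\|_2^2 - \rho + \tfrac{C}{2} \sum_i \xi_i^2 \quad \text{s.t.} \quad y_i w^T X_i \ge \rho - \xi_i \ \ \forall i \in [1..n]$.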
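The identity $\|Ax\|^2 = x^T A^T A x$ means the dual objective only needs inner products between the mirrored points. That is why the same form works with kernels. The sketch below assumes NumPy. The helper names, the toy data and the Gaussian kernel are hypothetical illustrations. It builds $K_{ij} = y_i y_j\, k(X_i, X_j)$ and checks the linear-kernel case against $\|Ax\|^2$.

```python
import numpy as np

def mirrored_gram(X, y, kernel):
    """Hypothetical helper: Gram matrix K = A^T A of the mirrored points
    A_i = y_i X_i, using only kernel evaluations K_ij = y_i y_j k(X_i, X_j)."""
    n = X.shape[1]
    Kx = np.array([[kernel(X[:, i], X[:, j]) for j in range(n)] for i in range(n)])
    return np.outer(y, y) * Kx

def dual_objective(x, K):
    """||A x||^2 = x^T A^T A x = x^T K x."""
    return float(x @ K @ x)

def gauss(u, v):
    # example of a nonlinear kernel (assumption): Gaussian with unit bandwidth
    return float(np.exp(-np.sum((u - v) ** 2)))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))           # toy points as columns (hypothetical data)
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
x = rng.random(5)
x /= x.sum()                              # a point of the unit simplex

K_lin = mirrored_gram(X, y, np.dot)       # linear kernel: K = A^T A
A = X * y
print(dual_objective(x, K_lin), float(np.sum((A @ x) ** 2)))  # equal up to rounding
print(dual_objective(x, mirrored_gram(X, y, gauss)))          # kernelized objective
```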

Lasso: given $A \in \mathbb{R}^{d \times n}$ and $b \in \mathbb{R}^d$, the Lasso is $\ell_1$-regularized least-squares regression:
$\min_{x \in L_1} \|Ax - b\|^2$, where $L_1 := \{x \in \mathbb{R}^n : \|x\|_1 \le 1\} = \mathrm{conv}(\{\pm e_i\})$.
Uses: sparse regression, feature selection.

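(Lasso → SVM) Given a Lasso instance $\min_{x \in L_1} \|Ax - b\|^2$ with $A \in \mathbb{R}^{d \times n}$ and $b \in \mathbb{R}^d$, construct an equivalent SVM instance $\min_{x' \in \Delta} \|\tilde{A} x'\|^2$.
Write $x = (I_n \;\; {-I_n})\, x' \in L_1$, with $x' \in \Delta \subset \mathbb{R}^{2n}$ in the unit simplex (barycentric coordinates). The Lasso then becomes $\min_{x' \in \Delta} \|A (I_n \;\; {-I_n})\, x' - b\|^2$.
Since $\mathbf{1}^T x' = 1$ on the simplex, $b = b\mathbf{1}^T x'$. The SVM matrix is therefore $\tilde{A} := (A \;\; {-A}) - b\mathbf{1}^T \in \mathbb{R}^{d \times 2n}$.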
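The construction can be checked numerically. This sketch assumes NumPy, and the helper names and random toy instance are hypothetical. For any point $x'$ of the simplex, $\|\tilde{A} x'\|^2$ equals the Lasso objective at $x = (I_n \; {-I_n}) x'$.

```python
import numpy as np

def lasso_to_svm(A, b):
    """Hypothetical helper: A_tilde := (A  -A) - b 1^T, a d x 2n matrix."""
    n = A.shape[1]
    return np.hstack([A, -A]) - np.outer(b, np.ones(2 * n))

def barycentric_to_l1(xp):
    """x = (I_n  -I_n) x' for x' in the simplex of R^{2n}; x lies in the l1 ball."""
    n = xp.size // 2
    return xp[:n] - xp[n:]

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))      # toy Lasso instance (hypothetical data)
b = rng.standard_normal(4)
At = lasso_to_svm(A, b)

xp = rng.random(6)
xp /= xp.sum()                       # an arbitrary point of the simplex in R^6
x = barycentric_to_l1(xp)

svm_value = float(np.sum((At @ xp) ** 2))       # ||A_tilde x'||^2
lasso_value = float(np.sum((A @ x - b) ** 2))   # ||A x - b||^2
print(svm_value, lasso_value)                   # the two agree up to rounding
```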

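(Lasso → SVM) Geometric interpretation: the Lasso $\min_{x \in L_1} \|Ax - b\|^2$ is the squared distance from $b$ to the polytope $AL_1$. Since $A\,\mathrm{conv}(S) = \mathrm{conv}(AS)$, this polytope is
$AL_1 = A\,\mathrm{conv}(\{\pm e_i\}) = \mathrm{conv}(A\{\pm e_i\}) = \mathrm{conv}(\{\pm A_i\})$.
(Figure: the points $\{A_i\}$, $\{-A_i\}$ and $b$.)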
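The identity $AL_1 = \mathrm{conv}(\{\pm A_i\})$ rests on every $x \in L_1$ having barycentric coordinates over the vertices $\{\pm e_i\}$. The sketch below computes one such choice, assuming NumPy. The helper name and the rule for placing the leftover mass $1 - \|x\|_1$ are hypothetical; any split that cancels would do.

```python
import numpy as np

def l1_to_barycentric(x):
    """Hypothetical helper: weights x' >= 0 with sum(x') = 1 and (I_n  -I_n) x' = x.

    Assumes ||x||_1 <= 1. Positive parts go on +e_i and negative parts on -e_i.
    The leftover mass 1 - ||x||_1 is split evenly between +e_1 and -e_1,
    which cancel in (I_n  -I_n) x'.
    """
    n = x.size
    xp = np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])
    slack = 1.0 - xp.sum()
    xp[0] += slack / 2
    xp[n] += slack / 2
    return xp

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))          # toy points A_i (hypothetical data)
x = np.array([0.3, -0.2, 0.1])           # ||x||_1 = 0.6 <= 1
xp = l1_to_barycentric(x)
# A x is the convex combination of the vertices {+A_i} and {-A_i} with weights x'
print(A @ x, np.hstack([A, -A]) @ xp)
```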

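(SVM → Lasso) Given an SVM instance $\min_{x \in \Delta} \|Ax\|^2$ with $A \in \mathbb{R}^{d \times n}$, construct an equivalent Lasso instance $\min_{x \in L_1} \|\tilde{A} x - b\|^2$. This is the more challenging reduction.
Lasso: $\tilde{A} := A + b\mathbf{1}^T \in \mathbb{R}^{d \times n}$ with $b \propto w$, where $w$ is weakly separating for $A$. For $x \in \Delta$ we have $\tilde{A} x - b = Ax + b(\mathbf{1}^T x - 1) = Ax$, so the two objectives agree on the simplex.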
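The simplex identity $\tilde{A} x - b = Ax$ holds for any $b$, and the sketch below checks only that. It assumes NumPy, with a hypothetical helper name and toy data. The slide does not state how large the multiple of $w$ must be for the theorem on the constructed instance, so the factor 10 here is an arbitrary assumption.

```python
import numpy as np

def svm_to_lasso(A, b):
    """Hypothetical helper: A_tilde := A + b 1^T (adds b to every column of A)."""
    return A + b[:, None]

# Toy SVM points A_i in the upper half plane (hypothetical data).
A = np.array([[1.0, -2.0, 0.5],
              [1.0,  3.0, 2.0]])
w = np.array([0.0, 1.0])       # w^T A_i >= 0 for every column here
b = 10.0 * w                   # b proportional to w; the factor 10 is an assumption
At = svm_to_lasso(A, b)

x = np.array([0.2, 0.5, 0.3])  # a point of the simplex
print(np.sum((At @ x - b) ** 2), np.sum((A @ x) ** 2))   # equal on the simplex
```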

(SVM → Lasso) Geometric interpretation (figure): the shifted points $\{\tilde{A}_i\}$, their mirror images $\{-\tilde{A}_i\}$, the vector $b$ and the separating direction $w$. Here $\tilde{A} := A + b\mathbf{1}^T \in \mathbb{R}^{d \times n}$, with $b \propto w$ and $w$ weakly separating for $A$.

(SVM → Lasso) Properties of the constructed Lasso instance $\min_{x \in L_1} \|\tilde{A} x - b\|^2$, with $\tilde{A} := A + b\mathbf{1}^T \in \mathbb{R}^{d \times n}$, $b \propto w$, and $w$ weakly separating for $A$.
Theorem: for any $x \in L_1$ for the Lasso, there is a vector $x' \in \Delta$ of the same or better Lasso objective. This $x' \in \Delta$ attains the same objective in the SVM.

Implications: algorithms apply to both problems, e.g. sublinear-time algorithms, $\tilde{O}(n + d)$.
Implications for the Lasso: a kernelized version $\min_{x \in L_1} \big\|\sum_i \varphi(A_i)\, x_i - \varphi(b)\big\|_{\mathcal{H}}^2$. It is defined in terms of $k(A_i, A_j)$, $k(A_i, b)$ and $k(b, b)$, where $k(y, z) = \langle \varphi(y), \varphi(z) \rangle$.
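Expanding the squared norm shows why only kernel values are needed:
$\big\|\sum_i \varphi(A_i) x_i - \varphi(b)\big\|^2 = x^T K x - 2\, x^T k_b + k(b, b)$, where $K_{ij} = k(A_i, A_j)$ and $(k_b)_i = k(A_i, b)$.

The sketch below evaluates this expansion, assuming NumPy. The helper names, the Gaussian kernel and the toy data are hypothetical. With the linear kernel it reduces to the ordinary Lasso objective $\|Ax - b\|^2$.

```python
import numpy as np

def kernel_lasso_objective(x, A, b, kernel):
    """Hypothetical helper: ||sum_i phi(A_i) x_i - phi(b)||_H^2 from kernel values only."""
    n = A.shape[1]
    K = np.array([[kernel(A[:, i], A[:, j]) for j in range(n)] for i in range(n)])
    kb = np.array([kernel(A[:, i], b) for i in range(n)])
    return float(x @ K @ x - 2.0 * (x @ kb) + kernel(b, b))

def gaussian(u, v):
    # example kernel choice (assumption), bandwidth fixed to 1
    return float(np.exp(-np.sum((u - v) ** 2)))

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))                  # hypothetical toy data
b = rng.standard_normal(4)
x = np.array([0.3, -0.2, 0.0, 0.1, 0.0, -0.1])   # ||x||_1 = 0.7, inside L_1

# With the linear kernel k(y, z) = y^T z this is the ordinary Lasso objective.
print(kernel_lasso_objective(x, A, b, np.dot), float(np.sum((A @ x - b) ** 2)))
print(kernel_lasso_objective(x, A, b, gaussian))
```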

Implications for SVMs: support vectors = non-zeros in the Lasso solution, so the sparsity of the Lasso solution corresponds to the number of SVs.
Screening rules: discard points which can be guaranteed to be non-SVs.

Greedy Algorithms
- Convex optimization methods applied to $\min_{x \in L_1} \|Ax - b\|^2$: Frank-Wolfe.
- Signal processing sparse recovery methods: recover a sparse $x$ from a noisy measurement $b$ of $Ax$.

(Figure: the objective $f(x)$ over the $\ell_1$ ball $L_1 \subset \mathbb{R}^n$, with the current iterate $x$.)

Frank-Wolfe step on $L_1$, with $f(x) := \|Ax - b\|^2$: select $i := \arg\max_i |\nabla f(x)_i|$ and move towards the vertex $\pm e_i$ of $L_1$. The sign is chosen opposite to that of $\nabla f(x)_i$.
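The Frank-Wolfe step can be sketched in NumPy as follows, assuming $f(x) = \|Ax - b\|^2$ and the common step size $2/(t+2)$. That step size is a standard choice, not necessarily the one from the talk. The helper name and the toy instance are hypothetical.

```python
import numpy as np

def lasso_frank_wolfe(A, b, iters=500):
    """Hypothetical helper: Frank-Wolfe for min_{||x||_1 <= 1} ||A x - b||^2."""
    n = A.shape[1]
    x = np.zeros(n)
    for t in range(iters):
        grad = 2.0 * (A.T @ (A @ x - b))
        i = int(np.argmax(np.abs(grad)))   # i := argmax_i |grad f(x)_i|
        s = np.zeros(n)
        s[i] = -np.sign(grad[i])           # vertex -sign(grad_i) e_i (s = 0 if grad = 0)
        gamma = 2.0 / (t + 2.0)            # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s  # convex combination: stays in L_1
    return x

A = np.eye(5)                               # toy instance: optimum x* = b lies inside L_1
b = np.array([0.3, 0.2, 0.0, 0.0, 0.0])
x = lasso_frank_wolfe(A, b)
print(x, float(np.sum((A @ x - b) ** 2)))
x3 = lasso_frank_wolfe(A, b, iters=3)       # after k steps at most k non-zeros
print(np.count_nonzero(x3))
```

Each iteration adds at most one vertex $\pm e_i$, so after $k$ steps the iterate has at most $k$ non-zeros. This is the greedy, sparsity-inducing behaviour that links Frank-Wolfe to sparse recovery methods.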

Frank-Wolfe selects the same atom per step as matching pursuit. Fully corrective Frank-Wolfe is equivalent to orthogonal matching pursuit (OMP).
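The atom-selection claim follows from $\nabla f(x) = -2A^T r$ with residual $r = b - Ax$. The largest $|\nabla f(x)_i|$ is therefore the column most correlated with the residual, which is the matching pursuit rule when the columns have unit norm.

The sketch below checks this and adds a plain OMP loop, assuming NumPy and unit-norm columns; the helper names and data are hypothetical. The refit here is an unconstrained least-squares fit, as in standard OMP. How the $\ell_1$ constraint enters the fully corrective Frank-Wolfe variant is not shown.

```python
import numpy as np

def fw_atom(A, b, x):
    """Index chosen by Frank-Wolfe on L_1 for f(x) = ||A x - b||^2."""
    grad = 2.0 * (A.T @ (A @ x - b))
    return int(np.argmax(np.abs(grad)))

def mp_atom(A, r):
    """Matching-pursuit choice: column most correlated with the residual r
    (assumes unit-norm columns)."""
    return int(np.argmax(np.abs(A.T @ r)))

def omp(A, b, k):
    """OMP sketch: k greedy selections, each followed by a least-squares
    refit on the selected support."""
    n = A.shape[1]
    support, x = [], np.zeros(n)
    for _ in range(k):
        i = mp_atom(A, b - A @ x)
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x = np.zeros(n)
        x[support] = coef
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 12))
A /= np.linalg.norm(A, axis=0)          # normalize columns (assumption for MP)
b = rng.standard_normal(8)
x = 0.05 * rng.standard_normal(12)      # some current iterate
print(fw_atom(A, b, x) == mp_atom(A, b - A @ x))   # True: same atom
print(omp(np.eye(4), np.array([0.1, -3.0, 0.5, 2.0]), 2))  # keeps the two largest entries
```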

Thanks