Nonsmooth Optimization for Statistical Learning with Structured Matrix Regularization
Nonsmooth Optimization for Statistical Learning with Structured Matrix Regularization
PhD defense of Federico Pierucci. Thesis supervised by Prof. Zaid Harchaoui, Prof. Anatoli Juditsky, and Dr. Jérôme Malick.
Université Grenoble Alpes, Inria (Thoth and BiPoP teams), Grenoble, June 23rd, 2017.
Application 1: Collaborative filtering
Collaborative filtering for recommendation systems: a matrix completion optimization problem.
Ratings matrix X: rows are users (Albert, Ben, Celine, Diana, Elia, Franz), columns are films.
Data: for user i and movie j, $X_{ij} \in \mathbb{R}$, with $(i,j) \in I$ the set of known ratings.
Purpose: predict a future rating $X_{ij}$ for a new pair $(i,j)$.
Low-rank assumption: movies can be divided into a small number of types.
For example:
$$\min_{W \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{(i,j) \in I} |W_{ij} - X_{ij}| + \lambda \|W\|_{\sigma,1}$$
where $\|W\|_{\sigma,1}$, the nuclear norm, is the sum of the singular values: a convex surrogate of the rank.
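As an illustration (not part of the thesis's Matlab package), the objective above can be evaluated in a few lines of NumPy; the ratings and the rank-1 candidate below are a made-up toy example:

```python
import numpy as np

def completion_objective(W, X, mask, lam):
    """Matrix completion objective of the slide:
    (1/N) * sum_{(i,j) in I} |W_ij - X_ij| + lam * ||W||_{sigma,1},
    where I is the set of observed entries (boolean mask) and
    ||.||_{sigma,1} is the nuclear norm (sum of singular values)."""
    N = mask.sum()
    data_fit = np.abs((W - X)[mask]).sum() / N
    nuclear = np.linalg.svd(W, compute_uv=False).sum()
    return data_fit + lam * nuclear

# Tiny example: a rank-1 candidate W scored against partially observed ratings X.
X = np.array([[5., 4., 0.], [5., 4., 1.]])
mask = np.array([[True, True, False], [False, True, True]])
W = np.outer([1., 1.], [5., 4., 1.])  # rank-1 guess matching the observed entries
print(completion_objective(W, X, mask, lam=0.1))  # 0.1 * sqrt(84): zero data fit, nuclear-norm penalty only
```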
Application 2: Multiclass classification
Multiclass classification of images. Example: the ImageNet challenge (image → marmot, image → hedgehog, new image → ?).
Data: $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}^k$, pairs of (image, category).
Purpose: predict the category for a new image $x \mapsto y = ?$
Low-rank assumption: the features are assumed to be embedded in a lower-dimensional space.
Multiclass version of the support vector machine (SVM):
$$\min_{W \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{i=1}^{N} \max\Big\{0,\ 1 + \max_{r \neq y_i} \{W_r^\top x_i - W_{y_i}^\top x_i\}\Big\} + \lambda \|W\|_{\sigma,1}$$
where $W_j \in \mathbb{R}^d$ is the j-th column of W.
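The per-example multiclass hinge term in this objective is easy to compute directly; a small sketch (assuming integer class labels, an encoding choice not fixed by the slide, and toy numbers of my own):

```python
import numpy as np

def multiclass_hinge(W, x, y):
    """Multiclass SVM loss of the slide for one example (x, y):
    max(0, 1 + max_{r != y} (W_r^T x - W_y^T x)),
    where W_j is the j-th column of W (one score vector per class)."""
    scores = W.T @ x                                  # one score per class
    margin = np.delete(scores, y).max() - scores[y]   # best rival minus true-class score
    return max(0.0, 1.0 + margin)

# Toy example with d = 2 features and k = 3 classes.
W = np.array([[2.0, 0.5, -1.0],
              [0.0, 1.0,  1.0]])
x = np.array([1.0, 1.0])
print(multiclass_hinge(W, x, y=0))  # 0.5: class 0 has the top score, but by less than the unit margin
```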
These two problems have the form:
$$\min_{W \in \mathbb{R}^{d \times k}} \underbrace{\frac{1}{N} \sum_{i=1}^{N} \ell(y_i, F(x_i, W))}_{=:R(W),\ \text{empirical risk}} + \underbrace{\lambda \|W\|}_{\text{regularization}}$$
Notation: F prediction function, $\ell$ loss function, $\hat{y}_i = F(x_i, W)$ predicted outcome.
Data: N number of examples, $x_i$ feature vector, $y_i$ outcome.
Challenges:
- Large scale: N, k, d
- Generalization → nonsmooth regularization
- Robust learning (noisy data, outliers) → nonsmooth empirical risk, e.g. $g(\xi) = |\xi|$ or $g(\xi) = \max\{0, \xi\}$
My thesis in one slide
$$\min_{W} \underbrace{\frac{1}{N} \sum_{i=1}^{N} \ell(y_i, F(x_i, W))}_{\text{1st contribution}} + \underbrace{\lambda \|W\|}_{\text{3rd contribution}}$$
(the minimization itself: 2nd contribution)
1. Smoothing techniques
2. Conditional gradient algorithms
3. Group nuclear norm
Part 1: Unified view of smoothing techniques for first-order optimization
Motivations: smoothing is a key tool in optimization. A smooth loss allows the use of gradient-based optimization (e.g. $g(\xi) = |\xi|$, $g(\xi) = \max\{0, \xi\}$).
Contributions:
- A unified view of smoothing techniques for nonsmooth functions
- A new example: smoothing of the top-k error (for list ranking and classification)
- A study of algorithms = smoothing + state-of-the-art algorithms for smooth problems
Part 2: Conditional gradient algorithms for doubly nonsmooth learning
Motivations: common matrix learning problems are formulated as
$$\min_{W \in \mathbb{R}^{d \times k}} \underbrace{R(W)}_{\text{nonsmooth empirical risk}} + \underbrace{\lambda \|W\|}_{\text{nonsmooth regularization}}$$
- A nonsmooth empirical risk (e.g. with the L1 norm) is robust to noise and outliers
- Standard nonsmooth optimization methods are not always scalable (e.g. with the nuclear norm)
Contributions:
- New algorithms based on (composite) conditional gradient
- Convergence analysis: rate of convergence + guarantees
- Numerical experiments on real data
Part 3: Regularization by group nuclear norm
Motivations:
- Structured matrices can join information coming from different sources
- Low-rank models improve robustness and dimensionality reduction
(Figure: a matrix W decomposed into groups $W_1$, $W_2$, $W_3$.)
Contributions:
- Definition of a new norm for matrices with underlying groups
- Analysis of its convexity properties
- Used as a regularizer, it provides low rank by groups and aggregates models
Outline
1. Unified view of smoothing techniques
2. Conditional gradient algorithms for doubly nonsmooth learning
3. Regularization by group nuclear norm
4. Conclusion and perspectives
Smoothing techniques
Purpose: smooth a convex function $g: \mathbb{R}^n \to \mathbb{R}$.
Two techniques:
1) Product convolution [Bertsekas 1978] [Duchi et al. 2012]:
$$g^{pc}_{\gamma}(\xi) := \int_{\mathbb{R}^n} g(\xi - z)\, \frac{1}{\gamma^n}\, \mu\!\left(\frac{z}{\gamma}\right) dz, \qquad \mu \text{ a probability density}$$
2) Infimal convolution [Moreau 1965] [Nesterov 2007] [Beck, Teboulle 2012]:
$$g^{ic}_{\gamma}(\xi) := \inf_{z \in \mathbb{R}^n} \left\{ g(\xi - z) + \gamma\, \omega\!\left(\frac{z}{\gamma}\right) \right\}, \qquad \omega \text{ a smooth convex function}$$
Result:
- $g_\gamma$ is a uniform approximation of g: there exist $m, M \ge 0$ such that $-\gamma m \le g_\gamma(x) - g(x) \le \gamma M$
- $g_\gamma$ is $L_\gamma$-smooth: differentiable and convex, with $\|\nabla g_\gamma(x) - \nabla g_\gamma(y)\|_* \le L_\gamma \|x - y\|$, where $L_\gamma$ is proportional to $1/\gamma$ and $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$
Smoothing surrogates of nonsmooth functions
Purpose: obtain a $g_\gamma$ to be used inside algorithms: a (possibly) explicit expression, easy to evaluate numerically.
Elementary example (in $\mathbb{R}$): the absolute value $g(x) = |x|$.
Product convolution with the Gaussian density $\mu(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$:
$$g^{pc}_\gamma(x) = x\, F\!\left(\frac{x}{\gamma}\right) - x\, F\!\left(-\frac{x}{\gamma}\right) + \sqrt{\frac{2}{\pi}}\, \gamma\, e^{-\frac{x^2}{2\gamma^2}}, \qquad F(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$$
where F is the cumulative distribution function of the standard Gaussian.
Infimal convolution with $\omega(x) = \frac{1}{2} x^2$ (giving the Huber function):
$$g^{ic}_\gamma(x) = \begin{cases} \frac{1}{2\gamma} x^2 & \text{if } |x| \le \gamma \\[2pt] |x| - \frac{\gamma}{2} & \text{if } |x| > \gamma \end{cases}$$
Motivating nonsmooth function: the top-k loss (next).
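Both closed forms are short enough to check numerically; a pure-Python sketch (function names are mine, not from the thesis package) showing that the two surrogates approach $|x|$ as $\gamma$ decreases:

```python
import math

def huber(x, gamma):
    """Infimal-convolution smoothing of |x| with omega(z) = z^2/2:
    x^2/(2*gamma) on [-gamma, gamma], |x| - gamma/2 outside."""
    return x * x / (2 * gamma) if abs(x) <= gamma else abs(x) - gamma / 2

def gauss_smooth_abs(x, gamma):
    """Product-convolution smoothing of |x| with a Gaussian density:
    x*F(x/g) - x*F(-x/g) + sqrt(2/pi)*g*exp(-x^2/(2 g^2)),
    with F the standard normal CDF (via the error function)."""
    F = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return (x * F(x / gamma) - x * F(-x / gamma)
            + math.sqrt(2 / math.pi) * gamma * math.exp(-x * x / (2 * gamma * gamma)))

# Both approximations converge uniformly to |x| as gamma -> 0.
for g in (1.0, 0.1, 0.01):
    print(g, abs(huber(0.3, g) - 0.3), abs(gauss_smooth_abs(0.3, g) - 0.3))
```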
Motivating nonsmooth functions: the top-k loss. Example: top-3.
Top-3 loss for classification. Ground truth: cat. Prediction: 1 paper towel, 2 wall, 3 cat ⇒ loss = 0. A prediction is good if the true class is among the first 3 predicted.
Top-3 loss for ranking. Ground truth: 1 Janis Joplin, 2 David Bowie, 3 Eric Clapton, 4 Patti Smith, 5 Jean-Jacques Goldman, 6 Francesco Guccini, ... Prediction: 1 David Bowie, 2 Patti Smith, 3 Janis Joplin. Predict an ordered list; the loss counts the mismatches with respect to the true list.
Smoothing of top-k
The convex top-k error function, written as a sublinear function:
$$g(x) = \max_{z \in Z} \langle x, z \rangle, \qquad Z := \left\{ z \in \mathbb{R}^n : 0 \le z_i \le \tfrac{1}{k},\ \sum_{i=1}^n z_i \le 1 \right\} = \text{cube} \cap \text{simplex}$$
Case k = 1 (top-1): $g(x) = \max\{0, \max_i x_i\}$.
Infimal convolution with $\omega(x) = \sum_{i=1}^n (x_i \ln x_i - x_i)$:
$$g_\gamma(x) = \begin{cases} \gamma \left(1 + \ln \sum_{i=1}^n e^{x_i/\gamma}\right) & \text{if } \sum_{i=1}^n e^{x_i/\gamma} > 1 \\[2pt] \gamma \sum_{i=1}^n e^{x_i/\gamma} & \text{if } \sum_{i=1}^n e^{x_i/\gamma} \le 1 \end{cases}$$
Classification: for $\gamma = 1$ this recovers the multinomial logistic loss, the same result as in statistics [Hastie et al., 2008].
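The two branches of the smoothed top-1 formula can be checked numerically; a small sketch (function names are mine, not from the thesis package):

```python
import math

def top1(x):
    """Nonsmooth top-1 surrogate: max(0, max_i x_i)."""
    return max(0.0, max(x))

def top1_smooth(x, gamma):
    """Entropic (infimal-convolution) smoothing of top-1, as on the slide:
    gamma*(1 + log(sum exp(x_i/gamma))) if sum exp(x_i/gamma) > 1,
    gamma*sum exp(x_i/gamma) otherwise."""
    s = sum(math.exp(xi / gamma) for xi in x)
    return gamma * (1 + math.log(s)) if s > 1 else gamma * s

x = [2.0, -1.0, 0.5]
for g in (1.0, 0.1, 0.01):
    print(g, top1_smooth(x, g) - top1(x))  # the approximation gap shrinks with gamma
```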
Smoothing of top-k, case k > 1
Infimal convolution leads to a separable expression
$$g_\gamma(x) = \sum_{i=1}^n H_\gamma(x_i + \lambda^*(x, \gamma)), \qquad H_\gamma(t) = \begin{cases} 0 & t < 0 \\ \tfrac{1}{2} t^2 & t \in [0, \tfrac{1}{k}] \\ \tfrac{t}{k} - \tfrac{1}{2k^2} & t > \tfrac{1}{k} \end{cases}$$
Evaluating $g_\gamma(x)$ requires solving an auxiliary smooth dual problem for $\lambda^*(x, \gamma)$:
- Define $P_x := \{x_i,\ x_i - \tfrac{1}{k} : i = 1, \dots, n\}$
- $\Theta'(\lambda) = 1 - \sum_{t_j \in P_x} \pi_{[0, 1/k]}(t_j + \lambda)$, with $\pi_{[0,1/k]}$ the projection onto $[0, \tfrac{1}{k}]$
- Find $a, b \in P_x$ such that $\Theta'(a) \le 0 \le \Theta'(b)$
- $\lambda^*(x, \gamma) = \max\left\{0,\ a - \frac{\Theta'(a)(b - a)}{\Theta'(b) - \Theta'(a)}\right\}$
Matrix learning problem
$$\min_{W \in \mathbb{R}^{d \times k}} \underbrace{R(W)}_{\text{nonsmooth}} + \underbrace{\lambda\, \Omega(W)}_{\text{nonsmooth}}$$
Empirical risk $R(W) := \frac{1}{N} \sum_{i=1}^N \ell(W, x_i, y_i)$:
- top-k loss for ranking and multiclass classification: $\ell(W, x, y) := (A_{x,y} W)_+$
- L1 loss for regression: $\ell(W, x, y) := |A_{x,y} W|$
Regularizer $\Omega(W)$ (typically a norm):
- nuclear norm $\|W\|_{\sigma,1}$ → sparsity on singular values
- L1 norm $\|W\|_1 := \sum_{i=1}^d \sum_{j=1}^k |W_{ij}|$ → sparsity on entries
- group nuclear norm $\Omega_G(W)$ (contribution 3) → sparsity, feature selection
Existing algorithms for nonsmooth optimization
$$\min_{W \in \mathbb{R}^{d \times k}} \underbrace{R(W)}_{\text{nonsmooth}} + \underbrace{\lambda\, \Omega(W)}_{\text{nonsmooth}}$$
- Subgradient and bundle algorithms [Nemirovski, Yudin 1976] [Lemaréchal 1979]
- Proximal algorithms [Douglas, Rachford 1956]
These algorithms are not scalable for the nuclear norm: the iteration cost is a full SVD, i.e. $O(dk^2)$.
What if the loss were smooth? $\min_{W \in \mathbb{R}^{d \times k}} S(W)\ (\text{smooth}) + \lambda\, \Omega(W)\ (\text{nonsmooth})$. Algorithms converge faster when S is smooth:
- Proximal gradient algorithms [Nesterov 2005] [Beck, Teboulle 2009]: still not scalable for the nuclear norm (iteration cost: full SVD)
- (Composite) conditional gradient algorithms [Frank, Wolfe 1956] [Harchaoui, Juditsky, Nemirovski 2013]: efficient iterations for the nuclear norm; the iteration cost is computing the largest singular value, i.e. $O(dk)$
Composite conditional gradient algorithm
State-of-the-art algorithm for $\min_{W \in \mathbb{R}^{d \times k}} S(W)\ (\text{smooth}) + \lambda\, \Omega(W)\ (\text{nonsmooth})$:

Let $W_0 = 0$ and $r_0$ such that $\Omega(W^*) \le r_0$
for t = 0 ... T do
    Compute $Z_t = \operatorname{argmin}_D\ \langle \nabla S(W_t), D \rangle$ s.t. $\Omega(D) \le r_t$   [gradient step]
    $(\alpha_t, \beta_t) = \operatorname{argmin}_{\alpha, \beta \ge 0;\ \alpha + \beta \le 1}\ S(\alpha Z_t + \beta W_t) + \lambda (\alpha + \beta) r_t$   [optimal stepsize]
    Update $W_{t+1} = \alpha_t Z_t + \beta_t W_t$ and $r_{t+1} = (\alpha_t + \beta_t) r_t$
end for

where $W_t, Z_t, D \in \mathbb{R}^{d \times k}$. Efficient and scalable for some $\Omega$, e.g. the nuclear norm, where $Z_t$ is (up to scaling) the rank-one matrix $u v^\top$ built from the top singular pair of the gradient.
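For the nuclear-norm case, the argmin over the ball (the linear minimization oracle) has the closed form mentioned above; a NumPy sketch, using a full SVD for brevity where a real implementation would compute only the top singular pair (e.g. by power iteration):

```python
import numpy as np

def lmo_nuclear(G, r):
    """Linear minimization oracle of the conditional gradient step for the
    nuclear norm: argmin_{||D||_{sigma,1} <= r} <G, D> = -r * u v^T,
    with (u, v) the top singular pair of G."""
    U, s, Vt = np.linalg.svd(G)  # only the top pair is needed; full SVD kept simple here
    return -r * np.outer(U[:, 0], Vt[0, :])

# Sanity check: the oracle attains <G, D> = -r * sigma_max(G),
# the minimum of the linear form over the nuclear-norm ball of radius r.
G = np.array([[3.0, 0.0], [0.0, 1.0]])
D = lmo_nuclear(G, r=2.0)
print(np.sum(G * D))  # -2 * 3 = -6
```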
Conditional gradient despite a nonsmooth loss
Idea: use the conditional gradient algorithm, replacing $\nabla S(W_t)$ with a subgradient $s_t \in \partial R(W_t)$.
Simple counterexample in $\mathbb{R}^2$:
$$\min_{w \in \mathbb{R}^2} \|A w + b\|_1 + \|w\|_1$$
(Figure: atoms of A and level sets of $R_{\mathrm{emp}}$.) The algorithm gets stuck at (0, 0): with a subgradient of a nonsmooth empirical risk, the algorithm does not converge.
Smoothed composite conditional gradient algorithm
Idea: replace the nonsmooth loss with a smoothed loss:
$$\min_{W \in \mathbb{R}^{d \times k}} \underbrace{R(W)}_{\text{nonsmooth}} + \lambda\, \Omega(W) \quad \longrightarrow \quad \min_{W \in \mathbb{R}^{d \times k}} \underbrace{R_\gamma(W)}_{\text{smooth}} + \lambda\, \Omega(W)$$
with $\{R_\gamma\}_{\gamma > 0}$ a family of smooth approximations of R.

Let $W_0 = 0$ and $r_0$ such that $\Omega(W^*) \le r_0$
for t = 0 ... T do
    Compute $Z_t = \operatorname{argmin}_D\ \langle \nabla R_{\gamma_t}(W_t), D \rangle$ s.t. $\Omega(D) \le r_t$
    $(\alpha_t, \beta_t) = \operatorname{argmin}_{\alpha, \beta \ge 0;\ \alpha + \beta \le 1}\ R_{\gamma_t}(\alpha Z_t + \beta W_t) + \lambda (\alpha + \beta) r_t$
    Update $W_{t+1} = \alpha_t Z_t + \beta_t W_t$ and $r_{t+1} = (\alpha_t + \beta_t) r_t$
end for

$\alpha_t, \beta_t$: stepsizes; $\gamma_t$: smoothing parameter. Note: we want to solve the initial doubly nonsmooth problem.
Convergence analysis
Doubly nonsmooth problem: $\min_{W \in \mathbb{R}^{d \times k}} F(W) = R(W) + \lambda\, \Omega(W)$, with $W^*$ an optimal solution and $\gamma_t$ the smoothing parameter (which plays the role of a stepsize).
Theorems of convergence:
- Fixed smoothing of R ($\gamma_t = \gamma$): $F(W_t) - F(W^*) \le \gamma M + \frac{2}{\gamma (t + 14)}$ up to problem-dependent constants. M is dimension-free and depends on $\omega$ or $\mu$; the best $\gamma$ depends on the required accuracy $\varepsilon$.
- Time-varying smoothing of R ($\gamma_t = \gamma_0 / \sqrt{t+1}$): $F(W_t) - F(W^*) \le \frac{C}{\sqrt{t}}$. C is dimension-free and depends on $\omega$ or $\mu$, $\gamma_0$, and $W^*$.
Algorithm implementation
Package: all the Matlab code was written from scratch, in particular multiclass SVM, top-k multiclass SVM, and all the other smoothed functions.
Memory: efficient memory management; tools to operate with low-rank variables; tools to work with sparse sub-matrices of low-rank matrices (for collaborative filtering).
Numerical experiments on the 2 motivating applications: fixed smoothing for matrix completion (regression); time-varying smoothing for top-5 multiclass classification.
Fixed smoothing: example with matrix completion (regression)
Data: MovieLens, with d users, k movies, and known ratings (= 1.3% of the entries) normalized into [0, 1].
Benchmark: the iterates $W_t$ are generated on a training set; we observe $R(W_t)$ on the validation set and choose the best $\gamma$, i.e. the one minimizing $R(W_t)$ on the validation set.
(Figure: empirical risk on the validation set vs. iterations, $\lambda$ fixed, for $\gamma \in \{0.01, 0.1, 0.5, 1, 5, 10, 50\}$ and the best $\gamma$.)
Each $\gamma$ gives a different optimization problem: tiny smoothing → slower convergence; large smoothing → an objective much different from the initial one.
Time-varying smoothing: example with top-5 multiclass classification
Data: ImageNet, k = 134 classes, N images; bag-of-words features, d = 4096.
Benchmark: the iterates $W_t$ are generated on a training set; we observe the top-5 misclassification error on the validation set. For comparison, the best fixed smoothing parameter is found using the other benchmark.
Time-varying smoothing parameter: $\gamma_t = \gamma_0 / (1 + t)^p$ with $p \in \{\tfrac{1}{2}, 1\}$.
(Figure: misclassification error on the validation set vs. iterations, for $\gamma_0 \in \{0.01, 0.1, 1, 10, 100\}$ and both values of p, against the best non-adaptive $\gamma = 0.1$.)
No need to tune $\gamma_0$: time-varying smoothing matches the performance of the best experimentally tuned fixed smoothing.
Group nuclear norm
A matrix generalization of the popular group lasso norm [Turlach et al., 2005] [Yuan and Lin, 2006] [Zhao et al., 2009] [Jacob et al., 2009].
Nuclear norm $\|W\|_{\sigma,1}$: sum of the singular values of W.
(Figure: W decomposed into groups $W_1, W_2, W_3$; $i_g$: immersion, $\Pi_g$: projection, $G = \{1, 2, 3\}$.)
$$\Omega_G(W) := \min_{W = \sum_{g \in G} i_g(W_g)} \sum_{g \in G} \alpha_g \|W_g\|_{\sigma,1}$$
[Tomioka, Suzuki 2013]: the case of non-overlapping groups.
Convex analysis (theoretical study): Fenchel conjugate $\Omega_G^*$; dual norm $\Omega_G^\circ$; expression of $\Omega_G$ as a support function; convex hull of functions involving the rank.
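For non-overlapping groups (the [Tomioka, Suzuki 2013] case), the min in the definition is attained at the fixed block decomposition, so the norm reduces to a weighted sum of per-block nuclear norms; a NumPy sketch under that assumption (for overlapping groups the decomposition itself must be optimized, which this sketch does not do):

```python
import numpy as np

def group_nuclear_norm(blocks, alphas):
    """Group nuclear norm with NON-overlapping groups: with a fixed block
    decomposition W = sum_g i_g(W_g), Omega_G(W) = sum_g alpha_g * ||W_g||_{sigma,1}."""
    return sum(a * np.linalg.svd(Wg, compute_uv=False).sum()
               for a, Wg in zip(alphas, blocks))

# Two diagonal blocks of a larger matrix, unit weights alpha_g = 1.
W1 = np.eye(2)             # singular values (1, 1)
W2 = np.array([[3.0]])     # singular value 3
print(group_nuclear_norm([W1, W2], [1.0, 1.0]))  # 2 + 3 = 5
```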
Convex hull: results
In words, the convex hull of a function is the largest convex function lying below it.
Properly restricted to a ball, the nuclear norm is the convex hull of the rank [Fazel 2001]. Generalization:
Theorem. Properly restricted to a ball, the group nuclear norm is the convex hull of:
- the reweighted group rank function: $\Omega^{\mathrm{rank}}_G(W) := \inf_{W = \sum_{g \in G} i_g(W_g)} \sum_{g \in G} \alpha_g\, \mathrm{rank}(W_g)$
- the reweighted restricted rank function: $\Omega^{\mathrm{rank}}(W) := \min_{g \in G} \alpha_g\, \mathrm{rank}(W) + \delta_g(W)$, with $\delta_g$ an indicator function
Learning with the group nuclear norm enforces a low-rank property on the groups.
Learning with group nuclear norm
Usual optimization algorithms can handle the group nuclear norm: composite conditional gradient algorithms; (accelerated) proximal gradient algorithms.
Illustration with a proximal gradient optimization algorithm: the key computations are parallelized over the groups, giving good scalability when there are many small groups.
Prox of the group nuclear norm:
$$\mathrm{prox}_{\gamma \Omega_G}((W_g)_g) = \left( U_g D_\gamma(S_g) V_g^\top \right)_{g \in G}$$
with the SVD decomposition $W_g = U_g S_g V_g^\top$ and the soft-thresholding operator $D_\gamma(S) = \mathrm{Diag}(\{\max\{s_i - \gamma, 0\}\}_{1 \le i \le r})$.
Package in Matlab, in particular: a vector space type for the group nuclear norm, with overloading of + and *.
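A sketch of this prox formula for non-overlapping groups with unit weights $\alpha_g = 1$ (in NumPy rather than the thesis's Matlab package):

```python
import numpy as np

def prox_nuclear(W, gamma):
    """Prox of gamma * ||.||_{sigma,1}: SVD soft-thresholding,
    U * D_gamma(S) * V^T with D_gamma(s_i) = max(s_i - gamma, 0)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt

def prox_group_nuclear(blocks, gamma):
    """Prox of the group nuclear norm with non-overlapping groups and unit
    weights: applied block by block (the computations parallelize over groups)."""
    return [prox_nuclear(Wg, gamma) for Wg in blocks]

blocks = [np.diag([3.0, 0.5]), np.array([[2.0]])]
out = prox_group_nuclear(blocks, gamma=1.0)
print(out[0])  # diag(2, 0): the small singular value is thresholded to zero
print(out[1])  # [[1.]]
```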
Numerical illustration: matrix completion
Ground truth data: a synthetic low-rank matrix X, the sum of 10 rank-1 groups, normalized to zero mean.
Observation: uniform 10% sampling of the entries $X_{ij}$, $(i,j) \in I$, with additive Gaussian noise.
Solution $W^\star$:
$$\min_{W \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{(i,j) \in I} (W_{ij} - X_{ij})^2 + \lambda\, \Omega_G(W)$$
Recovery error: $\frac{1}{N} \|W^\star - X\|^2$. (Figure: ground truth X vs. recovered solution $W^\star$.)
Summary
Smoothing: a versatile tool in optimization, with ways to combine it with many existing algorithms.
Time-varying smoothing. Theory: convergence analysis of the minimization. Practice: recovers the best performance, with no need to tune $\gamma$.
Group nuclear norm: theory and practice to combine group and rank sparsity; overlapping groups.
Perspectives
Smoothing for faster convergence: Moreau-Yosida smoothing can be used to improve the condition number of poorly conditioned objectives before applying linearly convergent convex optimization algorithms [Hongzhou et al. 2017].
Smoothing for better prediction: smoothing can be adapted to the properties of the dataset and used to improve the prediction performance of machine learning algorithms.
Learning group structure and weights for better prediction: the group structure in the group nuclear norm can be learned to leverage the underlying structure and improve prediction. Extensions to group Schatten norms.
Potential applications of the group nuclear norm: multi-attribute classification; multiple tree hierarchies; dimensionality reduction and feature selection (e.g. concatenating features, avoiding PCA).
Thank you.
Proximal operator and methods
Proximal operator and methods Master 2 Data Science, Univ. Paris Saclay Robert M. Gower Optimization Sum of Terms A Datum Function Finite Sum Training Problem The Training Problem Convergence GD I Theorem
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - C) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationConditional gradient algorithms for machine learning
Conditional gradient algorithms for machine learning Zaid Harchaoui LEAR-LJK, INRIA Grenoble Anatoli Juditsky LJK, Université de Grenoble Arkadi Nemirovski Georgia Tech Abstract We consider penalized formulations
More informationConvex optimization algorithms for sparse and low-rank representations
Convex optimization algorithms for sparse and low-rank representations Lieven Vandenberghe, Hsiao-Han Chao (UCLA) ECC 2013 Tutorial Session Sparse and low-rank representation methods in control, estimation,
More informationConvex Optimization / Homework 2, due Oct 3
Convex Optimization 0-725/36-725 Homework 2, due Oct 3 Instructions: You must complete Problems 3 and either Problem 4 or Problem 5 (your choice between the two) When you submit the homework, upload a
More informationConvex Optimization MLSS 2015
Convex Optimization MLSS 2015 Constantine Caramanis The University of Texas at Austin The Optimization Problem minimize : f (x) subject to : x X. The Optimization Problem minimize : f (x) subject to :
More informationConvex and Distributed Optimization. Thomas Ropars
>>> Presentation of this master2 course Convex and Distributed Optimization Franck Iutzeler Jérôme Malick Thomas Ropars Dmitry Grishchenko from LJK, the applied maths and computer science laboratory and
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationOptimization for Machine Learning
with a focus on proximal gradient descent algorithm Department of Computer Science and Engineering Outline 1 History & Trends 2 Proximal Gradient Descent 3 Three Applications A Brief History A. Convex
More informationLimitations of Matrix Completion via Trace Norm Minimization
Limitations of Matrix Completion via Trace Norm Minimization ABSTRACT Xiaoxiao Shi Computer Science Department University of Illinois at Chicago xiaoxiao@cs.uic.edu In recent years, compressive sensing
More informationConditional gradient algorithms for machine learning
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationAn accelerated proximal gradient algorithm for nuclear norm regularized least squares problems
An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems Kim-Chuan Toh Sangwoon Yun Abstract The affine rank minimization problem, which consists of finding a matrix
More informationAn R Package flare for High Dimensional Linear Regression and Precision Matrix Estimation
An R Package flare for High Dimensional Linear Regression and Precision Matrix Estimation Xingguo Li Tuo Zhao Xiaoming Yuan Han Liu Abstract This paper describes an R package named flare, which implements