Low-Rank Matrix Recovery III: Fast Algorithms and Scalable Applications Zhouchen Lin


1 Low-Rank Matrix Recovery III: Fast Algorithms and Scalable Applications. Zhouchen Lin, Visual Computing Group, Microsoft Research Asia. Aug. 11, 2011.

2 Why do we need new algorithms? min ‖A‖_* + λ‖E‖_1 subject to A + E = D. This is high-dimensional, non-smooth convex optimization: 2n² unknowns (a 1000×1000 matrix already gives 2 million unknowns!). Existing second-order (e.g., interior-point) algorithms cannot solve moderate-to-large instances: their complexity is O(n⁶), and CVX (Stanford, Boyd) solves only up to about 80×80 on a typical PC.

3 Existing work. L1-norm minimization (min ‖E‖_1): Stanford University: Emmanuel Candès 06~09; Rice University: Wotao Yin and Yin Zhang; National University of Singapore: K.C. Toh 09; Technion, Israel Institute of Technology: Amir Beck; Tel Aviv University: Marc Teboulle; University of Washington: Paul Tseng. Nuclear norm minimization (min ‖A‖_*): Stanford University: Jianfeng Cai and Emmanuel Candès; Rice University: Wotao Yin and Yin Zhang 09; National University of Singapore: K.C. Toh, Zuowei Shen 09; Hong Kong Baptist University: Xiaoming Yuan 09; Columbia University: Shiqian Ma, Donald Goldfarb, Lifeng Chen 09; and many others!

4 THIS TALK: exciting development of fast algorithms. Time required to solve a 1000×1000 RPCA problem, min ‖A‖_* + λ‖E‖_1 subject to A + E = D. [Table: accuracy, rank(Â), ‖Ê‖_0, #iterations, and time (sec) for IT, DUAL, APG, APG (partial SVD), ALM (partial SVD), and ADM (partial SVD).] Roughly 10,000 times speedup! We will see: how to efficiently solve large matrix recovery problems by choosing the right first-order method (4 orders of magnitude speedup!), and the ideas behind the algorithms, which apply to many related problems.

5 Why are scalable solutions possible? The complexity of solving the generic convex problem min_x f(x) with first-order methods depends strongly on the smoothness of f: f smooth with Lipschitz ∇f: O(1/√ε) iterations; f differentiable: O(1/ε); f non-smooth: O(1/ε²). BAD NEWS: our model problem, min ‖A‖_* + λ‖E‖_1 subject to A + E = D, is large and non-smooth. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 2003.

6 Why are scalable solutions possible? GOOD NEWS: the RPCA problem min ‖A‖_* + λ‖E‖_1 subject to A + E = D has special structure. KEY OBSERVATION: the proximal minimizations have closed-form solutions: S_ε(Q) = argmin_X ε‖X‖_1 + ½‖X − Q‖_F² and D_ε(Q) = argmin_X ε‖X‖_* + ½‖X − Q‖_F².

7 Why are scalable solutions possible? GOOD NEWS: the RPCA problem min ‖A‖_* + λ‖E‖_1 subject to A + E = D has special structure. KEY OBSERVATION: the proximal minimization S_ε(Q) = argmin_X ε‖X‖_1 + ½‖X − Q‖_F² has a closed-form solution. Lemma. The solution is given by applying the soft-thresholding operator S_ε(q) = max(|q| − ε, 0) · sgn(q) to each entry of the matrix Q. W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1(1):143-168, 2008.
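For concreteness, here is a minimal NumPy sketch of the entrywise soft-thresholding operator from the lemma (the helper name soft_threshold is ours; later sketches reuse it):

```python
import numpy as np

def soft_threshold(Q, eps):
    """Entrywise soft-thresholding S_eps: solves
    argmin_X eps*||X||_1 + 0.5*||X - Q||_F^2."""
    return np.sign(Q) * np.maximum(np.abs(Q) - eps, 0.0)
```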

8 Why are scalable solutions possible? GOOD NEWS: the RPCA problem min ‖A‖_* + λ‖E‖_1 subject to A + E = D has special structure. KEY OBSERVATION: the proximal minimization D_ε(Q) = argmin_X ε‖X‖_* + ½‖X − Q‖_F² has a closed-form solution. Lemma. The solution is given by applying soft-thresholding to the singular values of Q = UΣV^T: D_ε(Q) = U S_ε(Σ) V^T. J.-F. Cai, E. J. Candès, and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20(4):1956-1982, 2010.
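A matching sketch of the singular value thresholding operator D_ε, built here on a full SVD (later slides replace this with a partial SVD); svt is our own helper name:

```python
import numpy as np

def svt(Q, eps):
    """Singular value thresholding D_eps(Q) = U S_eps(Sigma) V^T:
    solves argmin_X eps*||X||_* + 0.5*||X - Q||_F^2."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    s = np.maximum(s - eps, 0.0)   # soft-threshold the singular values
    keep = s > 0                   # drop the zeroed ones
    return (U[:, keep] * s[keep]) @ Vt[keep, :]
```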

9 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

10 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

11 Iterative Thresholding. Model problem: min ‖A‖_* + λ‖E‖_1 subject to A + E = D. Convenient approximation (exact as μ ↘ 0): min ‖A‖_* + λ‖E‖_1 + (μ/2)(‖A‖_F² + ‖E‖_F²) subject to A + E = D.

12 Iterative Thresholding. Model problem: min ‖A‖_* + λ‖E‖_1 subject to A + E = D. Convenient approximation (exact as μ ↘ 0): min ‖A‖_* + λ‖E‖_1 + (μ/2)(‖A‖_F² + ‖E‖_F²) subject to A + E = D. Lagrangian: L(A, E, Y) := ‖A‖_* + λ‖E‖_1 + (μ/2)(‖A‖_F² + ‖E‖_F²) + ⟨Y, D − A − E⟩. Algorithm (Iterative Thresholding): (A_{k+1}, E_{k+1}) = argmin_{A,E} L(A, E, Y_k); Y_{k+1} = Y_k + δ_k(D − A_{k+1} − E_{k+1}). Theorem [Wright et al., à la Cai et al. 09]. Provided δ_k < 1, the iterates converge to the unique optimal solution of the approximated problem.

13 A recurring theme. Similar ideas appear in many places in the literature: min_{M(x)=b} f(x), with relaxation min_{M(x)=b} f(x) + (μ/2)‖x‖². Solution via Uzawa's algorithm: x_{k+1} = argmin_x f(x) + (μ/2)‖x‖² + ⟨y_k, b − M(x)⟩; y_{k+1} = y_k + δ_k(b − M(x_{k+1})). [Cai, Candès, Shen 09] A Singular Value Thresholding Algorithm for Matrix Completion. [Osher, Mao, Dong, Yin 09] Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising.

14 How do we solve the subproblem? Key subproblem: (A_{k+1}, E_{k+1}) = argmin L(A, E, Y_k) = argmin ‖A‖_* + λ‖E‖_1 + (μ/2)(‖A‖_F² + ‖E‖_F²) + ⟨Y_k, D − A − E⟩. Using our previous observations: A_{k+1} = argmin_A ‖A‖_* + (μ/2)‖A‖_F² − ⟨Y_k, A⟩ = argmin_A ‖A‖_* + (μ/2)‖A − μ⁻¹Y_k‖_F² = μ⁻¹ D_1(Y_k) (shrink singular values); E_{k+1} = argmin_E λ‖E‖_1 + (μ/2)‖E‖_F² − ⟨Y_k, E⟩ = argmin_E λ‖E‖_1 + (μ/2)‖E − μ⁻¹Y_k‖_F² = μ⁻¹ S_λ(Y_k) (shrink absolute values). So each iteration is relatively simple, yet expensive (the cost of one SVD).

15 Iterative Thresholding: Pros and Cons. An extremely simple algorithm for robust PCA: A_{k+1} = μ⁻¹ D_1(Y_k); E_{k+1} = μ⁻¹ S_λ(Y_k); Y_{k+1} = Y_k + δ_k(D − A_{k+1} − E_{k+1}). Strong points: in practice, it controls the rank of the iterates A_k; scalable: can solve medium-to-large problems. Weak point: slow; many iterations are needed for convergence. E.g., recovering a 1,000 × 1,000 matrix of rank 50 from 10% errors requires >8,000 iterations and >27 hours on a standard PC. Next: how can we cut the number of iterations?
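Putting the pieces together, a minimal sketch of the IT iteration, reusing the svt and soft_threshold helpers above (the parameter defaults are illustrative, not the authors' settings):

```python
import numpy as np

def rpca_it(D, lam=None, mu=1e-3, delta=0.9, max_iter=10000, tol=1e-6):
    """Iterative Thresholding for RPCA:
    A = mu^{-1} D_1(Y),  E = mu^{-1} S_lam(Y),  Y += delta*(D - A - E)."""
    lam = lam if lam is not None else 1.0 / np.sqrt(max(D.shape))  # common RPCA weight
    Y = np.zeros_like(D)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svt(Y, 1.0) / mu                # shrink singular values
        E = soft_threshold(Y, lam) / mu     # shrink absolute values
        R = D - A - E
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
        Y = Y + delta * R                   # dual ascent step, delta < 1
    return A, E
```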

16 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

17 Accelerated Proximal Gradient (APG) Method. Gradient descent, x_{k+1} = x_k − α_k ∇f(x_k), converges at a rate of only O(1/k) in function values! Prior to 1983, the best known lower bound was O(1/k²) (much smaller); is it actually achievable? Theorem [Nesterov 83]: Consider the problem min f(x) with f convex. If f is differentiable with Lipschitz continuous gradient, ‖∇f(x_1) − ∇f(x_2)‖ ≤ L‖x_1 − x_2‖, there exists a first-order algorithm with O(1/k²) convergence rate in function values. Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372-376, 1983.

18 Nesterov's Optimal Gradient Method. Problem: min f(x), with f convex and differentiable and ∇f L-Lipschitz. First idea: minimize a sequence of quadratic approximations to f: f(x) ≤ f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖² =: Q_L(x, y). Repeat: x_{k+1} = argmin_x Q_L(x, y_k) = y_k − (1/L)∇f(y_k), where y_k is the point at which we form the approximation. The natural choice y_k = x_k gives the standard gradient algorithm: O(L/ε) iterations for an ε-suboptimal solution.

19 Nesterov's Optimal Gradient Method. Problem: min f(x), with f convex and differentiable and ∇f L-Lipschitz. First idea: minimize a sequence of quadratic approximations to f: f(x) ≤ f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖² =: Q_L(x, y). Repeat: x_{k+1} = argmin_x Q_L(x, y_k) = y_k − (1/L)∇f(y_k). Non-obvious alternative: t_{k+1} = (1 + √(1 + 4t_k²))/2; y_{k+1} = x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1}): only O(√(L/ε)) iterations for an ε-suboptimal solution! Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372-376, 1983.

20 Generalization: min F(x) = g(x) + f(x), with g, f convex and f ∈ C^{1,1}. We can still form quadratic approximations to the smooth part, Q_L(x, y) = f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖² + g(x), and iteratively minimize them: x_{k+1} = argmin_x Q_L(x, y_k) = argmin_x g(x) + (L/2)‖x − (y_k − (1/L)∇f(y_k))‖². Theorem [Beck and Teboulle 09]: the above algorithm converges with rate F(x_k) − F* ≤ 2L‖x_0 − x*‖²/(k + 1)². Moral: if we can solve min_x Q_L(x, y_k) efficiently, we retain the advantages of Nesterov's algorithm even though F contains a nonsmooth term. A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.

21 Beck and Teboulle's APG method. Algorithm 1: Accelerated Proximal Gradient (APG) method. 1: Initialize x_0 = y_1, t_1 = 1, k = 1. 2: while not converged do 3: x_k = argmin_x Q_L(x, y_k); 4: t_{k+1} = (1 + √(1 + 4t_k²))/2, y_{k+1} = x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1}); 5: end while. [Beck and Teboulle 09] A fast iterative shrinkage-thresholding algorithm for linear inverse problems (theory + application to sparse recovery). [Liu, Sun and Toh 09] An Implementable Proximal Point Algorithmic Framework for Nuclear Norm Minimization (application to matrix completion). [Ganesh, Lin, Ma, Wu, Wright 09] Fast Algorithms for Recovering a Corrupted Low-Rank Matrix (application to robust PCA).
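A generic sketch of this scheme; grad_f and prox_g are caller-supplied, with prox_g(v, t) returning argmin_x g(x) + (1/(2t))‖x − v‖²:

```python
import numpy as np

def apg(grad_f, prox_g, L, x0, max_iter=500):
    """Accelerated proximal gradient (FISTA):
    x_k = prox_{g,1/L}(y_k - grad_f(y_k)/L), plus Nesterov momentum on y."""
    x_old = x0
    y = x0.copy()
    t = 1.0
    for _ in range(max_iter):
        x = prox_g(y - grad_f(y) / L, 1.0 / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_new) * (x - x_old)
        x_old, t = x, t_new
    return x_old
```

For example, with f(x) = ½‖Mx − b‖² and g(x) = λ‖x‖_1 this recovers FISTA for sparse recovery: apg(lambda x: M.T @ (M @ x - b), lambda v, t: soft_threshold(v, lam * t), np.linalg.norm(M, 2) ** 2, np.zeros(M.shape[1])).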

22 What about matrix recovery? Model problem: min ‖A‖_* + λ‖E‖_1 subject to A + E = D. Penalized version (exact as μ ↘ 0): min μ(‖A‖_* + λ‖E‖_1) + ½‖D − A − E‖_F². The first term is non-differentiable; the second is smooth with Lipschitz gradient.

23 Solving the subproblem? x_{k+1} = argmin_x Q_L(x, y_k) = argmin_x g(x) + (L/2)‖x − (y − (1/L)∇f(y))‖². In our case, x = (x_A, x_E) ∈ R^{m×n} × R^{m×n} and y = (y_A, y_E) ∈ R^{m×n} × R^{m×n}. It is not difficult to show that L = 2 and ∇f(y) = (y_A + y_E − D, y_A + y_E − D). So the update equations are again given by shrinkage: A_{k+1} = argmin_A (μ/L)‖A‖_* + ½‖A − (y_A^k + L⁻¹(D − y_A^k − y_E^k))‖_F² = D_{μ/L}(y_A^k + L⁻¹(D − y_A^k − y_E^k)); E_{k+1} = argmin_E (λμ/L)‖E‖_1 + ½‖E − (y_E^k + L⁻¹(D − y_A^k − y_E^k))‖_F² = S_{λμ/L}(y_E^k + L⁻¹(D − y_A^k − y_E^k)).

24 APG: Pros and Cons. A_{k+1} = D_{μ/L}(y_A^k + L⁻¹(D − y_A^k − y_E^k)); E_{k+1} = S_{λμ/L}(y_E^k + L⁻¹(D − y_A^k − y_E^k)); t_{k+1} = (1 + √(1 + 4t_k²))/2; y_A^{k+1} = A_k + ((t_k − 1)/t_{k+1})(A_k − A_{k−1}); y_E^{k+1} = E_k + ((t_k − 1)/t_{k+1})(E_k − E_{k−1}). Strong points: scalable, can solve medium-to-large problems; dramatically improved iteration complexity: cuts #iterations from >8,000 to 135 (!). Weak points: does not control the rank of the iterates; requires continuation, μ_{k+1} = max(ημ_k, μ_min) with η ∈ (0, 1), for very accurate solutions. Next: are there better frameworks for continuation?
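A sketch of these updates with continuation, reusing svt and soft_threshold; the initial μ, the decay factor η, and μ_min are illustrative choices, not the authors' settings:

```python
import numpy as np

def rpca_apg(D, lam=None, eta=0.9, mu_min=1e-9, max_iter=1000):
    """APG for the penalized RPCA problem
    min mu*(||A||_* + lam*||E||_1) + 0.5*||D - A - E||_F^2."""
    lam = lam if lam is not None else 1.0 / np.sqrt(max(D.shape))
    L = 2.0                             # Lipschitz constant of the smooth term
    mu = 0.99 * np.linalg.norm(D, 2)    # start large, shrink geometrically
    A = A_old = np.zeros_like(D)
    E = E_old = np.zeros_like(D)
    t = t_old = 1.0
    for _ in range(max_iter):
        yA = A + ((t_old - 1.0) / t) * (A - A_old)   # momentum on A
        yE = E + ((t_old - 1.0) / t) * (E - E_old)   # momentum on E
        G = (D - yA - yE) / L                        # gradient step, smooth term
        A_old, E_old = A, E
        A = svt(yA + G, mu / L)                      # shrink singular values
        E = soft_threshold(yE + G, lam * mu / L)     # shrink absolute values
        t_old, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        mu = max(eta * mu, mu_min)                   # continuation
    return A, E
```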

25 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

26 Augmented Lagrange Multiplier (ALM) Method. Model problem: min ‖A‖_* + λ‖E‖_1 subject to A + E = D. We have seen two approximations; can we just efficiently solve the exact problem? Write it as min_x f(x) s.t. g_i(x) = 0, i = 1, ..., m. Lagrangian: L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x). Augmented Lagrangian [Hestenes 69, Powell 69]: L̃(x, λ, μ) = f(x) + Σ_{i=1}^m λ_i g_i(x) + Σ_{i=1}^m (μ_i/2) g_i²(x), where the last sum is a penalty function. See, e.g., D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, 1982.

27 Augmented Lagrange Multiplier (ALM) Method. Algorithm 1: Augmented Lagrange Multiplier Method. 1: Initialize x_0, λ^{(0)}, μ^{(0)} > 0, k = 0, ρ > 1. 2: while not converged do 3: x_{k+1} = argmin_x L̃(x, λ^{(k)}, μ^{(k)}); 4: λ_i^{(k+1)} = λ_i^{(k)} + μ_i^{(k)} g_i(x_{k+1}); 5: μ_i^{(k+1)} = ρ μ_i^{(k)}; 6: k ← k + 1; 7: end while. ALM is advantageous when the subproblem x_{k+1} = argmin_x L̃(x, λ^{(k)}, μ^{(k)}) is easily solvable: fast convergence, O(1/μ_k). See, e.g., D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, 1982.
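To make the template concrete, here is a toy sketch for min ‖x‖_1 s.t. Mx = b, with the inner subproblem solved inexactly by a few proximal gradient steps (reusing soft_threshold; all parameter choices are illustrative):

```python
import numpy as np

def alm_l1(M, b, mu=1.0, rho=2.0, outer=30, inner=50):
    """Augmented Lagrangian method for min ||x||_1 s.t. Mx = b."""
    x = np.zeros(M.shape[1])
    lam = np.zeros(M.shape[0])               # Lagrange multiplier
    for _ in range(outer):
        L = mu * np.linalg.norm(M, 2) ** 2   # Lipschitz const. of smooth part
        for _ in range(inner):
            # x ~ argmin ||x||_1 + <lam, b - Mx> + (mu/2)*||Mx - b||^2
            grad = M.T @ (mu * (M @ x - b) - lam)
            x = soft_threshold(x - grad / L, 1.0 / L)
        lam = lam + mu * (b - M @ x)         # multiplier ascent
        mu *= rho                            # increase the penalty
    return x
```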

28 ALM: Solving the Subproblem. L̃(A, E, Y, μ) = ‖A‖_* + λ‖E‖_1 + ⟨Y, D − A − E⟩ + (μ/2)‖D − A − E‖_F². Solve the subproblem (A_{k+1}, E_{k+1}) = argmin_{A,E} L̃(A, E, Y_k, μ_k) by alternation: repeat A_{k+1}^{j+1} = D_{μ_k⁻¹}(D − E_{k+1}^j + μ_k⁻¹ Y_k) (shrink singular values); E_{k+1}^{j+1} = S_{λμ_k⁻¹}(D − A_{k+1}^{j+1} + μ_k⁻¹ Y_k) (shrink absolute values). Then update the Lagrange multiplier: Y_{k+1} = Y_k + μ_k(D − A_{k+1} − E_{k+1}). The inner loop slows down as μ_k grows, so the total number of SVDs grows! Next: do we need to solve the subproblem exactly? Z. Lin, M. Chen, L. Wu, and Y. Ma, The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices, submitted to Mathematical Programming.

29 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

30 ADM for RPCA. L̃(A, E, Y, μ) = ‖A‖_* + λ‖E‖_1 + ⟨Y, D − A − E⟩ + (μ/2)‖D − A − E‖_F². Minimizing L̃(A, E, Y, μ) over (A, E) simultaneously is nontrivial, but minimizing it with respect to A or E alone is easy: argmin_A L̃(A, E, Y, μ) = D_{μ⁻¹}(D − E + μ⁻¹Y); argmin_E L̃(A, E, Y, μ) = S_{λμ⁻¹}(D − A + μ⁻¹Y). Solution: the Alternating Direction Method of Multipliers [Gabay and Mercier 76]: A_{k+1} = D_{μ_k⁻¹}(D − E_k + μ_k⁻¹ Y_k); E_{k+1} = S_{λμ_k⁻¹}(D − A_{k+1} + μ_k⁻¹ Y_k); Y_{k+1} = Y_k + μ_k(D − A_{k+1} − E_{k+1}). We only have to update A and E once before updating Y!
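A minimal sketch of this ADM loop, again reusing svt and soft_threshold; the initializations of Y and μ and the cap on μ follow common practice rather than any prescribed setting:

```python
import numpy as np

def rpca_adm(D, lam=None, rho=1.5, max_iter=200, tol=1e-7):
    """ADM for min ||A||_* + lam*||E||_1  s.t.  A + E = D."""
    lam = lam if lam is not None else 1.0 / np.sqrt(max(D.shape))
    d_norm = np.linalg.norm(D, 'fro')
    mu = 1.25 / np.linalg.norm(D, 2)
    mu_max = mu * 1e7
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)  # dual warm start
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)             # shrink singular values
        E = soft_threshold(D - A + Y / mu, lam / mu)  # shrink absolute values
        R = D - A - E
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)                    # nondecreasing mu_k
        if np.linalg.norm(R, 'fro') <= tol * d_norm:
            break
    return A, E
```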

31 Convergence of ADM for RPCA. Classical theory: convergence provided μ_k is bounded. In practice, increasing sequences μ_k yield much faster convergence. Recently justified by Lin et al.: Theorem. If {μ_k} is nondecreasing, then (A_k, E_k) globally converges to an optimal solution (A*, E*) of the RPCA problem if and only if Σ_{k=1}^{+∞} μ_k⁻¹ = +∞. Z. Lin, M. Chen, L. Wu, and Y. Ma, The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices, submitted to Mathematical Programming.

32 Summary of Representative Algorithms. IT: L(A, E, Y) = ‖A‖_* + λ‖E‖_1 + (μ/2)(‖A‖_F² + ‖E‖_F²) + ⟨Y, D − A − E⟩; repeat A_{k+1} = μ⁻¹D_1(Y_k) (shrink singular values); E_{k+1} = μ⁻¹S_λ(Y_k) (shrink absolute values); Y_{k+1} = Y_k + δ_k(D − A_{k+1} − E_{k+1}). ADM: L̃(A, E, Y, μ) = ‖A‖_* + λ‖E‖_1 + ⟨Y, D − A − E⟩ + (μ/2)‖D − A − E‖_F²; repeat A_{k+1} = D_{μ_k⁻¹}(D − E_k + μ_k⁻¹ Y_k) (shrink singular values); E_{k+1} = S_{λμ_k⁻¹}(D − A_{k+1} + μ_k⁻¹ Y_k) (shrink absolute values); Y_{k+1} = Y_k + μ_k(D − A_{k+1} − E_{k+1}).

33 ADM: Pros and Cons. A_{k+1} = D_{μ_k⁻¹}(D − E_k + μ_k⁻¹ Y_k); E_{k+1} = S_{λμ_k⁻¹}(D − A_{k+1} + μ_k⁻¹ Y_k); Y_{k+1} = Y_k + μ_k(D − A_{k+1} − E_{k+1}). Strong points: scalable, can solve medium-to-large problems; further improved iteration complexity, down to a few dozen iterations; the best algorithm for this problem, in our experience. Weak points: the convergence rate is still open (at least O(1/k) [X. Yuan et al.]); extensions to more than two terms are still open [X. Yuan and collaborators]. Next: can we reduce the complexity of each iteration? Z. Lin, M. Chen, L. Wu, and Y. Ma, The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices, submitted to Mathematical Programming.

34 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

35 Using Partial SVD. ADM: repeat A_{k+1} = D_{μ_k⁻¹}(D − E_k + μ_k⁻¹ Y_k) (shrink singular values); E_{k+1} = S_{λμ_k⁻¹}(D − A_{k+1} + μ_k⁻¹ Y_k) (shrink absolute values); Y_{k+1} = Y_k + μ_k(D − A_{k+1} − E_{k+1}). Recall: if the SVD of Q is Q = UΣV^T, then D_ε(Q) = U S_ε(Σ) V^T, where S_ε(x) = max(|x| − ε, 0) · sgn(x). So we only need the singular values of D − E_k + μ_k⁻¹ Y_k that are larger than μ_k⁻¹. This cuts the complexity from O(n³) to O(rn²)! PROPACK: we have optimized it, and also modified it. Zhouchen Lin, Some Software Packages for Partial SVD Computation, arXiv preprint.
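A sketch of the partial-SVD variant of the svt helper, using SciPy's Lanczos-based svds; predicting the rank bound k (e.g., the previous iterate's rank plus a small margin) is left to the caller:

```python
import numpy as np
from scipy.sparse.linalg import svds

def svt_partial(Q, eps, k):
    """D_eps(Q) from only the top-k singular triplets of Q; valid when
    at most k singular values of Q exceed eps."""
    k = min(k, min(Q.shape) - 1)    # svds requires k < min(m, n)
    U, s, Vt = svds(Q, k=k)         # Lanczos-based partial SVD
    s = np.maximum(s - eps, 0.0)    # soft-threshold the computed values
    keep = s > 0
    return (U[:, keep] * s[keep]) @ Vt[keep, :]
```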

36 Lanczos Method for Partial SVD. Approximate Q as Q ≈ U_k B_k V_k^T, where U_k and V_k can be found by the Lanczos procedure in k steps and B_k is bidiagonal. The Lanczos procedure starts from a random vector. If the SVD of B_k is B_k = Û_k Σ_k Ṽ_k^T, then the SVD of Q is approximately Q ≈ (U_k Û_k) Σ_k (V_k Ṽ_k)^T. The leading singular values/vectors in Σ_k, U_k Û_k, and V_k Ṽ_k converge to those of Q quickly as k increases. The complexity is O(kn² + k³). G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 1996.

37 Exciting development of fast algorithms. For a 1000×1000 matrix of rank 50, with 10% (100,000) of the entries randomly corrupted: min ‖A‖_* + λ‖E‖_1 subject to A + E = D. [Table: accuracy, rank(Â), ‖Ê‖_0, #iterations, and time (sec) for IT, DUAL, APG, APG (partial SVD), ALM (partial SVD), and ADM (partial SVD).] Roughly 10,000 times speedup! Provably robust PCA at only a constant factor (~20×) more computation than conventional PCA!

38 Our Roadmap. Computing time = (#iterations) × (time per iteration). Make the iterations as few as possible: Iterative Thresholding; Accelerated Proximal Gradient; Augmented Lagrange Multiplier; Alternating Direction Method of Multipliers. Make each iteration as efficient as possible: partial SVD, O(rn²); Block Lanczos with Warm Start (BLWS).

39 Key Observations for Acceleration. When solving the subproblem A_j = argmin_A ε_j‖A‖_* + ½‖A − Q_j‖_F², the matrix Q_j usually differs only slightly from Q_{j−1}. Computing the partial SVD of Q_j independently does not utilize this information. We are seeking the principal singular subspace of Q_j. Z. Lin and S. Wei, A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms, submitted to Optimization Letters.

40 Drawbacks of the Vector-Based Lanczos Method. The initial vector q_1 used to start the Lanczos procedure does not carry enough information about the principal singular subspace, even if q_1 is the leading singular vector of Q_{j−1}. Z. Lin and S. Wei, A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms, submitted to Optimization Letters.

41 Key Ideas. Use the block Lanczos method for the partial SVD, and use the principal singular subspace of Q_{j−1} to start the block Lanczos procedure. Code is available online. Z. Lin and S. Wei, A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms, submitted to Optimization Letters.
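A simplified sketch of the warm-start idea. BLWS proper runs a block Lanczos procedure; this stand-in instead uses block power (subspace) iteration, started from the previous iterate's principal right singular subspace V0, to the same effect:

```python
import numpy as np

def warm_partial_svd(Q, V0, n_power=3):
    """Approximate top-k SVD of Q, warm-started from V0 (n x k), the
    principal right singular subspace of the previous iterate."""
    V, _ = np.linalg.qr(V0)
    for _ in range(n_power):          # refine the subspace
        U, _ = np.linalg.qr(Q @ V)
        V, _ = np.linalg.qr(Q.T @ U)
    W = Q @ V                         # m x k projection onto the subspace
    Uw, s, Zt = np.linalg.svd(W, full_matrices=False)
    return Uw, s, V @ Zt.T            # Q ~ Uw @ diag(s) @ (V @ Zt.T).T
```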

42 Experimental Results: BLWS for RPCA. [Table 1: BLWS-ADM vs. ADM on different synthetic data. Â and Ê are the computed low-rank and sparse matrices and A* is the ground truth; columns are m, method, ‖Â − A*‖_F/‖A*‖_F, rank(Â), ‖Ê‖_0, #iter, and time(s), for m = 500 and larger sizes.] Z. Lin and S. Wei, A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms, submitted to Optimization Letters.

43 Experimental Results: BLWS for MC. [Table 1: BLWS-SVT vs. SVT on synthetic data. Â is the recovered low-rank matrix and A* is the ground truth; m is the size of the matrix, n is the number of sampled entries, and d_r = r(2m − r) is the number of degrees of freedom of an m×m matrix of rank r. Columns are m, r, n/d_r, n/m², algorithm, time(s), #iter, and ‖Â − A*‖_F/‖A*‖_F.] Z. Lin and S. Wei, A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms, submitted to Optimization Letters.

44 Conclusions. State-of-the-art algorithms for RPCA and its many variations (up to 10⁵× speedup on a PC). A GPU implementation gives another ~5× speedup. An HPC cluster implementation uses distributed SVD. Next: applications! What can we do with these new theoretical and algorithmic tools?

45 Extensions and Broad Applications of RPCA: background modeling; image alignment; latent semantic indexing for text documents; photometric stereo; image tagging refinement; robust filtering; graphical model learning; and TILT, at computational scales well beyond what was previously possible.

46 Computation Examples: Background modeling. min ‖A‖_* + λ‖E‖_1 subject to D = A + E, A ≥ 0. High-resolution video, 720×576, 102 frames, on a workstation (Intel Xeon CPU, 4 cores, 24GB memory).

47 Computation Examples: Face Alignment [Peng et al.]. min ‖A‖_* + λ‖E‖_1 subject to D ∘ τ = A + E. 48 images at 80×60; 7 min on a workstation.

48 Computation Examples: Web document corpus analysis [Min et al.]. min ‖A‖_* + λ‖E‖_1 subject to D = A + E, E ≥ 0. D: a tf-idf matrix (one dimension is 18,320); 90.3 h on an HPC cluster.

49 Other Applications. Yi Ma, Visual Computing Group, Microsoft Research Asia. Aug. 11, 2011.

50 APPLICATIONS: Background modeling from video. Static camera surveillance video: 200 frames, 144 × 172 pixels, with significant foreground motion. Video = low-rank approximation + sparse error, via RPCA. Candès, Li, Ma, and Wright, Journal of the ACM, May 2011.

51 APPLICATIONS: Background modeling from video. Surveillance video: 250 frames, 128 × 160 pixels, with significant illumination variation. Shown: the video, results by RPCA, and results of Black and de la Torre. Candès, Li, Ma, and Wright, Journal of the ACM, May 2011.

52 APPLICATIONS: Repairing vintage movies. Original, repaired, and corruptions for Frame 1.

53 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 2

54 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 3

55 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 4

56 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 5

57 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 6

58 APPLICATIONS Repairing vintage movies Original Repaired Corruptions Frame 7

59 APPLICATIONS: Faces under varying illumination. 58 images of one person under varying lighting, decomposed by RPCA. Candès, Li, Ma, and Wright, Journal of the ACM, May 2011.

60 APPLICATIONS: Faces under varying illumination. 58 images of one person under varying lighting: RPCA removes specularities and cast shadows. Candès, Li, Ma, and Wright, Journal of the ACM, May 2011.

61 APPLICATIONS: High-quality photometric stereo. From images with specularities, shadows, and motion blurs, recover surface normals and relight.

62 Robust photometric stereo: synthesized images. Input images. [Table: mean angular errors (one value elided, 0.96°) and max angular errors (0.20°, 8.0°) of the recovered normals.] Wu, Ganesh, Li, Matsushita, and Ma, in ACCV 2010.

63 Robust photometric stereo: real images Wu, Ganesh, Li, Matsushita, and Ma, in ACCV 2010.

64 Robust Alignment via Sparse and Low-rank Decomposition. Problem: given a corrupted and misaligned observation D ∘ τ, recover the aligned low-rank signals A, the sparse errors E, and the parametric deformations τ (rigid, affine, projective, ...). Solution: Robust Alignment via Low-rank and Sparse (RASL) decomposition, iteratively solving the linearized convex program min ‖A‖_* + λ‖E‖_1 subject to D ∘ τ + Σ_i J_i Δτ ε_i ε_i^T = A + E, where J_i is the Jacobian of the i-th image with respect to its transformation parameters.

65 APPLICATIONS Batch face alignment Initial imprecise alignment, inappropriate for recognition: Peng, Ganesh, Wright, and Ma, CVPR 10


75 APPLICATIONS Batch face alignment Final result: per-pixel alignment Peng, Ganesh, Wright, and Ma, CVPR 10

76 APPLICATIONS: Batch face alignment, accuracy evaluation. 100 misaligned, corrupted images. [Table: mean error, error std., and max error (pixels) for the initial misalignment, Vedaldi CVPR 08 (direct/gradient), and RASL (this work).] Peng, Ganesh, Wright, and Ma, CVPR 10.

77 APPLICATIONS Simultaneous Alignment and Repairing Peng, Ganesh, Wright, Ma, CVPR 10

78 APPLICATIONS: Aligning Face Images from the Internet. 48 images collected from the Internet. Peng, Ganesh, Wright, Ma, CVPR 10.

79 APPLICATIONS: Faces Detected. Input: faces detected by a face detector; average shown. Peng, Ganesh, Wright, Ma, CVPR 10.

80 APPLICATIONS: Faces Aligned. Output: aligned faces; average shown. Peng, Ganesh, Wright, Ma, CVPR 10.

81 APPLICATIONS: Faces Repaired and Cleaned. Output: clean low-rank faces; average shown. Peng, Ganesh, Wright, Ma, CVPR 10.

82 APPLICATIONS: Sparse errors of the face images. Output: sparse error images. Peng, Ganesh, Wright, Ma, CVPR 10.

83 APPLICATIONS: Celebrities from the Internet. Average face before alignment & repairing: Gloria Macapagal Arroyo, Jennifer Capriati, Laura Bush, Serena Williams, Barack Obama, Ariel Sharon, Arnold Schwarzenegger, Colin Powell, Donald Rumsfeld, George W Bush, Gerhard Schroeder, Hugo Chavez, Jacques Chirac, Jean Chretien, John Ashcroft, Junichiro Koizumi, Lleyton Hewitt, Luiz Inacio Lula da Silva, Tony Blair, Vladimir Putin. Peng, Ganesh, Wright, Ma, CVPR 10.

84 APPLICATIONS: Face recognition with less controlled data? Average face after alignment & repairing: Gloria Macapagal Arroyo, Jennifer Capriati, Laura Bush, Serena Williams, Barack Obama, Ariel Sharon, Arnold Schwarzenegger, Colin Powell, Donald Rumsfeld, George W Bush, Gerhard Schroeder, Hugo Chavez, Jacques Chirac, Jean Chretien, John Ashcroft, Junichiro Koizumi, Lleyton Hewitt, Luiz Inacio Lula da Silva, Tony Blair, Vladimir Putin. Peng, Ganesh, Wright, Ma, CVPR 10.

85 APPLICATIONS: Aligning handwritten digits. Comparison with Learned-Miller PAMI 06 and Vedaldi CVPR 08. Peng, Ganesh, Wright, Ma, CVPR 10.

86 APPLICATIONS: 2D image matching and 3D modeling via 2D homographies. Peng, Ganesh, Wright, Ma, CVPR 10.

87 Other Applications: Web Document Corpus Analysis. Latent semantic indexing, the classical solution (PCA), forms a documents × words matrix of word frequencies (or TF/IDF); its low-rank approximation is dense and difficult to interpret. A better model/solution: a low-rank background topic model plus informative, discriminative keywords, i.e., low-dimensional topic models with keywords. Example document: "CHRYSLER SETS STOCK SPLIT, HIGHER DIVIDEND. Chrysler Corp said its board declared a three-for-two stock split in the form of a 50 pct stock dividend and raised the quarterly dividend by seven pct. The company said the dividend was raised to 37.5 cts a share from 35 cts on a pre-split basis, equal to a 25 ct dividend on a post-split basis. Chrysler said the stock dividend is payable April 13 to holders of record March 23, while the cash dividend is payable April 15 to holders of record March 23. It said cash will be paid in lieu of fractional shares. With the split, Chrysler said 13.2 mln shares remain to be purchased in its stock repurchase program that began in late . That program now has a target of 56.3 mln shares with the latest stock split. Chrysler said in a statement the actions 'reflect not only our outstanding performance over the past few years but also our optimism about the company's future.'"

88 Other Applications: Sparse Keywords Extracted. Reuters dataset: 1,000 longest documents; 3,000 most frequent words. [The same Chrysler article as above, with the extracted sparse keywords highlighted.] Min, Zhang, Wright, Ma, CIKM 2010.

89 Other Applications: Web Image Tag Refinement Zhu, Yan, and Ma, ACM MM 2010.

90 Other Applications: Robust Filtering and System ID. GPS on a car: ẋ = Ax + Bu, A ∈ R^{r×r}; y = Cx + z + e, where e contains gross sparse errors (due to buildings, trees, ...). Robust Kalman filter: x̂_{t+1} = A x̂_t + K(y_t − C x̂_t). Robust system ID: the Hankel matrix [y_n, y_{n−1}, ..., y_0; y_{n−1}, y_{n−2}, ..., y_1; ...] = O_{n×r} X_{r×n} + S, i.e., (low-rank observability times state trajectory) + (sparse errors).

91 Other Application: Graphical Models with Latent Variables. Variables are conditionally independent given the other variables. Separation principle: the sparsity pattern encodes conditional (in)dependence; the rank of the second component gives the number of hidden variables. Work of Chandrasekaran et al.

92 A Perfect Storm in the Cloud. Mathematical theory (high-dimensional statistics, measure concentration, combinatorics, ...), massive data (images, videos, texts, audio, speech, stocks, user rankings, ...), cloud computing (parallel, distributed, networked), and computational methods (convex optimization, first-order methods, hashing, approximate solutions, ...) together enable applications & services (data processing, analysis, compression, knowledge discovery, search, recognition, ...).

93 THANK YOU! Questions, please?


More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers Customizable software solver package Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein April 27, 2016 1 / 28 Background The Dual Problem Consider

More information

Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude

Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude A. Migukin *, V. atkovnik and J. Astola Department of Signal Processing, Tampere University of Technology,

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling

More information

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

More information

Robust l p -norm Singular Value Decomposition

Robust l p -norm Singular Value Decomposition Robust l p -norm Singular Value Decomposition Kha Gia Quach 1, Khoa Luu 2, Chi Nhan Duong 1, Tien D. Bui 1 1 Concordia University, Computer Science and Software Engineering, Montréal, Québec, Canada 2

More information

Lecture 17 Sparse Convex Optimization

Lecture 17 Sparse Convex Optimization Lecture 17 Sparse Convex Optimization Compressed sensing A short introduction to Compressed Sensing An imaging perspective 10 Mega Pixels Scene Image compression Picture Why do we compress images? Introduction

More information

Image Restoration using Accelerated Proximal Gradient method

Image Restoration using Accelerated Proximal Gradient method Image Restoration using Accelerated Proximal Gradient method Alluri.Samuyl Department of computer science, KIET Engineering College. D.Srinivas Asso.prof, Department of computer science, KIET Engineering

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Multilevel Approximate Robust Principal Component Analysis

Multilevel Approximate Robust Principal Component Analysis Multilevel Approximate Robust Principal Component Analysis Vahan Hovhannisyan Yannis Panagakis Stefanos Zafeiriou Panos Parpas Imperial College London, UK {v.hovhannisyan3, i.panagakis, s.zafeiriou, p.parpas}@imperial.ac.uk

More information

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs) Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based

More information

Scanning Real World Objects without Worries 3D Reconstruction

Scanning Real World Objects without Worries 3D Reconstruction Scanning Real World Objects without Worries 3D Reconstruction 1. Overview Feng Li 308262 Kuan Tian 308263 This document is written for the 3D reconstruction part in the course Scanning real world objects

More information

CLEANING UP TOXIC WASTE: REMOVING NEFARIOUS CONTRIBUTIONS TO RECOMMENDATION SYSTEMS

CLEANING UP TOXIC WASTE: REMOVING NEFARIOUS CONTRIBUTIONS TO RECOMMENDATION SYSTEMS CLEANING UP TOXIC WASTE: REMOVING NEFARIOUS CONTRIBUTIONS TO RECOMMENDATION SYSTEMS Adam Charles, Ali Ahmed, Aditya Joshi, Stephen Conover, Christopher Turnes, Mark Davenport Georgia Institute of Technology

More information

Characterizing Improving Directions Unconstrained Optimization

Characterizing Improving Directions Unconstrained Optimization Final Review IE417 In the Beginning... In the beginning, Weierstrass's theorem said that a continuous function achieves a minimum on a compact set. Using this, we showed that for a convex set S and y not

More information

Stereo and Epipolar geometry

Stereo and Epipolar geometry Previously Image Primitives (feature points, lines, contours) Today: Stereo and Epipolar geometry How to match primitives between two (multiple) views) Goals: 3D reconstruction, recognition Jana Kosecka

More information