A primal-dual framework for mixtures of regularizers


1 A primal-dual framework for mixtures of regularizers
Baran Gözcü
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
March 25, 2015
Joint work with Luca Baldassarre, Quoc Tran-Dinh, Cosimo Aprile and Volkan Cevher

2 Outline
- Mixture of regularizers
- Constrained convex minimization: a primal-dual framework
- Application
- Conclusion

3 Overview of compressive imaging
System model:
$y = Mx + \omega$  (1)
- $M$ is the measurement matrix (Fourier, Gaussian, etc.)
- $x$ is the image in vectorized form
- $\omega$ is the measurement noise
- $y$ is the measurement vector
Solution: we then solve
$\min_x \|Wx\|_1 \quad \text{subject to} \quad Mx = y$  (2)
where $W$ is the sparsity basis.

4 Why a mixture model?
Signals can possess various structures at the same time. For example, reconstruction with the TV norm $\|x\|_{TV} = \sum_{i,j} \|(\nabla x)_{i,j}\|_2$,
$\min_x \ \|x\|_{TV} \quad \text{subject to} \quad Mx = y,$  (3)
will introduce flat regions.
What happens if we solve with a mixture of regularizers?
$\min_x \ \alpha \|x\|_{TV} + (1-\alpha)\|x\|_1 \quad \text{subject to} \quad Mx = y$  (4)
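For concreteness, a minimal numpy sketch (not from the slides) of the two terms that the mixture in (4) trades off: the isotropic TV norm built from forward differences, and the l1 norm. The forward-difference discretization and the replicated-border handling are assumptions made only for illustration.

```python
import numpy as np

def tv_norm(x):
    """Isotropic total variation: sum over pixels of the 2-norm of the
    discrete gradient (forward differences, border row/column replicated)."""
    dx = np.diff(x, axis=1, append=x[:, -1:])  # horizontal differences
    dy = np.diff(x, axis=0, append=x[-1:, :])  # vertical differences
    return np.sum(np.sqrt(dx**2 + dy**2))

def mixture_objective(x, alpha):
    """Objective of (4): alpha*||x||_TV + (1-alpha)*||x||_1."""
    return alpha * tv_norm(x) + (1 - alpha) * np.sum(np.abs(x))

# toy usage: a piecewise-constant image has a small TV value but many nonzeros
x = np.zeros((8, 8)); x[2:6, 2:6] = 1.0
print(tv_norm(x), mixture_objective(x, alpha=0.5))
```

A piecewise-constant image such as the toy example above has small TV but many nonzero entries, which is exactly the tension the mixture weight $\alpha$ balances.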

5 Illustration: Mixture model performs better
[Figure: reconstructions of a 2048x2048 image from 15% of its DCT coefficients, with SNR (dB) reported for the original, BP, TV, and TV&BP reconstructions; the TV&BP mixture attains the highest SNR.]

6 General Problem
$\min_x \ f(x) := \sum_{i=0}^{p} f_i(x) \quad \text{subject to} \quad Ax = b$  (5)
How to solve with
- computational efficiency,
- a guarantee on the objective function,
- a guarantee on feasibility?

7 Outline
- Mixture of regularizers
- Constrained convex minimization: a primal-dual framework
- Application
- Conclusion

8-10 Swiss army knife of convex formulations
Our primal problem prototype: a simple mathematical formulation¹
$f^\star := \min_{x \in \mathbb{R}^p} \left\{ f(x) : Ax = b, \ x \in \mathcal{X} \right\}$  (6)
- $f$ is a proper, closed and convex function, and $\mathcal{X}$ is a nonempty, closed convex set.
- $A \in \mathbb{R}^{n \times p}$ and $b \in \mathbb{R}^n$ are known.
- An optimal solution $x^\star$ to (6) satisfies $f(x^\star) = f^\star$, $Ax^\star = b$ and $x^\star \in \mathcal{X}$.
Example to keep in mind in the sequel:
$x^\star := \arg\min_{x \in \mathbb{R}^p} \left\{ \|x\|_1 : Ax = b \right\}$
Broader context for (6):
- Standard convex optimization formulations: linear programming, convex quadratic programming, second-order cone programming, semidefinite programming and interior point algorithms.
- Reformulations of existing unconstrained problems via convex splitting: composite convex minimization, consensus optimization, ...
¹ We can simply replace $Ax = b$ with $Ax - b \in \mathcal{C}$ for a convex cone $\mathcal{C}$ without any fundamental change.

11 Numerical ε-accuracy
Exact vs. approximate solutions:
- Computing an exact solution $x^\star$ to (6) is impracticable unless the problem has a closed-form solution, which is extremely rare in practice.
- Numerical optimization algorithms produce an $x^\star_\varepsilon$ that approximates $x^\star$ up to a given accuracy $\varepsilon$ in some sense.
In the sequel, by $\varepsilon$-accurate solutions $x^\star_\varepsilon$ of (6) we mean the following.
Definition ($\varepsilon$-accurate solutions). Given a numerical tolerance $\varepsilon \ge 0$, a point $x^\star_\varepsilon \in \mathbb{R}^p$ is called an $\varepsilon$-solution of (6) if
- $|f(x^\star_\varepsilon) - f^\star| \le \varepsilon$ (objective residual),
- $\|Ax^\star_\varepsilon - b\| \le \varepsilon$ (feasibility gap),
- $x^\star_\varepsilon \in \mathcal{X}$ (exact simple set feasibility).
Indeed, $\varepsilon$ can be different for the objective, the feasibility gap, and the iterate residual.
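As a small illustration (not from the slides), the definition translates directly into a check one can run on a candidate point; the sketch below assumes $f$, $f^\star$ and the data $A$, $b$ are available, and that membership in $\mathcal{X}$ is tested separately.

```python
import numpy as np

def is_eps_solution(x_eps, f, f_star, A, b, eps):
    """Check the epsilon-solution definition: objective residual and
    feasibility gap both within eps (set membership checked elsewhere)."""
    obj_ok = abs(f(x_eps) - f_star) <= eps
    feas_ok = np.linalg.norm(A @ x_eps - b) <= eps
    return obj_ok and feas_ok
```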

12 The optimal solution set
Before we talk about algorithms, we must first characterize what we are looking for!
Optimality condition. The optimality condition of $\min_{x \in \mathbb{R}^p} \{ f(x) : Ax = b \}$ can be written as
$0 \in A^T \lambda^\star + \partial f(x^\star), \qquad 0 = Ax^\star - b.$  (7)
(Subdifferential) $\partial f(x) := \{ v \in \mathbb{R}^p : f(y) \ge f(x) + v^T (y - x), \ \forall y \in \mathbb{R}^p \}$.
This is the well-known KKT (Karush-Kuhn-Tucker) condition. Any point $(x^\star, \lambda^\star)$ satisfying (7) is called a KKT point; $x^\star$ is called a stationary point, and $\lambda^\star$ is the corresponding vector of multipliers.
Lagrange function and the minimax formulation. We can naturally interpret the optimality condition via a minimax formulation
$\max_{\lambda} \ \min_{x \in \mathrm{dom}(f)} L(x, \lambda),$
where $\lambda \in \mathbb{R}^n$ is the vector of Lagrange multipliers (dual variables) w.r.t. $Ax = b$, associated with the Lagrange function
$L(x, \lambda) := f(x) + \lambda^T (Ax - b).$
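To make the condition concrete, here is a hypothetical check (not from the slides) of (7) for the running l1 example, using the standard fact that $\partial\|x\|_1$ consists of vectors whose entries equal $\mathrm{sign}(x_i)$ where $x_i \neq 0$ and lie in $[-1, 1]$ where $x_i = 0$.

```python
import numpy as np

def is_kkt_point_l1(x, lmbda, A, b, tol=1e-8):
    """KKT check for min ||x||_1 s.t. Ax = b:
    0 in A^T lmbda + subdiff(||x||_1)  and  Ax = b."""
    g = A.T @ lmbda
    support = np.abs(x) > tol
    # on the support, -g_i must equal sign(x_i); off the support, |g_i| <= 1
    stat_ok = (np.allclose(-g[support], np.sign(x[support]), atol=tol)
               and np.all(np.abs(g[~support]) <= 1.0 + tol))
    feas_ok = np.linalg.norm(A @ x - b) <= tol
    return stat_ok and feas_ok
```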

13-15 Finding an optimal solution
A plausible strategy: to solve the constrained problem (6), we seek the solutions
$(x^\star, \lambda^\star) \in \arg\max_{\lambda} \min_{x \in \mathcal{X}} L(x, \lambda),$
which we can naively break down into two, in general nonsmooth, problems:
- Lagrangian subproblem: $x^\star(\lambda) \in \arg\min_{x \in \mathcal{X}} \{ L(x, \lambda) := f(x) + \langle \lambda, Ax - b \rangle \}$
- Dual problem: $\lambda^\star \in \arg\max_{\lambda} \{ d(\lambda) := L(x^\star(\lambda), \lambda) \}$
The function $d(\lambda)$ is called the dual function; the optimal dual objective value is $d^\star = d(\lambda^\star)$. The dual function $d(\lambda)$ is concave. Hence, we can attempt the following strategy:
1. Find the optimal solution $\lambda^\star$ of the convex dual problem.
2. Obtain the optimal primal solution $x^\star = x^\star(\lambda^\star)$ via the convex primal problem.
Challenges for the plausible strategy above:
1. Establishing its correctness: assume $f^\star > -\infty$ and Slater's condition, so that $f^\star = d^\star$.
2. Computational efficiency of finding an $\bar\varepsilon$-approximate optimal dual solution $\lambda^\star_{\bar\varepsilon}$.
3. Mapping $\lambda^\star_{\bar\varepsilon} \to x^\star_\varepsilon$ (i.e., $\varepsilon(\bar\varepsilon)$), where $\varepsilon$ is for the original constrained problem (6).
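A minimal sketch (not from the slides) of this naive strategy on one concrete instance: $f(x) = \|x\|_1$ with the simple set $\mathcal{X} = [-1, 1]^p$, for which the Lagrangian subproblem is separable with a closed-form solution, and the dual is ascended with a subgradient step. The instance and the diminishing step-size rule are assumptions chosen only for illustration.

```python
import numpy as np

def lagrangian_argmin(lmbda, A, b):
    """x*(lmbda) = argmin over x in [-1,1]^p of ||x||_1 + <lmbda, Ax - b>.
    Separable: per coordinate, minimize |x_i| + c_i*x_i with c = A^T lmbda."""
    c = A.T @ lmbda
    x = np.zeros_like(c)
    x[c > 1.0] = -1.0   # pushing x_i to -1 pays off only if c_i > 1
    x[c < -1.0] = 1.0   # symmetric case
    return x

def dual_subgradient_ascent(A, b, iters=500):
    """Naive strategy: ascend the (concave, nonsmooth) dual d(lmbda)
    using the subgradient A x*(lmbda) - b with a diminishing step."""
    lmbda = np.zeros(A.shape[0])
    for k in range(iters):
        x = lagrangian_argmin(lmbda, A, b)
        lmbda = lmbda + (1.0 / np.sqrt(k + 1)) * (A @ x - b)
    return lagrangian_argmin(lmbda, A, b), lmbda
```

Note that the primal point recovered from the last dual iterate need not satisfy $Ax = b$, which is precisely challenge 3: an accurate dual point does not automatically give an accurate primal point.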

16-17 Nesterov's smoothing idea: from $O(1/\varepsilon^2)$ to $O(1/\varepsilon)$
When can the dual function have a Lipschitz gradient? When $f(x)$ is $\gamma$-strongly convex, the dual function $d(\lambda)$ has a Lipschitz gradient with constant $\|A\|^2/\gamma$.
(Strong convexity) $f(x)$ is $\gamma$-strongly convex iff $f(x) - \frac{\gamma}{2}\|x\|_2^2$ is convex.
$d(\lambda) = \min_{x \in \mathcal{X}} \ \big[ f(x) - \tfrac{\gamma}{2}\|x\|_2^2 \big]$ (convex & possibly nonsmooth) $+ \ \langle \lambda, Ax - b \rangle + \tfrac{\gamma}{2}\|x\|_2^2$ (leads to $d \in \mathcal{F}_L$)
AGM automatically obtains $d^\star - d(\lambda_{k-1}) \le \varepsilon$ with $k = O(1/\sqrt{\varepsilon})$.
Nesterov's smoother [3]: we add a strongly convex term to the Lagrangian subproblem so that the dual is smooth!
$d_\gamma(\lambda) = \min_{x \in \mathcal{X}} \ f(x) + \langle \lambda, Ax - b \rangle + \frac{\gamma}{2}\|x - x_c\|_2^2$, with a center point $x_c \in \mathcal{X}$
$\nabla d_\gamma(\lambda) = A x^\star_\gamma(\lambda) - b$  ($x^\star_\gamma(\lambda)$: the $\gamma$-Lagrangian subproblem solution)
1. $d_\gamma(\lambda) - \gamma D_{\mathcal{X}} \le d(\lambda) \le d_\gamma(\lambda)$, where $D_{\mathcal{X}} = \max_{x \in \mathcal{X}} \frac{1}{2}\|x - x_c\|^2$.
2. The iterate $\lambda_k$ of AGM on $d_\gamma(\lambda)$ satisfies $d^\star - d(\lambda_k) \le \gamma D_{\mathcal{X}} + d^\star_\gamma - d_\gamma(\lambda_k) \le \gamma D_{\mathcal{X}} + \frac{2\|A\|^2 R^2}{\gamma (k+2)^2}$.
We minimize the upper bound w.r.t. $\gamma$ and obtain $d^\star - d(\lambda_k) \le \varepsilon$ with $k = O(1/\varepsilon)$.
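Below is a minimal numpy sketch (not from the slides) of the smoothed dual oracle and a plain, non-accelerated gradient ascent on $d_\gamma$; the slides use AGM instead, and the prox-based expression for $x^\star_\gamma(\lambda)$ is the one given on the next slide. The zero center point and the fixed step $\gamma/\|A\|^2$ are assumptions.

```python
import numpy as np

def smoothed_dual_grad(lmbda, A, b, prox_f, gamma, x_c):
    """grad d_gamma(lmbda) = A x_gamma(lmbda) - b, where x_gamma(lmbda)
    minimizes f(x) + <lmbda, Ax - b> + (gamma/2)||x - x_c||^2, i.e.
    x_gamma(lmbda) = prox_{f/gamma}(x_c - A^T lmbda / gamma)."""
    x = prox_f(x_c - A.T @ lmbda / gamma, 1.0 / gamma)
    return A @ x - b, x

def ascend_smoothed_dual(A, b, prox_f, gamma, iters=200):
    """Plain gradient ascent on d_gamma; its gradient is Lipschitz with
    constant ||A||^2 / gamma, so a step of gamma / ||A||^2 is safe."""
    x_c = np.zeros(A.shape[1])          # center point taken as 0
    lmbda = np.zeros(A.shape[0])
    step = gamma / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        g, x = smoothed_dual_grad(lmbda, A, b, prox_f, gamma, x_c)
        lmbda = lmbda + step * g
    return x, lmbda
```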

18 Computational efficiency: the key role of the prox-operator
Smoothed dual: $d_\gamma(\lambda) = \min_{x \in \mathcal{X}} \ f(x) + \langle \lambda, Ax - b \rangle + \frac{\gamma}{2}\|x - x_c\|_2^2$, with minimizer $x^\star_\gamma(\lambda) = \mathrm{prox}_{f/\gamma}\big(x_c - \tfrac{1}{\gamma} A^T \lambda\big)$.
Definition (prox-operator): $\mathrm{prox}_g(x) := \arg\min_{z \in \mathbb{R}^p} \{ g(z) + (1/2)\|z - x\|^2 \}$.
Key properties:
- It distributes when the primal problem has decomposable structure: $f(x) := \sum_{i=1}^{m} f_i(x_i)$ and $\mathcal{X} := \mathcal{X}_1 \times \cdots \times \mathcal{X}_m$, where $m \ge 1$ is the number of components.
- It is often efficient and has a closed-form expression. For instance, if $g(z) = \|z\|_1$, then the prox-operator performs coordinate-wise soft-thresholding by 1.
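As a small illustration (not from the slides): soft-thresholding implements the prox of the scaled l1 norm, and for a decomposable $f$ the prox simply splits across blocks, which is what makes the framework parallelizable.

```python
import numpy as np

def prox_l1(x, t=1.0):
    """prox of t*||.||_1: coordinate-wise soft-thresholding by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_separable(x_blocks, proxes, t=1.0):
    """If f(x) = sum_i f_i(x_i), the prox splits across the blocks and
    each block can be handled (in parallel) by its own prox."""
    return [p(xi, t) for p, xi in zip(proxes, x_blocks)]
```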

19-20 Going from the dual $\bar\varepsilon$ to the primal $\varepsilon$, part I
Optimality condition (revisited). Two equivalent ways of viewing the optimality condition of the primal problem (6):
- Mixed variational inequality (MVIP): $f(x) - f(x^\star) + M(z^\star)^T (z - z^\star) \ge 0, \ \forall z \in \mathcal{X} \times \mathbb{R}^n$
- Inclusion: $0 \in A^T \lambda^\star + \partial f(x^\star), \quad 0 = Ax^\star - b$
where $M(z) := \begin{bmatrix} A^T \lambda \\ -(Ax - b) \end{bmatrix}$ and $z^\star := (x^\star, \lambda^\star)$ is a primal-dual solution of (6).
Measuring progress via the gap function. Unfortunately, measuring progress with the inclusion formulation is hard. However, associated with the MVIP, we can define a gap function to measure our progress:
$G(z) := \max_{\hat z \in \mathcal{X} \times \mathbb{R}^n} \left\{ f(x) - f(\hat x) + M(z)^T (z - \hat z) \right\}$  (8)
Key observations:
- $G(z) = \max_{\hat\lambda \in \mathbb{R}^n} \{ f(x) + \langle \hat\lambda, Ax - b \rangle \} - \min_{\hat x \in \mathcal{X}} \{ f(\hat x) + \langle \lambda, A\hat x - b \rangle \} \ge 0$ for all $z \in \mathcal{X} \times \mathbb{R}^n$; the first term equals $f(x)$ if $Ax = b$ (and $+\infty$ otherwise), and the second term is $d(\lambda)$.
- $G(z^\star) = 0$ iff $z^\star := (x^\star, \lambda^\star)$ is a primal-dual solution of (6).
- The primal accuracy $\varepsilon$ and the dual accuracy $\bar\varepsilon$ can be related via the gap function.

21-22 Going from the dual $\bar\varepsilon$ to the primal $\varepsilon$, part II
A smoothed gap function measuring the excessive primal-dual gap. We define a smoothed version of the gap function:
$G_{\gamma\beta}(z) = \underbrace{\max_{\hat\lambda \in \mathbb{R}^n} \left\{ f(x) + \langle \hat\lambda, Ax - b \rangle - \frac{\beta}{2}\|\hat\lambda - \hat\lambda_c\|_2^2 \right\}}_{= f_\beta(x) = f(x) + \langle \hat\lambda_c, Ax - b \rangle + \frac{1}{2\beta}\|Ax - b\|_2^2} - \underbrace{\min_{\hat x \in \mathcal{X}} \left\{ f(\hat x) + \langle \lambda, A\hat x - b \rangle + \frac{\gamma}{2}\|\hat x - \hat x_c\|_2^2 \right\}}_{= d_\gamma(\lambda)}$
where $(\hat x_c, \hat\lambda_c) \in \mathcal{X} \times \mathbb{R}^n$ are primal-dual center points; in the sequel, they are 0.
- The primal accuracy $\varepsilon$ is related to our primal model estimate $f_\beta(x)$.
- The dual accuracy $\bar\varepsilon$ is related to our smoothed dual function $d_\gamma(\lambda)$.
- We must relate $G_{\gamma\beta}(z)$ to $G(z)$ so that we can tie $\varepsilon$ to $\bar\varepsilon$.
Our algorithm via MEG: model-based excessive gap (cf. [4]). Let $G_k(\cdot) := G_{\gamma_k \beta_k}(\cdot)$. We generate a sequence $\{\bar z_k, \gamma_k, \beta_k\}_{k \ge 0}$ such that
$G_{k+1}(\bar z_{k+1}) \le (1 - \tau_k) G_k(\bar z_k) + \psi_k$  (MEG)
for $\psi_k \ge 0$, rate $\tau_k \in (0,1)$ with $\sum_k \tau_k = \infty$, and $\gamma_{k+1}\beta_{k+1} < \gamma_k \beta_k$, so that $G_{\gamma_k \beta_k}(\cdot) \to G(\cdot)$.
Consequence: $G(\bar z_k) \to 0^+$, hence $\bar z_k \to z^\star = (x^\star, \lambda^\star)$ (a primal-dual solution).

23 Going from the dual $\bar\varepsilon$ to the primal $\varepsilon$, part III
An uncertainty relation via MEG. The product of the primal and dual convergence rates is lower-bounded by MEG:
$\gamma_k \beta_k \ge \tau_k^2 \|A\|^2$
Note that $\tau_k^2 = \Omega(1/k^2)$ due to Nesterov's lower bound.
- The rate of $\gamma_k$ controls the primal residual: $f(x_k) - f^\star \le O(\gamma_k)$.
- The rate of $\beta_k$ controls the feasibility: $\|Ax_k - b\|_2 \le O(\beta_k + \tau_k) = O(\beta_k)$.
- They cannot both be $O(1/k^2)$ simultaneously!

24 Convergence guarantee
Theorem [4, 5]
1. When $f$ is strongly convex with parameter $\mu > 0$, we can take $\gamma_k = \mu$ and $\beta_k = O(1/k^2)$:
$-D_\Lambda \|Ax_k - b\| \le f(x_k) - f^\star \le 0, \qquad \|Ax_k - b\| \le \frac{4\|A\|^2}{(k+2)^2\,\mu} D_\Lambda, \qquad \|x_k - x^\star\| \le \frac{4\|A\|}{(k+2)\,\mu} D_\Lambda.$
2. When $f$ is non-smooth, the best we can do is $\gamma_k = O(1/k)$ and $\beta_k = O(1/k)$:
$-D_\Lambda \|Ax_k - b\| \le f(x_k) - f^\star \le \frac{2\sqrt{2}\,\|A\|\, D_{\mathcal{X}}}{k+1}, \qquad \|Ax_k - b\| \le \frac{2\sqrt{2}\,\|A\|\,(D_\Lambda + D_{\mathcal{X}})}{k+1}.$

25 Outline
- Mixture of regularizers
- Constrained convex minimization: a primal-dual framework
- Application
- Conclusion

26 An application: Magnetic Resonance Imaging (MRI)
Mixture model:
$\min_x \ \frac{1}{2}\|Mx - y\|_2^2 + \alpha \|x\|_{TV} + \mu \|Wx\|_1 + \beta \|Wx\|_{tree}$  (9)
$\|x\|_{tree} := \sum_{i=1}^{s} \|x_{g_i}\|_2$  (10)
With $z = Gx$ we can define the tree norm with non-overlapping groups ğ_i:
$\|x\|_{tree} = \sum_{i=1}^{s} \|(Gx)_{ğ_i}\|_2$  (11)
[Diagram: the replication operator G copies the entries of x = (x_1, ..., x_4) so that the overlapping groups g_1, g_2, g_3 of x become non-overlapping groups ğ_1, ğ_2, ğ_3 of z = Gx.]
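A minimal numpy sketch (not from the slides) of the tree norm and of a replication operator $G$ that duplicates coefficients so that the groups of $z = Gx$ no longer overlap; the specific parent-child groups in the toy usage are hypothetical, chosen only to mirror the slide's diagram.

```python
import numpy as np

def tree_norm(x, groups):
    """||x||_tree = sum_i ||x_{g_i}||_2 over (possibly overlapping) index groups."""
    return sum(np.linalg.norm(x[g]) for g in groups)

def replication_operator(groups, p):
    """Matrix G that copies each group's entries so that the groups of z = Gx
    become non-overlapping (each row of G selects one coordinate of x)."""
    rows = [i for g in groups for i in g]
    G = np.zeros((len(rows), p))
    G[np.arange(len(rows)), rows] = 1.0
    return G

# toy usage: hypothetical parent-child groups on 4 coefficients
groups = [[0, 1], [0, 2], [2, 3]]
x = np.array([1.0, -2.0, 0.5, 0.0])
G = replication_operator(groups, x.size)
z = G @ x   # the groups of z are the consecutive, non-overlapping blocks
print(tree_norm(x, groups),
      np.linalg.norm(z[0:2]) + np.linalg.norm(z[2:4]) + np.linalg.norm(z[4:6]))
```

Both printed values agree, which is exactly the identity (10) = (11).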

27 Wavelet Tree Sparsity Algorithm (WaTMRI) [1]
Mixture model (with an auxiliary variable z):
$\min_{x,z} \ \frac{1}{2}\|Mx - y\|_2^2 + \alpha\|x\|_{TV} + \mu\|Wx\|_1 + \beta \sum_{i=1}^{s} \|z_{g_i}\|_2 + \frac{\lambda}{2}\|z - GWx\|_2^2$  (12)
Two subproblems:
- $\min_{z_{g_i}} \ \beta \|z_{g_i}\|_2 + \frac{\lambda}{2}\|z_{g_i} - (GWx)_{g_i}\|_2^2$ is solved by the proximity operator.
- $\min_x \ \frac{1}{2}\|Mx - y\|_2^2 + \frac{\lambda}{2}\|z - GWx\|_2^2 + \alpha\|x\|_{TV} + \mu\|Wx\|_1$ is solved by FISTA; the proximal operator of $\alpha\|x\|_{TV} + \mu\|Wx\|_1$ is itself computed with an iterative algorithm.
- Fast empirical convergence, but it does not allow parallelization.
- No guarantees, and it solves neither the original problem nor the augmented problem.
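For illustration only, a hypothetical sketch (not from the slides) of the z-subproblem of (12), which separates across groups when the groups of z are non-overlapping (as the replication z = GWx ensures) and reduces to block soft-thresholding with threshold $\beta/\lambda$.

```python
import numpy as np

def block_soft_threshold(v, t):
    """prox of t*||.||_2 on one block: shrink the whole block toward 0."""
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

def watmri_z_step(GWx, groups, beta, lam):
    """Per group, solve min_z beta*||z||_2 + (lam/2)*||z - (GWx)_g||^2,
    i.e. block soft-thresholding with threshold beta/lam."""
    z = np.copy(GWx)
    for g in groups:
        z[g] = block_soft_threshold(GWx[g], beta / lam)
    return z
```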

28 Solving with the primal-dual framework [2]
Mixture model, with the splitting $x_0 := Mx - y$, $x_1 := x$, $x_2 := Wx$, $x_3 := GWx$:
$\min_x \ \frac{1}{2}\|Mx - y\|_2^2 + \alpha\|x\|_{TV} + \mu\|Wx\|_1 + \beta \sum_{i=1}^{s} \|(GWx)_{g_i}\|_2$  (13)
$f_0(x_0) = \frac{1}{2}\|x_0\|_2^2, \quad f_1(x_1) = \alpha\|x_1\|_{TV}, \quad f_2(x_2) = \mu\|x_2\|_1, \quad f_3(x_3) = \beta\sum_{i=1}^{s}\|(x_3)_{g_i}\|_2$
Decomposable form:
$\min_{x = [x_0^T, \ldots, x_3^T]^T} \ f(x) := \sum_{i=0}^{3} f_i(x_i) \quad \text{subject to} \quad Ax = b$  (14)
where
$A = \begin{bmatrix} W & -I & 0 & 0 \\ 0 & G & -I & 0 \\ M & 0 & 0 & -I \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 0 \\ y \end{bmatrix},$  (15)
with the columns acting on the blocks $(x_1, x_2, x_3, x_0)$.
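A minimal scipy sketch (not from the slides) of how the stacked constraint in (15) can be assembled from sparse matrices W (wavelet transform), G (group replication) and M (measurement operator); the signs and block ordering follow the reconstruction above and should be treated as an assumption.

```python
import numpy as np
import scipy.sparse as sp

def build_constraints(W, G, M, y):
    """Stack the splitting constraints W x = x2, G x2 = x3, M x - x0 = y
    into one system A [x; x2; x3; x0] = b (signs as reconstructed in (15))."""
    n1, n2, n3 = W.shape[0], G.shape[0], M.shape[0]
    I = sp.identity
    A = sp.bmat([[W,    -I(n1), None,   None  ],
                 [None,  G,     -I(n2), None  ],
                 [M,     None,  None,   -I(n3)]], format="csr")
    b = np.concatenate([np.zeros(n1), np.zeros(n2), y])
    return A, b
```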

29 Experimental Setup
[Figure: the original brain image and the subsampling map.]
An MRI brain image is sampled via a partial Fourier operator at a subsampling ratio of 0.2.
Note that although we use the same coefficient values for α, β, µ, WaTMRI addresses the augmented problem without the constraint.

30 Results
[Figure: MRI experiment. (a) Objective function vs. iterations. (b) Feasibility gap $\|z - GWx\|_2$ vs. iterations. (c) Signal-to-noise ratio (dB) of the iterates vs. iterations. Curves compare Decopt 1P2D and WaTMRI.]

31 Outline
- Mixture of regularizers
- Constrained convex minimization: a primal-dual framework
- Application
- Conclusion

32 Conclusion
- A reliable solver for mixtures of regularizers
- Convergence guarantees on both the objective and the feasibility gap
- Can handle as many regularizers as we want
- Requires only proximal operator computations and is parallelizable

33 References
[1] C. Chen and J. Huang. Compressive sensing MRI with wavelet tree sparsity. In Advances in Neural Information Processing Systems (NIPS).
[2] B. Gözcü, L. Baldassarre, Q. Tran-Dinh, C. Aprile, and V. Cevher. A primal-dual framework for mixtures of regularizers.
[3] Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, Ser. A, 103:127-152, 2005.
[4] Q. Tran-Dinh and V. Cevher. Constrained convex minimization via model-based excessive gap. In Advances in Neural Information Processing Systems (NIPS).
[5] Q. Tran-Dinh and V. Cevher. A primal-dual algorithmic framework for constrained convex minimization. Technical report, EPFL.


35 1P2D Algorithm
Update the primal-dual sequence $\{\bar z_k\}$. We can design different strategies to update $\{\bar z_k\}$. For instance:
$\hat\lambda_k := (1 - \tau_k)\,\bar\lambda_k + \tau_k\, \lambda^\star_{\beta_k}(\bar x_k)$
$\bar x_{k+1} := (1 - \tau_k)\,\bar x_k + \tau_k\, x^\star_{\gamma_{k+1}}(\hat\lambda_k)$
$\bar\lambda_{k+1} := \hat\lambda_k + \alpha_k \big( A x^\star_{\gamma_{k+1}}(\hat\lambda_k) - b \big)$  (1P2D)
where $\alpha_k := \gamma_{k+1}/\|A\|^2$ (Bregman), or $\alpha_k := \gamma_{k+1}$ (augmented Lagrangian).
Update parameters. The parameters $\beta_k$ and $\gamma_k$ are updated as (with $c_k \in (-1, 1]$ given):
$\gamma_{k+1} := (1 - c_k \tau_k)\,\gamma_k \quad \text{and} \quad \beta_{k+1} := (1 - \tau_k)\,\beta_k$  (16)
The parameter $\tau_k$ is updated as:
$a_{k+1} := \Big( 1 + c_{k+1} + \sqrt{4 a_k^2 + (1 - c_{k+1})^2} \Big)/2, \quad \text{and} \quad \tau_{k+1} = a_{k+1}^{-1}.$
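A minimal numpy sketch (not the reference implementation) of the 1P2D loop above, under several assumptions: the primal and dual center points are 0, the Bregman dual step $\alpha_k = \gamma_{k+1}/\|A\|^2$ is used, $c_k$ is a constant $c$, the maximizer $\lambda^\star_\beta(x)$ of the smoothed primal model $f_\beta$ is taken as $(Ax - b)/\beta$, and the initial $a_0$ is chosen so that $\tau_0 < 1$ since the slide does not specify the initialization.

```python
import numpy as np

def one_p_two_d(A, b, prox_f, gamma0=1.0, beta0=1.0, c=0.5, iters=300):
    """Sketch of the 1P2D update; prox_f(u, t) = prox_{t*f}(u)."""
    n, p = A.shape
    x_bar, lam_bar = np.zeros(p), np.zeros(n)
    gamma, beta = gamma0, beta0
    a = 2.0                      # initial a_0 (assumed), so tau_0 = 0.5 < 1
    tau = 1.0 / a
    normA2 = np.linalg.norm(A, 2) ** 2
    x_c = np.zeros(p)            # primal center taken as 0
    for _ in range(iters):
        gamma_next = (1.0 - c * tau) * gamma
        # lambda*_{beta_k}(x_bar): maximizer in the smoothed primal model f_beta
        lam_star = (A @ x_bar - b) / beta
        lam_hat = (1.0 - tau) * lam_bar + tau * lam_star
        # x*_{gamma_{k+1}}(lam_hat) = prox_{f/gamma}(x_c - A^T lam_hat / gamma)
        x_star = prox_f(x_c - A.T @ lam_hat / gamma_next, 1.0 / gamma_next)
        x_bar = (1.0 - tau) * x_bar + tau * x_star
        lam_bar = lam_hat + (gamma_next / normA2) * (A @ x_star - b)
        # parameter updates (16) and the tau recursion
        beta = (1.0 - tau) * beta
        gamma = gamma_next
        a = (1.0 + c + np.sqrt(4.0 * a * a + (1.0 - c) ** 2)) / 2.0
        tau = 1.0 / a
    return x_bar, lam_bar
```

For the basis-pursuit example kept in mind throughout the talk, prox_f would simply be the soft-thresholding operator sketched earlier.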
